What is Search Engine Placement?

A search engine is a retrieval technology that, based on user needs and specific algorithms, uses particular strategies to retrieve information from the Internet and return it to users. Search engines rely on a variety of technologies, such as web crawling, retrieval and ranking, web page processing, big data processing, and natural language processing, to provide users with fast and highly relevant information services. The core modules of a search engine generally include crawling, indexing, retrieval, and ranking; a series of auxiliary modules can be added on top of these to create a better network environment for users. [1]

Chinese name: search engine
Foreign name: search engine
Classification: full-text index, directory index, etc.
Representatives: Baidu, Google, etc.
Function: access to information
Key technologies: web crawlers, big data processing, data mining, etc.

Search engine definition

A search engine is a system that collects information from the Internet using specific computer programs according to a certain strategy, organizes and processes that information, provides users with a retrieval service, and displays the retrieved information to the user. As a search technology operating on the Internet, it aims to increase the speed at which people gather information and to provide a better network environment. In terms of function and principle, search engines fall roughly into four categories: full-text search engines, meta search engines, vertical search engines, and directory search engines. [2]
Search engines have developed to the point where their basic architecture and algorithms are technically mature. A modern search engine collects information from the Internet according to certain strategies using specific computer programs, organizes and processes the information, provides retrieval services, and displays the information relevant to each user's query. [3]

Search Engine Development

Search engines emerged and developed along with the Internet, which has become an indispensable platform for study, work, and daily life; almost everyone who goes online uses a search engine. Search engines have gone through roughly four generations of development: [4]
1. First-generation search engines
In 1994, Lycos, the first true Internet-based search engine, was born. Engines of this generation relied mainly on manually compiled catalogs; the representative vendor was Yahoo. Their defining characteristic was that websites were stored under manually classified directories, and users found websites by browsing those directories. This approach still exists today. [4]
2. Second-generation search engines
As web application technology developed, users began to want to search page content itself, and second-generation search engines appeared, based on keyword queries. The most representative and successful is Google, which builds keyword search on top of web link analysis. This technology can cover a large share of the web pages on the Internet, analyze the importance of each page, and present the important results to the user. [4]
3. Third-generation search engines
With the rapid expansion of online information, users wanted to find what they need quickly and accurately, so third-generation search engines appeared. Compared with the previous two generations, the third generation pays more attention to personalization and professional intelligence: it applies artificial intelligence techniques such as automatic clustering and classification, together with regional intelligent recognition and content analysis, and combines them with manual intervention to strengthen search capability. The third generation is represented by Google, whose broad information coverage and excellent search performance opened a new phase in the development of search engine technology. [4]
4. Fourth-generation search engines
With the rapid diversification of information, a general-purpose search engine cannot, under current hardware conditions, obtain comprehensive information from the whole Internet, so users need comprehensive, timely, topic-oriented search. Engines of this kind use strategies such as feature extraction and text intelligence, and are more accurate and effective than the previous three generations; they are called fourth-generation search engines. [4]

How search engines work

The working process of a search engine can be viewed as three parts: first, spiders crawl web page information on the Internet and store it in the original page database; second, the information in the original page database is extracted and organized, and an index database is built; third, based on the keywords entered by the user, relevant documents are found quickly, the results are ranked, and the query results are returned to the user. The working principle is analyzed further below: [5]
First, web crawling
Each time the spider encounters a new document, it searches that page for links. A search engine spider accesses web pages much as an ordinary user's browser does, i.e. via the B/S (browser/server) model: the spider issues an access request for a page, the server accepts the request and returns the HTML code, and the spider stores the HTML obtained in the original page database. Search engines use many spiders crawling in parallel to increase crawl speed; they have servers all over the world, and each server dispatches multiple spiders to crawl pages simultaneously. A key question is how to ensure each page is visited only once, so as to improve the search engine's efficiency. While crawling, the search engine maintains two tables: one records the websites already visited, and the other records those not yet visited. When a spider finds the URL of an externally linked page, it downloads and analyzes the URL, then stores it in the appropriate table. If the same URL is encountered again, it is compared against the visited table; if it is already there, the spider discards the URL and does not visit it again. [5]
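The visited/unvisited bookkeeping described above can be sketched as a small breadth-first crawler. This is a minimal illustration, not a real spider: `fetch_links` is a hypothetical stand-in for downloading a page and extracting its outbound URLs.

```python
from collections import deque

def crawl(seed_urls, fetch_links, max_pages=100):
    """Breadth-first crawl. `fetch_links(url)` is a hypothetical
    stand-in for fetching a page and extracting its links."""
    visited = set()              # the "already visited" table
    frontier = deque(seed_urls)  # the "not yet visited" table
    pages = []
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        if url in visited:       # compare against the visited table; skip repeats
            continue
        visited.add(url)
        pages.append(url)        # in a real spider: store the HTML here
        for link in fetch_links(url):
            if link not in visited:
                frontier.append(link)
    return pages
```

With a simulated link graph, each page is fetched exactly once even when several pages link to it.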
Second, preprocessing and indexing
To let users find results quickly and easily in an original page database holding trillions of pages or more, the search engine must preprocess the pages captured by the spider. The main steps of preprocessing are building a full-text index for the pages, analyzing them, and finally creating an inverted file (also called an inverted index). Web page analysis involves determining the page type, measuring its importance and richness, analyzing hyperlinks, segmenting words, and removing duplicate pages. After the search engine's analysis and processing, a page is no longer the original page: it has been condensed into a document that reflects the page's theme in words. The most complex structure in indexing is the index library, which consists of a document index and a keyword index. The document index assigns each page a unique docID; based on the docID, the count, position, and size of each wordID on that page can be retrieved, finally forming a data list of wordIDs. The inverted index is formed as follows: the word segmentation system automatically divides each document into a sequence of words, assigns each word a unique number, and records which documents contain each word. That is the simplest form of inverted index; a practical one records more information. Besides the document numbers in each word's posting list, term frequency is also recorded, which makes it convenient to compute the similarity between a query and a document later. [5]
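The inverted-index construction described above can be sketched in a few lines. This is a toy illustration under simplifying assumptions: whitespace splitting stands in for the real word segmentation system, and each posting records term frequency alongside the document id, as the text describes.

```python
def build_inverted_index(docs):
    """docs: {doc_id: text}. Returns {word: {doc_id: term_frequency}},
    i.e. a posting list per word that also records word frequency."""
    index = {}
    for doc_id, text in docs.items():
        # naive tokenizer standing in for the segmentation system
        for word in text.lower().split():
            postings = index.setdefault(word, {})
            postings[doc_id] = postings.get(doc_id, 0) + 1
    return index
```

A real index would additionally record positions and formatting (title, bold, H1), which the ranking stage later consults.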
Third, the query service
After the user enters keywords in the search interface and clicks the "Search" button, the search engine performs the following processing on the query: word segmentation, deciding whether integrated search needs to be triggered, detecting typos and spelling errors, and removing stop words. The engine then finds the web pages containing the query terms in the index database, ranks them, and returns them to the results page in a certain format. The core of the query service is the ranking of search results, which determines the quality of the search engine and user satisfaction. Many factors influence the actual ranking, but one of the most important is the relevance of the page content. The main factors affecting relevance include the following five aspects. [5]
(1) How commonly the keywords are used. After segmentation, the different keywords contribute differently to the meaning of the whole query: the more common a word is, the less it contributes, and the less common a word is, the more it contributes. Common words taken to the extreme become stop words, which have no effect on the page at all. Search engines therefore assign uncommon words a high weighting coefficient and common words a low one, and the ranking algorithm pays more attention to the uncommon words. [5]
(2) Word frequency and density. In general, a query term's density on a page is positively related to the number of times it appears there; the more occurrences and the higher the density, the more relevant the page is to the query. [5]
(3) Keyword position and format. The more important the positions in which a keyword appears, such as the title tag, bold text, or H1, the more relevant the page is to that keyword. As noted in the description of index building, keyword format and position are recorded in the index database. [5]
(4) Keyword distance. After the query is segmented, a complete match indicates stronger relevance to the query. When "search engine" appears on the page continuously and intact, or when "search" and "engine" appear close together, the page is considered relevant to the query. [5]
(5) Link analysis and page weight. Link and weight relationships between pages also affect keyword relevance, the most important signal being anchor text: the more inbound links a page has whose anchor text is the query term, the more relevant the page. Link analysis also considers the topic of the linking page itself, the text surrounding the anchor text, and so on. [5]
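The query pipeline and the first three relevance factors above can be combined into a small ranking sketch. This is an illustrative simplification, not any engine's actual formula: `STOP_WORDS` and the doubling boost for title terms are hypothetical choices; rarity is modeled with a smoothed inverse document frequency.

```python
import math

STOP_WORDS = {"the", "a", "of"}  # hypothetical stop-word list

def idf(term, index, n_docs):
    """Factor 1: rarer terms weigh more, common terms weigh little."""
    return math.log((n_docs + 1) / (len(index.get(term, {})) + 1))

def search(query, index, n_docs, titles=None):
    """index: {word: {doc_id: term_frequency}};
    titles: {doc_id: set of words in that doc's title}.
    Segment the query, drop stop words, then rank matching docs by
    tf (factor 2) x idf (factor 1), doubling the weight of terms that
    appear in the title (a stand-in for factor 3, position/format)."""
    titles = titles or {}
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    scores = {}
    for term in terms:
        w = idf(term, index, n_docs)
        for doc_id, tf in index.get(term, {}).items():
            boost = 2.0 if term in titles.get(doc_id, ()) else 1.0
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * w * boost
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

Factors 4 and 5 (term proximity and link weight) would enter the same score as additional terms in a production ranker.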

Search engine classification

Search methods are a key link in search engines and can be roughly divided into four types: full-text search engines, meta search engines, vertical search engines, and directory search engines. Each has its own characteristics and suits different search environments, so choosing the search method flexibly is an important way to improve search engine performance. A full-text search engine uses crawlers to fetch and index all relevant pages on the Internet; a meta search engine performs secondary search by integrating and processing the results of multiple search engines; a vertical search engine is a professional method for quickly retrieving data within a specific industry; a directory search engine relies on manually collected and processed data placed under categorized directory links. [1]

Full-text search engines

Full-text search engines suit general web users: searching is convenient and simple, and all relevant information is easy to obtain. However, the results are voluminous and mixed, so users must browse and identify the information they need item by item. This method is effective especially when the user does not have a clear retrieval intention. [1]

Meta search engines

Meta search engines suit broad yet precise information gathering. Different full-text search engines have their own strengths and weaknesses owing to differences in performance and information feedback. Meta search engines solve exactly this problem: they let the underlying search engines complement one another's strengths, enable overall control over the basic search methods, and guide the continuous improvement of full-text search engines. [1]

Vertical search engines

Vertical search engines suit searches with a clear intent. For example, when users buy air, train, or bus tickets, or want to browse online video resources, they can go directly to the industry's dedicated search engine to obtain the relevant information accurately and quickly. [1]

Directory search engines

A directory search engine is a search method commonly used within a single website. It integrates and organizes the site's information and presents it to users under different directories, but its disadvantage is that users need to know the site's content in advance and be familiar with its main modules. In short, the scope of directory search is very limited, and it requires high labor costs to maintain. [1]

Search Engine Key Features

1. Quickly capture information.
In the era of big data, the Internet generates so much information that people are overwhelmed and struggle to obtain the resources they need. With the help of keywords and advanced query syntax, search engine technology can quickly capture highly relevant matching information. [1]
2. In-depth information mining.
While capturing the information a user needs, a search engine can also analyze the retrieved information along certain dimensions to guide how the user consumes and understands it. For example, the user can judge the popularity of the retrieved object from the number of matching items, obtain similar objects of high relevance from the distribution of the retrieved information, or receive intelligently generated solutions based on the retrieval results. [1]
3. Diversity and comprehensiveness of retrieval content.
As search engine technology matures, contemporary search engines can support retrieval over almost every type of data, expressed in natural languages, intelligent languages, machine languages, and others. Currently, not only videos, audio, and images can be retrieved, but also human facial features, fingerprints, specific actions, and so on. One can imagine that in the future almost every data type may become a search target. [1]

Search engine architecture

The basic structure of a search engine generally includes four functional modules: searcher, indexer, retriever, and user interface. [5]
1. Searcher.
A searcher, also called a web spider, is an automatic program the search engine uses to fetch web pages. It crawls continuously across the nodes of the Internet in the background of the system, finding and fetching pages as fast as possible. [5]
2. Indexer.
Its main function is to understand the web page information collected by the searcher and extract index entries from it. [5]
3. Retriever.
Its function is to quickly find documents, evaluate the relevance of documents and queries, and sort the results to be output. [5]
4. User interface.
It provides users with a visual interface for query input and result output. [5]

Search Engine Function Module

The key functional modules in the search engine are briefly described below: [3]
(1) Crawler: crawl the original webpage data from the Internet and store it in the document knowledge base server. [3]
(2) Document knowledge base server: stores the original web page data, usually in a distributed key-value database from which page content can be fetched quickly by URL/UID. [3]
(3) Index: reads the original web page data, parses the pages, extracts valid fields, and generates index data. Index data is usually generated incrementally in blocks/shards, with index merging, optimization, and deletion performed as needed. The generated index data usually includes dictionary data, inverted lists, forward lists, document attributes, and so on, and is stored on the index server. [3]
(4) Index server: stores the index data, mainly inverted lists, usually divided into blocks and shards, with support for incremental updates and deletions. When the volume of data is very large, the data is also partitioned and distributed by category, topic, time, and page quality to better serve online queries. [3]
(5) Retrieval: Read the inverted table index, respond to the front-end query request, and return the related document list data. [3]
(6) Sorting: sorts the document list returned by the retrieval module, based on attributes such as the relevance between document and query and the document's link weight. [3]
(7) Link analysis: Collect the link data and anchor text of each web page, calculate the link score of each web page, and eventually participate in the ranking of the returned results as a web page attribute. [3]
(8) Web page deduplication: Extract the relevant feature attributes of each webpage, calculate similar webpage groups, and provide deduplication services for offline indexing and online query. [3]
(9) Web page anti-spam: collects the history of each page and website and extracts the characteristics of spam pages, so that spam pages can be identified in the online index and removed. [3]
(10) Query analysis: Analyze user queries, generate structured query requests, and assign them to the corresponding category and subject data servers for queries. [3]
(11) Page description / summary: Provide the corresponding description and summary for the searched and sorted webpage list. [3]
(12) Front-end: accept user requests, distribute to corresponding servers, and return query results. [3]
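The deduplication idea in module (8), extracting feature attributes per page and grouping similar pages, can be sketched as follows. The choice of word shingles and Jaccard similarity here is a hypothetical illustration; production systems typically use more scalable fingerprints such as simhash.

```python
def shingles(text, k=3):
    """Feature extraction: the set of k-word shingles of a page's text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def near_duplicates(pages, threshold=0.8):
    """pages: {page_id: text}. Return pairs of pages whose shingle
    sets have Jaccard similarity at or above the threshold."""
    feats = {pid: shingles(text) for pid, text in pages.items()}
    ids = sorted(feats)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            inter = len(feats[a] & feats[b])
            union = len(feats[a] | feats[b]) or 1
            if inter / union >= threshold:
                pairs.append((a, b))
    return pairs
```

Such groups let the indexer keep one representative copy offline and let the query service suppress duplicates online.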

Search Engine Key Technologies

A search engine's workflow mainly comprises the data acquisition, data preprocessing, data processing, and result display stages. Technologies such as web crawlers, Chinese word segmentation, big data processing, and data mining are used at each stage. [2]
A web crawler, also known as a spider or web robot, is an important part of a search engine's fetching system. Following the applicable rules, the crawler starts from certain seed sites and traverses the Internet through the hyperlinks on each page, maintaining a URL queue and moving from one HTML document to another according to a breadth-first traversal strategy as it gathers information. [2]
Chinese word segmentation is a key technology in Chinese search engines: before an index can be built, the Chinese content must be properly segmented. Segmentation is the foundation of text mining; for a piece of Chinese input, successful segmentation lets the computer automatically recognize the meaning of the sentence. [2]
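One classic segmentation approach (not necessarily what any particular engine uses) is forward maximum matching: scan the text left to right, greedily taking the longest dictionary word at each position. A minimal sketch with a tiny hypothetical dictionary:

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position take the
    longest dictionary word; fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words
```

Real segmenters add statistical or neural disambiguation on top, since greedy matching alone cannot resolve all ambiguities.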
Big data processing technology performs distributed computation on data using a big data processing framework. Because the amount of data on the Internet is enormous, big data processing is needed to make data handling efficient. In search engines, it is mainly used for computations such as scoring the importance of web pages. [2]
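A classic example of such importance scoring is a PageRank-style link iteration. The sketch below is illustrative only, a single-machine toy of a computation that in practice runs distributed over a big data framework; the damping factor and iteration count are conventional defaults, not any engine's actual parameters.

```python
def pagerank(links, damping=0.85, iters=50):
    """links: {page: [pages it links to]}. Iteratively redistribute
    each page's score over its outbound links; the stabilized scores
    serve as per-page importance (link) scores."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                share = damping * rank[page] / len(outs)
                for out in outs:
                    new[out] += share
            else:  # dangling page: spread its score over all pages
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank
```

Pages that attract many links (here, one linked by two others) end up with higher scores than pages nobody links to.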
Data mining uses automatic or semi-automatic modeling algorithms to find the information hidden in large amounts of data; it is the process of discovering knowledge in databases. Data mining is generally tied to computer science and achieves knowledge discovery through methods such as machine learning, pattern recognition, and statistics. In search engines it is mainly applied to text: searching text information requires understanding human natural language, and text mining refers to extracting hidden, previously unknown, and potentially useful information from large volumes of text data. [2]

Problems facing search engines

Web page timeliness: the Internet has many users and very diverse data sources, and its pages change dynamically in real time, with very frequent updates and deletions. Sometimes a newly updated page is deleted again before the crawler has had time to fetch it, which greatly affects the accuracy of search results. [2]
Big data storage: even after preprocessing, the amount of crawled data is still very large, which poses a considerable challenge to big data storage technology. Most search engines currently store data in structured databases, which offer high sharing and low redundancy; however, structured databases handle concurrent queries poorly, so query efficiency is limited. [2]
Reliability of search results: owing to the limitations of data mining technology and computer hardware, the accuracy of data processing has not yet reached the ideal level. Moreover, some individuals or companies exploit existing vulnerabilities in search engines and interfere with results through cheating, which can make the results unreliable. [2]

Search engine development trends

1. Social search
Social network platforms and applications now occupy the mainstream of the Internet. They emphasize connection and interaction between users, which poses new challenges to traditional search technology. [3]
Traditional search technology emphasizes the relevance of results to the user's need. Social search adds another dimension beyond relevance: the reliability of the results. For a given query there may be tens of thousands of traditional results, but information, reviews, or verified content posted by other users in the searcher's own social network is easier to trust, because it comes from people the user knows. Social search can thus provide users with more accurate and more trusted results. [3]
2. Real-time search
Growing demand for real-time results is another direction for the future development of search engines. [3]
The most prominent feature of real-time search is timeliness: more and more breaking events are posted first on Weibo, and the core of real-time search is being "fast", so that information posted by users can be found by the search engine as soon as possible. In China, however, real-time search has not been widely deployed for various reasons; for example, Google's real-time search has been taken offline, and Baidu has no obvious real-time search entry. [3]
3. Mobile search
With the rapid development of smartphones, search on mobile devices is becoming increasingly popular, but mobile devices have significant limitations: the screen is small, the displayable area is limited, computing resources are constrained, web pages open slowly, and phone input is tedious. These problems need to be solved. [3]
At present, with the rapid popularization of smartphones, mobile search is certain to develop still faster, and its market share will gradually increase. For websites without a mobile version, Baidu provides the "Baidu Mobile Open Platform" to make up for this lack. [3]
4. Personalized search
Personalized search mainly faces two problems: how to build a model of the user's personal interests, and how to use that model in the search engine. [3]
The core of personalized search is building an accurate model of personal interests from the user's network behavior. Building such a model requires collecting user-related information, including search history, click history, pages viewed, the user's e-mail, favorites, posts, blogs, Weibo, and so on; a common approach is to extract keywords and their weights from this information. Providing personalized results for different users is the general trend of search engine development, but existing technology still has problems, such as the leakage of personal privacy; moreover, a user's interests keep changing, and relying too heavily on historical information may fail to reflect those changes. [3]
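The "keywords and their weights" idea above can be sketched as a toy interest model. Everything here is a hypothetical simplification: weights are just normalized term frequencies over the user's history, and the personalization step multiplicatively boosts results that overlap with the model.

```python
from collections import Counter

def interest_model(history_texts, top_n=5):
    """Build a crude personal interest model: keyword weights are
    normalized frequencies over the user's search/click history."""
    counts = Counter(w for t in history_texts for w in t.lower().split())
    total = sum(counts.values()) or 1
    return {w: c / total for w, c in counts.most_common(top_n)}

def personalize(base_scores, model):
    """base_scores: {doc_id: (relevance_score, doc_keywords)}.
    Boost each result by its overlap with the interest model."""
    return {doc: s * (1 + sum(model.get(w, 0) for w in words))
            for doc, (s, words) in base_scores.items()}
```

A real system would also decay old history so the model tracks the interest drift the text mentions.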
5. Geographically aware search
At present, many mobile phones have GPS; combined with devices such as gyroscopes that sense the user's orientation, this enables location-aware search. Based on this information, users can be offered accurate location-based services and related search services. Such applications are already popular, for example mobile map apps. [3]
6. Cross-language search
For translating, say, a Chinese user query into an English one, there are currently three main methods: machine translation, bilingual dictionary lookup, and bilingual corpus mining. For a global search engine, cross-language search is an inevitable direction of development; its basic technical route generally combines two methods: query translation and machine translation of web pages. [3]
7. Multimedia search
Currently, search engine queries are text-based; even image and video search is built on text. Future multimedia search technology will make up for this shortcoming. Besides text, multimedia mainly includes pictures, audio, and video, and multimedia search is far more complicated than plain-text search. It generally comprises four main steps: multimedia feature extraction, multimedia data stream segmentation, multimedia data classification, and multimedia data retrieval. [3]
8. Scenario search
Contextual search is a product of many technologies combined; the social, personalized, and location-aware search introduced above all support it, and Google strongly advocates the concept. So-called contextual search perceives the user's environment and builds a model of "this person, here and now" in an attempt to understand the purpose of the query; its fundamental goal is to understand people's information needs. For example, if a user issues the query "Apple" near an Apple store, then based on location awareness and the user's personalization model, the search engine may decide that the query refers to Apple products rather than fruit. [3]
