What Is a Search Data Structure?
A search engine usually refers to a full-text search engine that collects tens of millions to billions of web pages from the World Wide Web and indexes each word (that is, each keyword) in those pages to build an index database. When a user searches for a keyword, every page whose content contains that keyword is returned as a search result. The results are then sorted by a complex ranking algorithm, usually in order of their relevance to the search keywords. Some engines also factor in paid bidding, commercial promotion, or advertising, which may have nothing to do with relevance.
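To make this behavior concrete, here is a minimal Python sketch. It assumes a tiny in-memory collection of pages and uses a deliberately simple relevance score: the number of times the keyword appears on a page. This score stands in for the far more complex ranking a real engine uses. The names `tokenize` and `naive_search` and the example URLs are hypothetical. The sketch scans every page on every query, which is exactly the cost an index database is built to avoid.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it on runs of non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_search(pages: dict[str, str], keyword: str) -> list[tuple[str, int]]:
    """Return (url, score) for every page containing keyword, best first.

    The score is only the keyword's occurrence count on the page, a
    hypothetical stand-in for a real relevance-ranking algorithm.
    """
    keyword = keyword.lower()
    results = []
    for url, text in pages.items():
        count = Counter(tokenize(text))[keyword]
        if count > 0:
            results.append((url, count))
    # Highest score first; ties are broken by URL so the order is deterministic.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


pages = {
    "a.example/1": "Search engines index words. Search is fast.",
    "b.example/2": "An index maps words to pages.",
    "c.example/3": "Cats and dogs.",
}
print(naive_search(pages, "search"))  # [('a.example/1', 2)]
print(naive_search(pages, "index"))   # [('a.example/1', 1), ('b.example/2', 1)]
```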
- A search engine works by crawling web pages from the Internet, building an index database, and then searching and ranking results within that index database. The whole process can be roughly divided into four parts: information collection, information analysis, information query, and the user interface. During information collection, a web robot (crawler) scans websites within a given IP address range and traverses the Web to collect page data. To keep the collected data up to date, the robot periodically revisits pages it has already crawled. Information analysis is handled by an indexer, a program that extracts index entries from the collected information and uses them to represent the documents.
- The inverted index is one of the data structures most commonly used by search engines. An inverted index finds records by the value of a non-primary attribute (also called a secondary key) rather than by the primary key. A file organized this way is called an inverted file, that is, a secondary index. It contains every secondary key value and, for each one, lists the primary keys of all records that have that value. Inverted indexes are mainly used for complex queries. Unlike a database answering traditional SQL queries, a search engine needs an efficient data structure, built during the preprocessing stage after data collection, to serve users' retrieval requests. The inverted file is widely regarded as the most effective structure currently available for this purpose. A simple inverted file can be defined as a structure that uses the keywords of documents as the index and the documents themselves as the index targets. This mirrors the index of an ordinary book, where the keywords are the index and the page numbers are the targets.
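The inverted file described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular engine's implementation. Integer document IDs play the role of primary keys, and keywords play the role of secondary keys. Each keyword maps to a sorted "posting list" of the IDs of documents that contain it, just as a book index maps a term to page numbers. The helper names `tokenize`, `build_inverted_index`, and `search_all` are hypothetical. The AND-query semantics (a document must contain every query keyword) are an assumed example of the kind of query such an index supports.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it on runs of non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each keyword (secondary key) to the sorted IDs (primary keys) of docs containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index: dict[str, list[int]], query: str) -> list[int]:
    """Return IDs of documents containing every query keyword (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # Start from the shortest posting list so the running intersection stays small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = set(lists[0])
    for ids in lists[1:]:
        result &= set(ids)
    return sorted(result)


docs = {
    1: "inverted index for search engines",
    2: "a book index lists keywords and pages",
    3: "search engines crawl web pages",
}
index = build_inverted_index(docs)
print(index["index"])                     # [1, 2]
print(search_all(index, "search pages"))  # [3]
```

A query touches only the posting lists of its keywords instead of scanning every document, which is why the index is built once during preprocessing. Production engines typically store posting lists compressed and merge them as sorted sequences rather than converting them to sets as this sketch does.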