Raw data in the original database goes through six preprocessing steps before it is submitted to the index database: text extraction, Chinese word segmentation and de-noising, duplicate page removal, page importance calculation, indexing, and link analysis. Text extraction also has to deal with text that is copied or duplicated across pages. The whole process works like an assembly line running from the original database to the index database, and it is irreversible. Data has to pass through indexing on its way from the original database to the index database. The purpose of the index is to make sense of the web page information collected in the original database and to give a convenient way to access it in the index database, much like an indexer gives access to the members of an array or collection that an object contains.
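To make the assembly line a bit more concrete, here is a minimal sketch of three of the steps (text extraction, duplicate page removal, and indexing). Everything in it is an assumption for illustration: the function names, the sample pages, the regex-based tag stripping, and the plain word split that stands in for real Chinese word segmentation. It is not how Baidu or any real engine does it.

```python
import hashlib
import re
from collections import defaultdict


def extract_text(html):
    """Strip tags and collapse whitespace (a crude stand-in for real text extraction)."""
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.split())


def tokenize(text):
    """Lowercase word split; a real pipeline would use a Chinese word segmenter here."""
    return re.findall(r"\w+", text.lower())


def build_index(raw_pages):
    """Take a dict of url -> html and return (inverted index, list of kept urls)."""
    index = defaultdict(set)   # token -> set of urls containing it
    seen_digests = set()       # fingerprints of page texts already indexed
    kept = []
    for url, html in raw_pages.items():
        text = extract_text(html)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_digests:
            continue           # exact duplicate of an earlier page, skip it
        seen_digests.add(digest)
        kept.append(url)
        for token in tokenize(text):
            index[token].add(url)
    return index, kept


# Hypothetical sample pages; b.html has the same text as a.html in different markup.
raw_pages = {
    "a.html": "<html><body><p>Search engines build an index</p></body></html>",
    "b.html": "<div>Search engines   build an index</div>",
    "c.html": "<p>Users search the index by keyword</p>",
}
index, kept = build_index(raw_pages)
```

Once the text has been reduced to tokens in the index, the original HTML cannot be rebuilt from it, which is one way to read the "irreversible" point above. Page importance calculation and link analysis are left out of this sketch.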
The indexer takes the keywords a user searches for on Baidu, quickly retrieves the matching documents from the index database, evaluates their relevance, sorts the results, and outputs them. It returns reasonable and accurate information in response to the user's query, which meets the user's need to search for web pages. This is how I understand the working principle of a search engine.
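The retrieve, evaluate, and sort steps described above could look roughly like this. It is only a sketch under my own assumptions: the documents are made up, and relevance is scored by raw term frequency, which is far simpler than what a real engine uses (page importance and link analysis are ignored here).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word split, standing in for real word segmentation."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Build token -> Counter(doc_id -> how often the token occurs in that doc)."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, Counter())[doc_id] += 1
    return index


def search(index, query):
    """Return doc ids sorted by summed term frequency of the query keywords."""
    scores = Counter()
    for token in tokenize(query):
        for doc_id, tf in index.get(token, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties are broken by doc id so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical documents already sitting in the index database.
docs = {
    "d1": "baidu search engine index",
    "d2": "search search ranking",
    "d3": "weather report",
}
index = build_index(docs)
results = search(index, "search ranking")
```

Here `results` lists `d2` before `d1`, because `d2` contains the query keywords more often, and `d3` is not returned at all since it matches nothing.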
CodePudding user response:
Has your page actually been included in the index?