The working principle of search engines

A search engine sends spiders out across the web. A spider is an automated program that crawls and fetches pages: it runs non-stop in the system background, and at each node it visits it discovers and fetches pages as quickly as possible, like a bee. Whatever it fetches is stored in the raw database.
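As a rough illustration of the crawling step, here is a minimal sketch of a breadth-first spider. To stay runnable offline it does not touch the network: `FAKE_WEB`, the `a.example` URLs, and the `crawl` function are all hypothetical names made up for this example, and a real spider would also need politeness rules, error handling, and so on.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
FAKE_WEB = {
    "a.example/": ("home page", ["a.example/news", "a.example/about"]),
    "a.example/news": ("news page", ["a.example/", "a.example/about"]),
    "a.example/about": ("about page", []),
}


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; returns a raw store of url -> text."""
    raw_db = {}            # stands in for the raw database
    queue = deque([seed])  # pages waiting to be fetched
    seen = {seed}          # URLs already queued, so each is fetched once
    while queue and len(raw_db) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:   # unknown or unreachable page
            continue
        text, links = page
        raw_db[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return raw_db


raw_db = crawl("a.example/", FAKE_WEB.get)
```

The `seen` set is what keeps the spider from fetching the same page again when pages link back to each other.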
The data in the raw database then goes through six preprocessing steps: text extraction, Chinese word segmentation and de-noising, duplicate page removal, page importance calculation, indexing, and link analysis. The result is then submitted to the index database. During text extraction, duplicated (copied) text also has to be handled. The whole process works like an assembly line from the raw database to the index database, and it is irreversible. Before data passes from the raw database to the index database it goes through the indexer, whose job is to make sense of the page information collected in the raw database and to organize it so that entries in the index database can be accessed conveniently, much as an index gives access to the members of an array or collection.
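A few of these preprocessing steps can be sketched in a handful of lines. The example below is only an assumption-laden toy: it covers text extraction, de-noising with a made-up stop-word list, duplicate removal by hashing, and building an inverted index, and it skips page importance and link analysis entirely. The names `extract_text`, `tokenize`, `build_index`, and `STOPWORDS` are hypothetical, and real Chinese text would need a proper word segmenter instead of the simple word split used here.

```python
import hashlib
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and"}  # illustrative noise words only


def extract_text(html):
    """Strip tags crudely; a real system would use an HTML parser."""
    return re.sub(r"<[^>]+>", " ", html)


def tokenize(text):
    """Lowercase word split with noise words dropped.

    Chinese text would need a word segmenter at this point.
    """
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]


def build_index(raw_db):
    """Build an inverted index (term -> set of URLs), skipping duplicates."""
    index = defaultdict(set)
    seen_hashes = set()
    for url, html in raw_db.items():
        tokens = tokenize(extract_text(html))
        # Pages whose cleaned token sequence is identical count as duplicates.
        digest = hashlib.sha1(" ".join(tokens).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        for tok in tokens:
            index[tok].add(url)
    return index


raw_db = {
    "p1": "<p>The principle of search engines</p>",
    "p2": "<div>the principle of search engines</div>",  # duplicate of p1
    "p3": "<p>Spiders crawl the web</p>",
}
index = build_index(raw_db)
```

Here `p2` is dropped because, once tags and noise words are removed, its text is the same as `p1`.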
When a user searches for keywords on Baidu, the indexer quickly retrieves the matching documents from the index database, evaluates their relevance, sorts them, and outputs the results. It returns reasonable and accurate information in response to the user's query, meeting the user's need to search for web pages. This is how I understand the working principle of search engines.
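The retrieve, score, and sort step could look roughly like the sketch below. It uses a simple TF-IDF style score purely as a stand-in, since the post does not say how relevance is actually evaluated, and the `DOCS` data and the `search` function are hypothetical names invented for the example.

```python
import math
from collections import Counter

# Hypothetical, already-preprocessed documents: id -> space-separated terms.
DOCS = {
    "d1": "search engine spider crawl page",
    "d2": "search engine index page page",
    "d3": "cooking recipe page",
}


def search(query, docs):
    """Return (doc_id, score) pairs, best match first."""
    n = len(docs)
    tf = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}
    scores = Counter()
    for term in query.lower().split():
        # Document frequency: how many documents contain the term.
        df = sum(1 for counts in tf.values() if term in counts)
        if df == 0:
            continue
        idf = math.log(n / df) + 1.0  # rarer terms weigh more
        for doc_id, counts in tf.items():
            if term in counts:
                scores[doc_id] += counts[term] * idf
    # Sort by descending score, then by id to keep ties deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


results = search("search index", DOCS)
```

With this data `d2` ranks above `d1`, because it matches both query terms and "index" is the rarer of the two.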

CodePudding user response:

Do you match included