A search engine operates in the following order: 1) crawling; 2) deep crawling (depth-first search, DFS); 3) fresh crawling (breadth-first search, BFS); 4) indexing; 5) searching.

Web search engines work by storing information about a large number of web pages, which they retrieve from the WWW itself. These pages are retrieved by a web crawler (also known as a spider), an automated web browser that follows every link it sees; exclusions can be made with robots.txt. The contents of each page are then analyzed to determine how it should be indexed. Data about web pages is stored in an index database for use in later queries. Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages.
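As a concrete illustration of the crawl-then-index pipeline described above, here is a minimal Python sketch. It crawls breadth-first (the "fresh crawl" order; swapping the queue's popleft() for pop() would give the depth-first "deep crawl"), checks robots.txt before fetching each page, and builds a toy inverted index. Everything here is an illustrative assumption, including the function names and the example.com seed URL; it is not any real engine's implementation.

```python
# Minimal sketch of a breadth-first crawler that honours robots.txt
# and builds a toy inverted index (word -> set of URLs).
import re
from collections import deque
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

def allowed(url, agent="*"):
    """Consult the site's robots.txt before fetching (crawler exclusions)."""
    root = "{0.scheme}://{0.netloc}".format(urlparse(url))
    rp = RobotFileParser(root + "/robots.txt")
    try:
        rp.read()
    except OSError:
        return True  # robots.txt unreachable: assume fetching is allowed
    return rp.can_fetch(agent, url)

def crawl(seed, max_pages=10):
    """BFS crawl: the FIFO queue gives breadth-first order."""
    queue, seen = deque([seed]), {seed}
    index = {}   # toy inverted index: word -> set of URLs
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()  # popleft() = BFS; pop() would make it DFS
        if not allowed(url):
            continue           # respect robots.txt exclusions
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue
        pages += 1
        # Indexing: map every word on the page back to its URL.
        for word in re.findall(r"[a-z0-9]+", html.lower()):
            index.setdefault(word, set()).add(url)
        # Spider behaviour: follow every link the page contains.
        for href in re.findall(r'href=["\'](.*?)["\']', html):
            link = urljoin(url, href)
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

if __name__ == "__main__":
    idx = crawl("https://example.com/")
    # Searching: a query is a lookup in the inverted index.
    print(sorted(idx.get("example", set())))
```

The only difference between the two crawl strategies is the frontier data structure: a FIFO queue explores pages level by level (good for refreshing pages near the seed), while a LIFO stack dives deep into one branch of links before backtracking. Real engines add politeness delays, URL canonicalization, and proper HTML parsing on top of this skeleton.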