Friday, November 7, 2008

Week 10 - Web Search Engines Part 1 and Part 2

I thought this two-part article gave a mostly easy-to-understand overview of how search engines work. A few highly technical concepts in Part 2 were not quite clear to me (for example, how the engines identify what a term is), but I was able to understand the basics of the hardware and database design required to build a search engine and make it perform well.
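To work through the "what is a term" question for myself, here is a rough Python sketch of the simplest version I can imagine. It assumes a term is just a lowercased run of letters or digits, which is surely cruder than what real engines do. The function names and sample documents are made up for illustration and are not from the article.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and return its runs of letters and digits.

    Assumption: a "term" is any such run. Real engines likely handle
    punctuation, phrases, and other languages far more carefully.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample documents, keyed by a made-up document id.
docs = {
    1: "Web search engines crawl the Web.",
    2: "An index maps terms to documents.",
}
index = build_index(docs)
```

Looking up `index["web"]` then gives the ids of the documents that mention that term, which is at least the general shape of an index that maps terms to documents.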

Clearly, a good redundant hardware configuration and sound index design are critical to efficient search engine performance. I had no idea that these servers numbered in the hundreds of thousands. After reading through the issues encountered in indexing (spamming and cloaking attempts, dead links, outdated information, secured information), I can understand why no one attempts to index all the content. It's absolutely incredible that these search engines work as well as they do, given the formidable challenges involved in organizing the information.
