The success of Google has been so profound that the word “Google” is now considered a verb (http://en.wikipedia.org/wiki/Google_(verb)). “To Google” has come to mean to search the web using the free search engine provided by Google, Inc. The adjective derived from the verb “Google” is “Googleable.” Its antonym is “non-Googleable,” which turns out to be an especially useful word. For most practical purposes, the term “non-Googleable” is synonymous with the phrase “deep web.” The major difference between the two, in a world where Google, Inc., is the company with the largest market capitalization, is that “non-Googleable” is intuitively understood.
It is generally acknowledged among students of the web that the bulk of the information on it is non-Googleable, a fact that typically comes as a surprise to people who do not study the web. In particular, the information residing in databases is often non-Googleable, and scientific and technical information often resides in databases of documents.
The reason that databases are typically non-Googleable becomes clear once one considers how search engines like Google, Yahoo!, Ask.com and Bing acquire the content they search. The search engines rely upon crawlers to visit web pages well in advance of a patron’s search. The crawler creates an index of each page it visits and then follows the hyperlinks on that page to find new pages to index. Crawlers are usually flummoxed by the front page of a database because such pages typically offer a search form rather than hyperlinks to the database content. Thus, crawlers used by companies like Google, Inc., generally cannot get past the front page of a database, leaving the database content non-Googleable.
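To make the crawler's blind spot concrete, here is a minimal sketch in Python. It is not how any real search engine is built. The tiny in-memory "web" (SITE), its page addresses, and the names LinkExtractor and crawl are all hypothetical, invented only for illustration. The sketch assumes a crawler that discovers pages solely by following hyperlinks and never submits a search form.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": address -> HTML. No network access is involved.
SITE = {
    "/": '<a href="/about">About</a> <a href="/db">Search our reports</a>',
    "/about": '<p>About us.</p> <a href="/">Home</a>',
    # Database front page: a search form, but no hyperlinks to the records.
    "/db": '<form action="/db/search"><input name="q"><input type="submit"></form>',
    # The records exist, but no page in this toy site links to them.
    "/db/record/1": "<p>Technical report 1</p>",
    "/db/record/2": "<p>Technical report 2</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(site, start="/"):
    """Breadth-first crawl that discovers pages only by following hyperlinks."""
    indexed = set()
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in indexed or url not in site:
            continue
        indexed.add(url)              # "index" the page
        parser = LinkExtractor()
        parser.feed(site[url])        # find the hyperlinks on it
        queue.extend(parser.links)    # and schedule them for a visit
    return indexed


indexed = crawl(SITE)
unreached = sorted(set(SITE) - indexed)
print("indexed:", sorted(indexed))
print("never reached:", unreached)
```

In this toy example the crawler indexes the home page, the about page, and the database front page, but it never reaches the two records, because the only route to them runs through the search form. Those two records are the non-Googleable content.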
There is one exception, which gets complicated, so if you are not a knowledge-management enthusiast, please just skip ahead to the next paragraph. The situation becomes complex when...