The success of Google has been so profound that the word “Google” is now considered a verb (http://en.wikipedia.org/wiki/Google_(verb)). “To Google” has come to mean to search the web via the free search engine provided by Google, Inc. The adjective derived from the verb “Google” is “Googleable.” Similarly, the antonym of “Googleable” is “non-Googleable,” which turns out to be an especially useful word. For most practical purposes, the term “non-Googleable” is synonymous with the phrase “deep web.” The major difference between the word and the phrase, in a world where Google, Inc., is the largest capitalized company, is that the term “non-Googleable” is intuitively understood.
Anyway, it is generally acknowledged among students of the web that the bulk of the information in it is non-Googleable, a fact that typically comes as a surprise to people who do not study the web. In particular, the information residing in databases is often non-Googleable, and it often happens that scientific and technical information resides in databases of documents.
The reason that databases are typically non-Googleable becomes clear once one considers how search engines like Google, Yahoo!, Ask.com and Bing acquire the content they search. The search engines rely upon crawlers to visit web pages well in advance of a patron’s search. The crawler creates an index of each page it visits and then follows the hyperlinks on each page to find new pages to index. Crawlers are usually flummoxed by the front page of a database because such pages generally do not offer hyperlinks to the database content. Thus, crawlers used by companies like Google, Inc., typically cannot get past the front page of a database, leaving the database content non-Googleable.
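The crawling behavior described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the toy site, its page paths, and the `crawl` function are hypothetical and do not represent how any real search engine is implemented. The point is only that pages with no inbound hyperlinks, such as records behind a database search form, never enter the index.

```python
from collections import deque

# Hypothetical toy site: each page maps to the hyperlinks it exposes.
SITE = {
    "/home": ["/about", "/db/search"],
    "/about": ["/home"],
    "/db/search": [],      # database front page: a query form, no links to records
    "/db/record/1": [],    # database content, reachable only by submitting a query
    "/db/record/2": [],
}

def crawl(start, links):
    """Breadth-first crawl: index a page, then follow its hyperlinks."""
    indexed = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        indexed.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed

indexed = crawl("/home", SITE)
not_indexed = sorted(set(SITE) - set(indexed))
print("indexed:", indexed)
print("never reached:", not_indexed)
```

In this sketch the crawler indexes the database front page but stops there, so both records stay out of the index.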
There is one exception, which gets complicated, so if you are not a knowledge-management enthusiast, please just skip ahead to the next paragraph. The situation becomes complex when...
On June 11, the Multilingual WorldWideScience.org BETA was officially launched in Helsinki, Finland, at the International Council for Scientific and Technical Information (ICSTI) annual conference. This new capability is the result of an international public-private partnership between the WorldWideScience.org Alliance and Microsoft Research, whose translation technology has been paired with the federated searching technology of Deep Web Technologies.
WorldWideScience.org now provides the first-ever real-time searching and translation across globally dispersed, multilingual scientific literature. Multilingual WorldWideScience.org BETA allows users to conduct a single query of over 70 scientific databases from around the world. Results can then be translated into the user’s preferred language. Currently, nine languages are available (Chinese, English, French, German, Japanese, Korean, Portuguese, Spanish, and Russian), and more languages will be added in the coming months. With the pace of non-English scientific publishing continuing to grow, it is vitally important that English-speaking scientists gain access to non-English content. Conversely, Multilingual WorldWideScience.org BETA also benefits non-English-speaking users by enabling translations of English-language content.
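As a rough illustration of what a single query across many databases involves, here is a minimal sketch of the federated-search pattern: one query is sent to every source at the same time and the hits are merged into one list. The source names, the titles, and the functions are hypothetical stand-ins; this is not the WorldWideScience.org or Deep Web Technologies implementation, and the translation step is left out.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote databases; a real source would be queried over the network.
SOURCES = {
    "source_a": ["Solar cell efficiency", "Wind turbine design"],
    "source_b": ["Solar flare observations", "Deep sea biology"],
    "source_c": ["Battery chemistry"],
}

def search_source(name, query):
    """Return (source, title) pairs whose title contains the query, ignoring case."""
    return [(name, title) for title in SOURCES[name] if query.lower() in title.lower()]

def federated_search(query):
    """Send the same query to every source concurrently and merge the hits."""
    names = sorted(SOURCES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # pool.map yields results in the order of `names`, so the merge is deterministic.
        per_source = pool.map(lambda name: search_source(name, query), names)
        return [hit for hits in per_source for hit in hits]

results = federated_search("solar")
for source, title in results:
    print(f"{source}: {title}")
```

Because the query is issued at search time rather than answered from a prebuilt crawl index, this pattern can reach database content that link-following crawlers miss.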
Since its inception in 2007, WorldWideScience.org has grown from searching 12 databases in 10 countries to searching over 70 databases in 66 countries, covering more than 400 million pages of science. OSTI serves as the Operating Agent for WorldWideScience.org, and as the product manager, I have been enormously honored to lead this project over the past three years. From the beginning, the goal behind WorldWideScience.org has been to broaden access to the world’s scientific information and to facilitate the scientific discovery process. With each new database that has been added to WorldWideScience.org’s searches...