The Science Knowledge Imperative: Making
Non-Googleable Science Findable
Dr. Walter L. Warnick
Director
United States Department of Energy
Office of Scientific & Technical Information
Computers in Libraries Conference
Arlington, Virginia
March 30, 2009
walter.warnick@science.doe.gov
Computers in Libraries 2009 Theme:
"Creating Tomorrow: Spreading Ideas & Learning"
• Science progresses as knowledge is shared
• OSTI Corollary: Accelerating the sharing of knowledge accelerates the advancement of science
But before we can accelerate the sharing of knowledge…
…we must dispel the misperception that popular search engines are already doing the job
Much of Science Is Non-Googleable
The Deep Web Is Huge
In fact, the vast majority of science information is in databases within the deep web (the non-Googleable web), where popular search engines cannot go.
We in the information business need to recognize this gap between availability and need, and seize the opportunity to…
…provide science information consumers with better tools.
The Web Is Transformational Technology for Sharing Knowledge
The web is still young and will certainly hold surprises as it evolves.
Just as another well-known transformational technology held surprises…
Eclipsing Current Search Technology
Google is capitalizing on this early era of web technology and is hugely successful, powering more than half the world's searching.
But we must remember that we are just at the beginning of this transformation. Further technological transformations may very well eclipse today’s search technology!
A new, promising technology now emerging: federated search.
Federated search drills down to the deep web where scientific databases reside.
Surface Web
Deep Web Databases
Unlike the Google sitemap protocol solution, federated search places no burden on the database owners.
We need systems, such as federated search, that probe the deep web.
OSTI has recognized the need to bridge this gap; our emerging solution is "federated" search.
• Interfaces similar to Google
• Under the hood, NOT like Google
• Science Accelerator: integrates key DOE databases
• Science.gov: 200 million pages of science information issued by 14 U.S. science agencies
• WorldWideScience.org: the gateway to science information issued by approximately 60 nations
• 375 million pages of global science information
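The "under the hood" difference can be illustrated with a minimal sketch. It assumes a federated search works by sending the user's query to each database's own search at query time and merging the answers, rather than by consulting a pre-built crawled index. The source names, titles, and the functions `search_source` and `federated_search` are hypothetical stand-ins for illustration, not OSTI's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for deep web databases. A real connector would
# submit the query to each database's own search interface at query time.
SOURCES = {
    "energy_reports": ["Solar cell efficiency report", "Wind turbine fatigue study"],
    "agency_papers": ["Solar flare forecasting", "Deep ocean sampling methods"],
    "global_theses": ["Thin-film solar materials", "Rice genome annotation"],
}


def search_source(name, query):
    """Simulate one database answering a live query with its own search."""
    hits = [title for title in SOURCES[name] if query.lower() in title.lower()]
    return [(name, title) for title in hits]


def federated_search(query):
    """Send the query to every source in parallel and merge the answers."""
    names = sorted(SOURCES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # pool.map returns results in the same order as the input names.
        per_source = pool.map(lambda n: search_source(n, query), names)
        merged = [hit for hits in per_source for hit in hits]
    return merged


results = federated_search("solar")
for source, title in results:
    print(f"{source}: {title}")
```

In this sketch the databases do nothing beyond answering an ordinary search request, which is the sense in which federated search places no burden on database owners.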
OSTI Ensures Access to Non-Googleable Science
Volume of Content Made Searchable by OSTI
WorldWideScience.org: 375,000,000 pages of global scientific and technical information (STI). These web-available pages would fill 62,000 traditional 2-foot-deep file drawers.
Science.gov: 200,000,000 pages of U.S. Government STI. These web-available pages would fill 33,000 traditional 2-foot-deep file drawers.
STIP Collection: 10,000,000 pages of U.S. Department of Energy STI. These web-available pages would fill 1,600 traditional 2-foot-deep file drawers.
Through OSTI products, librarians, researchers and the public can access a science page count comparable to, but not duplicative of, Google's entire science content.
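As a quick consistency check on the file-drawer comparisons, the short calculation below divides each stated page count by its stated drawer count. The figures come from the slide and appear to be rounded; the implied pages-per-drawer ratio is derived here and is not stated in the original.

```python
# Page counts and drawer counts as stated on the slide.
collections = {
    "WorldWideScience.org": (375_000_000, 62_000),
    "Science.gov": (200_000_000, 33_000),
    "STIP Collection": (10_000_000, 1_600),
}

# Implied capacity of one 2-foot-deep drawer for each collection.
pages_per_drawer = {
    name: pages / drawers for name, (pages, drawers) in collections.items()
}

for name, ratio in pages_per_drawer.items():
    print(f"{name}: about {ratio:,.0f} pages per drawer")
```

All three comparisons work out to a little over 6,000 pages per drawer, so the drawer counts are mutually consistent.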
Is a Better, High-Quality Science Search Tool Just Over the Horizon?
We Think So…
Live Federated Search Tools
+
Crawled Indexes
For Example: Combining WorldWideScience.org and crawled indexes
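One way such a combination might work is sketched below, assuming each side returns a ranked list of hits and that a document found by both should appear only once. The function `combine`, the record fields, and the example.org URLs are hypothetical and chosen only for illustration; the original does not describe the merging method.

```python
def combine(live_hits, crawled_hits):
    """Interleave two ranked result lists, keeping the first copy of each URL."""
    combined, seen = [], set()
    for i in range(max(len(live_hits), len(crawled_hits))):
        for hits in (live_hits, crawled_hits):
            if i < len(hits) and hits[i]["url"] not in seen:
                seen.add(hits[i]["url"])
                combined.append(hits[i])
    return combined


# Hypothetical results from a live federated query of deep web databases.
live_hits = [
    {"url": "https://example.org/db/report-1", "origin": "live"},
    {"url": "https://example.org/db/report-2", "origin": "live"},
]
# Hypothetical results from a crawled index of surface web pages.
crawled_hits = [
    {"url": "https://example.org/page-a", "origin": "crawled"},
    {"url": "https://example.org/db/report-1", "origin": "crawled"},
    {"url": "https://example.org/page-b", "origin": "crawled"},
]

merged = combine(live_hits, crawled_hits)
for hit in merged:
    print(hit["origin"], hit["url"])
```

Simple interleaving is used here only to keep the sketch short; a production system would more likely rank the merged hits by relevance.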
The Stage Is Set for the Future
• A billion-page, high-quality science search tool may soon be available to spread ideas, increase learning, and further accelerate the progress of science.
• We are ready to scale up our efforts in federated search.
• Simply put, we intend to make more science accessible to more people than anyone has done before.


