U.S. Department of Energy Office of Science Office of Scientific and Technical Information

The Science Knowledge Imperative: Making Non-Googleable Science Findable


Dr. Walter L. Warnick
United States Department of Energy
Office of Scientific & Technical Information
Computers in Libraries Conference
Arlington, Virginia
March 30, 2009



Computers in Libraries 2009 Theme:
"Creating Tomorrow: Spreading Ideas & Learning"

• Science progresses as knowledge is shared

• OSTI Corollary: Accelerating the sharing of knowledge accelerates the advancement of science


But before we can accelerate the sharing of knowledge…

…we must dispel the misperception that popular search engines are already doing the job


Much of Science Is Non-Googleable

The Deep Web Is Huge

In fact, the vast majority of science information is in databases within the deep web - or the non-Googleable web - where popular search engines cannot go.

We in the information business need to recognize this gap between availability and need, and seize the opportunity to provide science information consumers with better tools.


The Web Is Transformational Technology for Sharing Knowledge

The web is still young and will certainly hold surprises as it evolves.

Just as another well-known transformational technology held surprises…


Eclipsing Current Search Technology

Google is capitalizing on this early era of web technology and is hugely successful, powering more than half the world's searching.

But we must remember that we are only at the beginning of this transformation. Further technological advances may very well eclipse today’s search technology!

A promising new technology is now emerging: federated search.


Federated search drills down to the deep web where scientific databases reside.

[Diagram labels: Surface Web; Deep Web Databases]

Unlike the Google sitemap protocol solution, federated search places no burden on the database owners.

We need systems, such as federated search, that probe the deep web.
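The idea of probing deep web databases at query time can be illustrated with a minimal sketch. It is not OSTI's implementation: the two in-memory "databases", their titles, and the names `make_search`, `SOURCES`, and `federated_search` are all hypothetical, chosen only to show one query being sent to several sources at once and the hits being merged, with nothing crawled or indexed in advance.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for deep web databases. Each one answers queries
# only through its own search function, much as a real database would
# through its search form; nothing is crawled or indexed ahead of time.
DATABASE_A = ["Fusion energy materials report", "Solar cell efficiency study"]
DATABASE_B = ["Fusion reactor design notes", "Wind turbine field data"]


def make_search(records):
    """Build a search function that matches a query against record titles."""
    def search(query):
        q = query.lower()
        return [title for title in records if q in title.lower()]
    return search


# Assumed registry of sources: name -> that source's own search function.
SOURCES = {
    "database_a": make_search(DATABASE_A),
    "database_b": make_search(DATABASE_B),
}


def federated_search(query, sources=SOURCES):
    """Send one query to every source in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Fan the same query out to each source's search function.
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        merged = []
        for name, future in futures.items():
            for title in future.result():
                merged.append((name, title))
    # Merge into one list; sorting by title stands in for real ranking.
    return sorted(merged, key=lambda hit: hit[1])


results = federated_search("fusion")
for source, title in results:
    print(f"{source}: {title}")
```

In this sketch the database owners do nothing beyond answering searches they already support, which is the "no burden" property described above. A production system would also need timeouts, relevance ranking, and de-duplication across sources.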


OSTI has recognized the need to bridge this gap; our emerging solution is "federated" search.

• Interfaces similar to Google

• Under the hood, NOT like Google

• Science Accelerator: integrates key DOE databases

• Science.gov: 200 million pages of science information issued by 14 U.S. science agencies

• WorldWideScience.org: the gateway to science information issued by approximately 60 nations

• 375 million pages of global science information


OSTI Ensures Access to Non-Googleable Science

Volume of Content Made Searchable by OSTI

WorldWideScience.org: 375,000,000 pages of global scientific and technical information (STI). These web-available pages would fill 62,000 traditional 2-foot-deep file drawers.

Science.gov: 200,000,000 pages of U.S. Government STI. These web-available pages would fill 33,000 traditional 2-foot-deep file drawers.

STIP Collection: 10,000,000 pages of U.S. Department of Energy STI. These web-available pages would fill 1,600 traditional 2-foot-deep file drawers.
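The slides do not state how many pages a file drawer is assumed to hold. As a rough consistency check, the short sketch below divides each page count by its drawer count; the page and drawer figures are taken from the lines above, while the helper name `pages_per_drawer` is only illustrative.

```python
# (pages, drawers) pairs as quoted in the slides.
FIGURES = {
    "WorldWideScience.org": (375_000_000, 62_000),
    "Science.gov": (200_000_000, 33_000),
    "STIP Collection": (10_000_000, 1_600),
}


def pages_per_drawer(pages, drawers):
    """Implied number of pages per drawer, rounded to the nearest page."""
    return round(pages / drawers)


implied = {
    name: pages_per_drawer(pages, drawers)
    for name, (pages, drawers) in FIGURES.items()
}

for name, density in implied.items():
    print(f"{name}: about {density:,} pages per drawer")
```

All three figures imply between roughly 6,000 and 6,250 pages per drawer, so the drawer counts appear to rest on one common density, with the counts themselves rounded.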

Through OSTI products, librarians, researchers, and the public can access a volume of science pages comparable to, but not duplicative of, Google's entire science content.


Is There a Better Solution for High-Quality Science Search Just Over the Horizon?

We Think So…

Live Federated Search Tools


Crawled Indexes

For Example: Combining WorldWideScience.org and crawled indexes


The Stage Is Set for the Future

• A billion-page, high-quality science search tool may soon be available to spread ideas, increase learning, and further accelerate the progress of science.

• We are ready to scale up our efforts in federated search.

• Simply put, we intend to make more science accessible to more people than anyone has done before.