Much of Science is Non-Googleable: An Emerging Solution
Walter L. Warnick, Ph.D., Director
Office of Scientific and Technical Information, U.S. Department of Energy
The Stage Is Set for the Future
Slide 2: OSTI Mission
To advance science and sustain technological creativity by making R&D findings available to Department of Energy researchers and the American public.
Information fuels discovery.
Superior access to quality information speeds discovery.
Slide 3: Advancing Science Discovery: From the '40s to the Future
From 1947 to 2007 – from Nuclear Science Abstracts to WorldWideScience.org – mission accomplished!
Whether by print or by pixel, OSTI has long been committed to ensuring appropriate access to research results.
OSTI’s creation 60 years ago signified a sea change from the Secret City of the Manhattan Project toward an openness to share S&T knowledge with the public.
Of course, we continue to this day to keep secret all the information that has military applications. Whether peaceful or national security related, the S&T legacy of this agency is captured in this building.
Slide 5: Science Progresses as Knowledge Is Shared
OSTI corollary: If the sharing of knowledge – or knowledge diffusion – is accelerated, scientific progress is accelerated.
Science can be advanced by hiring more researchers and giving them better equipment; and science can be advanced by accelerating the sharing of knowledge.
We consciously seek to exploit new technology to accelerate the spread of scientific and technical knowledge.
Slide 7: Larry Page, speaking to scientists, AAAS 2007
"Virtually all economic growth (in the world) was due to technological progress. I think as a society we're not really paying attention to that."
He called on the scientists to make more of their research available digitally. “We have to unlock the wealth of scientific knowledge and get it to everyone.”
Slide 8: The stage is set for the future
We are ready to scale up our efforts in metasearch, or federated search. Simply put, we intend to make science searchable via one portal.
Slide 9: We must ensure access to science information that is Non-Googleable
Google: v., to search for information through Google.
Googleable: adj., information found by Googling.
Non-Googleable: adj., information that cannot be found by Googling.
Slide 10: True or False?
Most useful information is available via familiar search engines such as Google and Yahoo!
The vast majority of science information in databases is not crawled by popular search engines.
Slide 11: Scientific databases stump Google
Systems that crawl the Web do not typically reach below the surface.
Google “crawls” the surface Web, but scientific databases are largely found in the deep Web.
Google works to solve the problem, but there’s a better way ...
Google moves ahead with plan to open up federal Web sites
Google is making strides on an initiative to make information stored on public government Web sites more accessible to people looking for it, but challenges remain, officials with the search engine company said Wednesday.
Three federal organizations recently agreed to structure their sites to make them accessible for nearly all Internet searches, the officials said. Information on the Plain Language Web site aimed at eliminating jargon in government communications, and on sites by the Energy Department's Office of Scientific and Technical Information and the Education Department's National Center for Education Statistics, has been opened up to the three most popular search engines: Google, Yahoo and MSN.
Federated search drills down to the deep Web where scientific databases reside.
We need systems that probe the deep Web.
Unlike the Google solution, federated search places no burden on the database owners.
Slide 14: Federated search yields one-stop portals
Science.gov: 50 million pages.
ScienceAccelerator: Key DOE databases.
WorldWideScience.org: 200 million pages.
19 sources, 17 countries, all inhabited continents.
Slide 15: Harvesting
Harvesting and federated search are useful when full bibliographical control is not feasible.
Analogous to Google – crawls and mines data that does not reside in databases.
Different from Google – directed, selective crawling.
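To make "directed, selective crawling" concrete, here is a minimal sketch, not a description of OSTI's actual harvester. It assumes a hypothetical allow-list of hosts and uses an in-memory `links` dictionary as a stand-in for fetching and parsing real pages; the function name and the example URLs are illustrative only.

```python
from collections import deque
from urllib.parse import urlparse


def directed_crawl(seeds, links, allowed_hosts, max_pages=100):
    """Breadth-first crawl that only visits pages on allow-listed hosts.

    `links` maps a URL to the URLs found on that page. It is an assumption
    standing in for real page fetching and link extraction.
    """
    queue = deque(seeds)
    seen = set()
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        # Selective step: skip anything outside the chosen hosts.
        if urlparse(url).hostname not in allowed_hosts:
            continue
        visited.append(url)
        queue.extend(links.get(url, []))
    return visited


# Hypothetical link graph: one research site that also links to an ad server.
links = {
    "https://lab.example.org/": [
        "https://lab.example.org/reports",
        "https://ads.example.com/",
    ],
    "https://lab.example.org/reports": ["https://lab.example.org/"],
}
pages = directed_crawl(["https://lab.example.org/"], links, {"lab.example.org"})
```

A general-purpose crawler would follow every link it finds; here the allow-list keeps the crawl inside the sources that were chosen in advance.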
Slide 16: Federated Search: Advantages
- Current, real-time results
- No burden for database owner
- Inexpensive to implement
- No need for the user to know which databases exist
- No searching door-to-door, one database at a time
- Allows for fielded searching
- Interoperability is automatically achieved
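The advantages listed above can be illustrated with a minimal sketch of the federated search pattern: one fielded query is sent to every source at request time and the answers are merged into a single list. This is an assumption-laden illustration, not the implementation behind Science.gov or WorldWideScience.org. The `Record` type, the `make_source` stand-in for a remote database, the source names, and the sample records are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    source: str
    title: str
    author: str
    score: float  # relevance as reported by the source (assumed comparable)


# A source is any callable that answers a fielded query when asked.
Source = Callable[[Dict[str, str]], List[Record]]


def make_source(name: str, rows: List[Tuple[str, str, float]]) -> Source:
    """Hypothetical stand-in for a remote database with its own search form."""
    records = [Record(name, title, author, score) for title, author, score in rows]

    def search(query: Dict[str, str]) -> List[Record]:
        # Fielded search: every requested field must match as a substring.
        return [
            r for r in records
            if all(value.lower() in str(getattr(r, field)).lower()
                   for field, value in query.items())
        ]

    return search


def federated_search(query: Dict[str, str], sources: Dict[str, Source]) -> List[Record]:
    """Send one query to every source in parallel and merge the live answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(search, query) for search in sources.values()]
        merged = [record for future in futures for record in future.result()]
    # Merge step: a simple sort on the score each source reported.
    return sorted(merged, key=lambda r: r.score, reverse=True)


sources = {
    "db_a": make_source("db_a", [("Fusion plasma diagnostics", "Lee", 0.9),
                                 ("Solar cell efficiency", "Kim", 0.7)]),
    "db_b": make_source("db_b", [("Plasma confinement in tokamaks", "Lee", 0.8)]),
}
hits = federated_search({"title": "plasma"}, sources)
```

The user issues a single query and never needs to know that two databases exist. Nothing is indexed ahead of time, so results are as current as the sources themselves, and the database owners only have to keep answering their existing search interfaces.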
Slide 17: Additional Points
- Federated search has limitations
- Neither crawling nor federated search is a panacea
- Federated searching does things crawling cannot do, and vice versa. They are complementary technologies.
- Federated searching has advanced rapidly and should continue to advance