Walter L. Warnick, Ph.D., Director
Office of Scientific and Technical Information, U.S. Department of Energy
The Stage Is Set for the Future
To advance science and sustain technological creativity by making R&D findings available to Department of Energy researchers and the American public.
Information fuels discovery.
Superior access to quality information speeds discovery.
Advancing Science Discovery: From the '40s to the Future
From 1947 to 2007, from Nuclear Science Abstracts to WorldWideScience.org: mission accomplished!
Whether by print or by pixel, OSTI has long been committed to ensuring appropriate access to research results.
OSTI's creation 60 years ago signified a sea change from the Secret City of the Manhattan Project toward an openness to share S&T knowledge with the public.
Of course, we continue to this day to keep secret all the information that has military applications. Whether peaceful or national security related, the S&T legacy of this agency is captured in this building.
Science Progresses as Knowledge Is Shared
OSTI corollary: If the sharing of knowledge, or knowledge diffusion, is accelerated, scientific progress is accelerated.
Science can be advanced by hiring more researchers and giving them better equipment; and science can be advanced by accelerating the sharing of knowledge.
We consciously seek to exploit new technology to accelerate the spread of scientific and technical knowledge.
Larry Page, speaking to scientists, AAAS 2007
"Virtually all economic growth (in the world) was due to technological progress. I think as a society we're not really paying attention to that."
He called on the scientists to make more of their research available digitally. "We have to unlock the wealth of scientific knowledge and get it to everyone."
The stage is set for the future
We are ready to scale up our efforts in metasearch, or federated search. Simply put, we intend to make science searchable via one portal.
We must ensure access to science information that is Non-Googleable
Google: v., to search for information through Google
Googleable: adj., information found by Googling
Non-Googleable: adj., information that cannot be found by Googling
True or False?
Most useful information is available via familiar search engines such as Google and Yahoo!
The vast majority of science information in databases is not crawled by popular search engines.
Scientific databases stump Google
Systems that crawl the Web do not typically reach below the surface.
Google "crawls" the surface Web, but scientific databases are largely found in the deep Web.
Google works to solve the problem, but there's a better way ...
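The surface/deep distinction above can be illustrated with a toy model. This is only a sketch under stated assumptions: the page names, the link graph, and the two database records are all hypothetical, and no real search engine is being modeled. It shows a link-following crawler reaching every linked page while never seeing records that exist only as answers to a query.

```python
from collections import deque

# Hypothetical toy "surface Web": static pages joined by hyperlinks.
LINKS = {
    "home": ["about", "search_form"],
    "about": ["home"],
    "search_form": [],  # the form page links to nothing; records appear only in reply to a query
}

# Hypothetical "deep Web" database that sits behind the search form.
DATABASE = {
    "report-001": "Reactor safety analysis",
    "report-002": "Isotope separation methods",
}

def crawl(start):
    """Follow hyperlinks breadth-first and return every page reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in LINKS.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def query_database(term):
    """Return the IDs of records whose title contains the search term."""
    t = term.lower()
    return {rec_id for rec_id, title in DATABASE.items() if t in title.lower()}

crawled = crawl("home")
found_by_query = query_database("reactor")

print("Reached by crawling:", sorted(crawled))
print("Reached by querying:", sorted(found_by_query))
```

In this toy model the crawler reaches the search form itself but none of the records behind it; only a query returns them.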
Google moves ahead with plan to open up federal Web sites.
Google is making strides on an initiative to make information stored on public government Web sites more accessible to people looking for it, but challenges remain, officials with the search engine company said Wednesday. Three federal organizations recently agreed to structure their sites to make them accessible for nearly all Internet searches, the officials said. Information on the Plain Language Web site aimed at eliminating jargon in government communications, and on sites by the Energy Department's Office of Scientific and Technical Information and the Education Department's National Center for Education Statistics, has been opened up to the three most popular search engines: Google, Yahoo and MSN.
Federated search drills down to the deep Web where scientific databases reside.
We need systems that probe the deep Web.
Unlike the Google solution, federated search places no burden on the database owners.
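The idea behind federated search can be sketched in a few lines. This is a minimal illustration, not the software behind any OSTI portal: the source names, the sample titles, and the functions `search_source` and `federated_search` are all hypothetical, and in-memory lists stand in for remote databases. The sketch sends one query to every source at the same time and merges the answers, so the user searches once and the database owners change nothing on their side.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote scientific databases.
# A real portal would query each database over the network through
# that database's own existing search interface.
SOURCES = {
    "energy_reports": ["Fusion reactor materials study", "Solar cell efficiency report"],
    "physics_preprints": ["Fusion plasma confinement", "Neutrino oscillation data"],
    "biology_abstracts": ["Protein folding survey"],
}

def search_source(name, query):
    """Run the query against one source and tag each hit with its origin."""
    q = query.lower()
    return [(name, title) for title in SOURCES[name] if q in title.lower()]

def federated_search(query):
    """Send the same query to every source in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(search_source, name, query) for name in SOURCES]
        merged = []
        for future in futures:
            merged.extend(future.result())
    # Sort so the merged list has a stable, predictable order.
    return sorted(merged)

results = federated_search("fusion")
for source, title in results:
    print(f"{source}: {title}")
```

Because each source is queried live, the merged list reflects whatever the sources hold at the moment of the search. Ranking and de-duplication across sources are left out of this sketch.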
Federated search yields one-stop portals
Science.gov: 50 million pages
ScienceAccelerator: Key DOE databases
WorldWideScience.org: 200 million pages
19 sources, 17 countries, all inhabited continents
Harvesting and federated search are useful when full bibliographical control is not feasible.
Analogous to Google: crawls and mines data that does not reside in databases.
Different from Google: directed, selective crawling.
Federated Search: Advantages
• Current, real-time results
• No burden for database owner
• Inexpensive to implement
• No need-to-know for user
• No searching door-to-door
• Allows for fielded searching
• Interoperability is automatically achieved
Federated search has limitations
• Neither crawling nor federated search is a panacea
• Federated searching does things crawling cannot do, and vice versa. They are complementary technologies.
• Federated searching has advanced rapidly and should continue to advance
Science as a noble enterprise