Slide 1: OSTI—Advancing Scientific
April 8, 2008
Dr. Warnick: It is an honor to be the Keynote speaker.
Everyone here is familiar with the Defense Technical Information Center (DTIC). OSTI is the DTIC of the Department of Energy.
Slide 2: OSTI Mission
To advance science and sustain technological creativity by making R&D findings available and useful to DOE researchers and the American people
The OSTI mission reflects the mission and audience of our parent agency, the Department of Energy.
Slide 3: Science Progresses as Knowledge Is Shared
OSTI Corollary: If the sharing of knowledge is accelerated, discovery is accelerated
Profound implications for all of us in the information business!
Every scientist will agree that science progresses only if knowledge is shared.
Building on this thought, we have coined the OSTI Corollary, which holds that if the sharing of knowledge is accelerated, discovery is accelerated.
Notice that we are accelerating discovery. Discovery powers the growth of our prosperity, discovery improves people’s lives, and discovery strengthens our national defense. All these outcomes will be realized sooner if we do our jobs faster and better.
Thus, the OSTI Corollary has profound implications for all of us in the information business. The prospect of accelerating discovery animates folks at OSTI. OSTI is not just an organization with a mission. It is an organization on a mission.
Slide 4: A Key Piece of Science Discovery
Information feeds discovery
If you were to ask a senior official in an R&D agency how one might accelerate science, they would say, "Hire more researchers, build new and better facilities, buy more powerful computers." I am here to tell you that there is another way to accelerate science. And that is to accelerate the sharing of scientific and technical knowledge.
Slide 5: The Spread of Knowledge Can Be Measured
The Spread of Knowledge about Feynman Diagrams
Discovery path of US and UK authors
From: The Power of a Good Idea: quantitative modeling of the spread of ideas from epidemiological models, Luis M. A. Bettencourt, Ariel Cintron-Arias, Carlos Castillo-Chavez, David Kaiser, May 2005
To help us understand more about accelerating knowledge, we do research, albeit on a tiny scale. We first note that the spread of knowledge can be measured. Each triangle in this graph is an author in the US or the UK who wrote about Feynman diagrams after they were first described by Nobel Laureate Richard Feynman in 1949.
Slide 6: The Spread of Knowledge Can Be Modeled
Path of Best Trajectory
From Report for the Office of Scientific and Technical Information: Population Modeling of the Emergence and Development of Scientific Fields; Luis M. A. Bettencourt, Carlos Castillo-Chavez, David Kaiser, David E. Wojick, October, 2006
We then set out to model the spread of knowledge. We found that existing diffusion models (originally developed to model the spread of disease) fit the data. A key parameter of the model is "the contact rate."
Slide 7: The Spread of Knowledge Can Be Accelerated
Paths of Acceleration
Bettencourt, Castillo-Chavez, Kaiser, Wojick
By increasing the contact rate, the spread of knowledge is accelerated. For example, the solid red line envisions a doubling of the contact rate.
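The effect of the contact rate can be illustrated with a small numerical sketch. The code below is not the model from the cited report. It assumes the simplest possible contagion model, in which every researcher is either unaware of an idea or has adopted it, and adoption spreads in proportion to the contact rate. The function names, population size, and rates are all hypothetical values chosen for illustration.

```python
def simulate_adoption(contact_rate, population=1000, initial_adopters=1,
                      steps=400, dt=0.1):
    """Euler-integrate a simple contagion model: dI/dt = beta * S * I / N.

    This is an illustrative assumption, not the model fitted in the report.
    """
    adopters = float(initial_adopters)
    history = [adopters]
    for _ in range(steps):
        susceptible = population - adopters
        # New adopters arise from contacts between adopters and non-adopters.
        adopters += dt * contact_rate * susceptible * adopters / population
        history.append(adopters)
    return history


def time_to_fraction(history, population, fraction, dt=0.1):
    """Return the first simulated time at which adopters reach the fraction."""
    target = fraction * population
    for step, value in enumerate(history):
        if value >= target:
            return step * dt
    return None  # the fraction was never reached within the simulation


baseline = simulate_adoption(contact_rate=1.0)
doubled = simulate_adoption(contact_rate=2.0)
t_base = time_to_fraction(baseline, 1000, 0.5)
t_fast = time_to_fraction(doubled, 1000, 0.5)
print("time to 50% adoption, baseline contact rate:", t_base)
print("time to 50% adoption, doubled contact rate: ", t_fast)
```

In this toy setting, doubling the contact rate brings the halfway point of adoption forward in time, which is the qualitative behavior the slide describes.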
Slide 8: OSTI’s Creed
Knowledge is contagious, and it's our job to make sure everyone "catches" it easier and quicker!
To that end, we intend to study factors that determine the rate at which researchers will "catch" an idea if the contact rate between scientists is increased.
Slide 9: But before we can accelerate the sharing of knowledge ...
...we must dispel the misperception that popular search engines are already doing the job.
It is too little appreciated that Google overlooks a large fraction of the web. Several scholars have estimated the fraction of the web that is made searchable via Google. The largest estimate I have seen is 10 percent. That implies that Google overlooks 90 percent of the web.
Other estimates are that Google captures only 1 percent of the web, or less. That means that Google overlooks 99 percent of the web.
It is a common misperception that popular search engines do a thorough job making the web searchable.
Slide 10: Web Is Transformational Technology for Sharing Knowledge
Web is still young, and will certainly hold surprises as it evolves; just as another well-known transformational technology held surprises ...
Just as the web first captured public imagination about 1994, the automobile first captured public imagination about 1903 when Henry Ford introduced his first mass-produced car. In the years that followed, Ford made very significant technological progress. Fourteen years later, in 1917, Ford offered electric lights, secure doors, a roof, and numerous improvements under the hood. We have now reached the fourteenth birthday of the web, and it, too, has evolved dramatically. We now have enormous content, better browsers, search engines, and databases on the web which can be federated. And just as the 1917 Ford transformed lives, the 2008 web is transforming lives, too.
As we all know, Ford's technology was not static in 1917. The automobile continued to evolve, and indeed continues to evolve today. Similarly, web technology is anything but static. Changes are happening so fast that, while we can be pretty well assured that the web will continue to exist a few years from now, no one can reliably predict what it will be like. We are in the Model T era of the web where rapid change is to be expected. Current newspaper headlines point out that the internet, which makes the web possible, may itself become obsolete.
We may not be able to predict what the Web will look like in a few years, but we can learn lessons from history. Ford made the automobile more user-friendly, faster, easier to operate, and most importantly, Ford made the automobile ubiquitous. Just as Ford sensed the burden of transportation was an obstacle to human progress, we know that the burden of searching is an obstacle to scientific progress. Ford transformed the behavior of the traveling public. We at OSTI are transforming the behavior of the research scientist and science-attentive citizen. We at OSTI are doing everything possible to make use of the evolving Internet to diffuse knowledge related to our agency mission.
Slide 11: Eclipsing Current Search Technology
Google is capitalizing on this early era of web technology and is hugely successful, powering more than half the world’s searching
But we must remember that we are just in the beginning of this transformation. Further technological transformations may very well eclipse today’s search technology!
A new, promising technology now emerging: federated search
The technology used by Google and Yahoo does most of its work in anticipation of users doing searches. Their technology relies upon crawlers, which are computers that find and visit web sites one at a time, typically by following hyperlinks. Each time the crawler finds a page, it indexes it, which might be considered something akin to making an alphabetical list of words on that page. This index is then merged with a master index compiled from all the pages visited by the crawler. This is all in preparation for visits by web customers like you and me. When we finally do a search on Google, our query is actually applied against the master index. When there is a match, Google informs us of the hyperlink to the page that it indexed sometime in the past.
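The index-then-query pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how Google or Yahoo actually build their indexes: the pages are a hard-coded dictionary standing in for crawled content, the URLs are placeholders, and the helper names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Merge every page's words into one master index: word -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs whose indexed text contains every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Stand-in for pages a crawler visited at some time in the past.
pages = {
    "http://example.org/a": "Weather prediction with numerical models",
    "http://example.org/b": "Feynman diagrams in particle physics",
    "http://example.org/c": "Numerical weather models and climate",
}
index = build_index(pages)          # done ahead of time
hits = search(index, "weather models")  # done when the user searches
print(sorted(hits))
```

The point of the sketch is the timing: the index is built before any user arrives, so a query only consults the index and can return links to pages as they were when last crawled.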
Slide 12: Much of Science Is Non-Googleable
In fact, the vast majority of science information is in databases within the deep Web – or the non-Googleable Web – where popular search engines cannot go.
We in the information business need to recognize this gap between availability and need, and seize the opportunity to... provide science information consumers with better tools.
The bulk of science information, especially scholarly science information, resides in databases. Crawlers can get to the first page of a database, but, typically, they cannot get past the front page. The database's search box is often the only systematic way to see the contents of the database, and the crawler cannot deal with that search box.
The part of the web that is off limits to crawlers is called the deep Web.
It is possible for database owners to take special steps to expose database content to Google crawlers. Indeed, DTIC and OSTI have exposed much of their database content. But the simple fact of the matter is that most database owners do not take this step.
Slide 13: Surface Web
Deep Web databases
Federated search drills down to the deep Web where scientific databases reside.
Unlike the Google sitemap protocol solution, federated search places no burden on the database owners.
We need systems, such as federated search, that probe the deep Web.
Federated search is a different kind of web search architecture. When the user places a query on a federated search application, like Science.gov, the query is transmitted to the server that hosts the application. The server then translates the query so that it is understandable by a host of remote databases, and then transmits the translated queries.
The remote databases execute the search, and report back the results to the server that hosts the application.
The host server then combines the hits from all the databases, and sorts them in relevance rank order. Finally, the ranked list is returned to the user. The whole process can take anywhere from about a second to a couple of seconds.
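The translate, fan out, merge, and rank steps just described can be sketched as follows. This is an illustrative sketch only, not the Science.gov implementation: the Source class, its separator-based query syntax, the in-memory records, and the term-count relevance score are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class Source:
    """Stand-in for a remote database with its own (hypothetical) query syntax."""

    def __init__(self, name, separator, records):
        self.name = name
        self.separator = separator  # assumed native syntax: terms joined by this
        self.records = records

    def translate(self, user_query):
        """Rewrite the user's query into this source's native form."""
        return self.separator.join(user_query.lower().split())

    def execute(self, native_query):
        """Run the search 'remotely' and report back matching titles."""
        terms = native_query.split(self.separator)
        return [r for r in self.records if any(t in r.lower() for t in terms)]


def score(title, user_query):
    """Toy relevance score: how often the query terms occur in the title."""
    words = title.lower().split()
    return sum(words.count(term) for term in user_query.lower().split())


def federated_search(user_query, sources):
    """Send the translated query to every source, then merge and rank hits."""
    def query_one(source):
        native = source.translate(user_query)
        return [(title, source.name) for title in source.execute(native)]

    # Query all sources concurrently, as a host server might.
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        per_source = list(pool.map(query_one, sources))

    merged = [hit for hits in per_source for hit in hits]
    return sorted(merged, key=lambda hit: score(hit[0], user_query), reverse=True)


sources = [
    Source("alpha", " AND ", ["Weather prediction models", "Soil chemistry survey"]),
    Source("beta", "+", ["Ensemble weather prediction for weather services",
                         "Prediction markets"]),
]
results = federated_search("weather prediction", sources)
for title, origin in results:
    print(origin, "-", title)
```

Note that nothing is indexed in advance here. Each source is queried live at search time, which is why the database owners carry no extra burden and why the results reflect the current contents of each database.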
Slide 14: OSTI has recognized the need to bridge this gap; our emerging solution is “federated” search.
Science.gov: 50 million pages of federal science information from 13 U.S. science agencies (including DoD – DTIC databases are part of Science.gov)
ScienceAccelerator: Key DOE databases
Our most recent federated search engine is WorldWideScience.org – the global science gateway (including DoD – DTIC databases are part of Science.gov).
Recognizing the opportunity to advance the OSTI mission back in the 1990s, OSTI set out to capitalize on it as quickly as resources would allow by producing web applications to disseminate all manner of scientific and technical information (STI). A steady progression of new OSTI products addressed the various forms of STI: technical reports, e-prints, conference proceedings, accomplishments, patents, and project descriptions. To make it easy for users who want to search through all these products at once, we introduced the DOE Science Accelerator, which is powered by our special web architecture called federated search.
Reaching out beyond DOE, we initiated a collaboration with other agencies to allow users to search their R&D results along with DOE's; thus emerged Science.gov. DTIC databases are part of Science.gov.
Most recently, we took collaboration worldwide by federating the best information sources from governments around the world. The result is WorldWideScience, which makes searchable about the same quantity of science as Google does. The US contribution to WorldWideScience is Science.gov.
Slide 15: International partnership kicks off global science gateway
In January 2007, Dr. Raymond Orbach, DOE Under Secretary for Science, and Lynne Brindley, Chief Executive of the British Library, signed a Statement of Intent to partner in the development of a searchable global science gateway.
WorldWideScience is OSTI’s largest application, by far.
Slide 16: WorldWideScience.org was launched in June 2007 and now searches 32 portals from 44 countries.
WorldWideScience.org enables access to prominent as well as smaller, less well-known sources of highly valuable science.
WorldWideScience.org allows users to search multiple data sources around the globe from a single query search box.
Slide 17: What Is WorldWideScience.org?
- A federation of the leading science portals sponsored by the governments and national institutions of 44 countries
- A quantity of searchable science (more than 200 million pages from every inhabited continent) comparable to that searchable via Google, with the bulk of the science being non-Googleable
- A contrast to content searched by Google – WWS.org content tends to be scholarly
- A breakthrough in content enabled by breakthrough technology
Slide 18: Current National Partners in WorldWideScience.org
Slide 19: Current National Partners in WorldWideScience.org (cont.)
Slide 20: Current WorldWideScience.org Sources
- African Journals Online
- Article@INIST (France)
- Australian Antarctic Data Centre
- Canada Institute for Scientific and Technical Information
- CSIR Research Space (South Africa)
- Defence Research and Development Canada (Canada)
- DEFF Global E Prints (Denmark)
- DEFF Research Database (Denmark)
- Directory of Open Access Journals (Sweden)
- Electronic Table of Contents (ETOC) (United Kingdom)
- Indian Academy of Sciences
- Indian Institute of Science Eprints
- Indian Institute of Science Theses & Dissertations
- Indian Medlars Centre
- J-EAST (Japan)
- J-STAGE (Japan)
- J-STORE (Japan)
- Journal@rchive (Japan)
- Korea Science (Korea)
- NARCIS (Netherlands)
- Science.gov (United States)
- Scientific Electronic Library Online (Argentina, Brazil, Chile, Colombia, Portugal, Spain)
- Transactions and Proceedings of the Royal Society of New Zealand 1868-1961 (New Zealand)
- UK PubMed Central (United Kingdom)
- Vascoda (Germany)
- VTT Technical Research Centre of Finland Publications Register
- VTT Technical Research Centre of Finland Research Register
Slide 21: WorldWideScience.org Search
Here's how it works . . .
Slide 22: WorldWideScience.org Search
Type a search term such as "weather prediction" into the search box and click "Search".
Slide 23: WorldWideScience.org Search
With just one click, your query will be sent from the server in Oak Ridge, TN, to databases around the globe.
Slide 24: WorldWideScience.org Search
Slide 25: WorldWideScience.org Search
Your query will return to your desktop from the server in Oak Ridge, TN, with a list of results in "real time."
Slide 26: The Stage Is Set for the Future
You get up-to-the-minute information relevant to your query.
Slide 27: The Stage Is Set for the Future
We are ready to scale up our efforts in federated search. Simply put, we intend to make more science accessible to more people than anyone has done before.