Accelerating Science Discovery - Join the Discussion

OSTIblog Articles in the federated search Topic

The Importance of Cross-fertilization of Ideas

As someone who really loves mathematics, I enjoy reading about the lives of mathematicians, about how they think, and about how they solve problems. And, as an OSTI consultant, I recognize the value of having access to the ideas of others when performing research. As I read stories of the brilliant mathematicians, especially ones like Gauss, Fermat, and Pythagoras who lived hundreds of years ago, I wonder how much more they might have accomplished if each had access to all of the important works of all of his or her predecessors and contemporaries, not only in his or her specialized fields but in all of the natural sciences. The cross-fertilization of ideas, the unexpected connections between seemingly disjoint fields of science, is critical to advancing science. Through a number of powerful products and services, OSTI provides a tremendous foundation for this cross-fertilization.

I recently read "The Music of the Primes," by Marcus du Sautoy. The book tells a fascinating story about the search for patterns among prime numbers. It turns out that this search is not just driven by intellectual curiosity. Important results that affect our lives are derived from our understanding of prime numbers. Alan Turing's success in cracking the German Enigma machine and the security of our electronic financial transactions come directly from our understanding of the primes. Error detecting and error correcting technology, so critical to modern electronic communication systems, owes its existence to the primes as well. Casting a wide net is critical to making those unexpected connections and to advancing science. WorldWideScience.org, the global gateway to science conceived at OSTI, accesses scientific and technical information from countries that include 73% of the world's population.

Du Sautoy's book chronicles the heroic efforts of...

Related Topics: federated search


Dr. Warnick Speaks at Computers In Libraries Annual Conference


Dr. Walt Warnick, Director of OSTI, recently had the honor of speaking at two events at the Computers In Libraries Conference. I asked Dr. Warnick to share some of his experience and perceptions from the talks through a short interview:

Dr. Warnick, you travel quite a bit and make numerous presentations about OSTI's innovative work. What drew you to speak at the Computers In Libraries (CIL) Conference?

I was invited to make two presentations, which you describe below in your third question. I have visited the Conference in previous years, but this is the first time that I made presentations. Computers In Libraries is a natural forum for OSTI, as everything we do today is computer based and librarians are very important customers.

Congratulations! How would you categorize the attendees at CIL? Were they mostly librarians?

My impression is that most CIL attendees were librarians. Sprinkled among them were computer techies. For example, the moderator at my first session was a highly accomplished computer techie from the San Francisco Chronicle.

You had the distinction of co-leading two sessions at the conference, one on information dissemination, the other on the future of federated search as you see it. Let's start with the first. What was the gist of your message on how OSTI is spreading knowledge and advancing science?

My speech is posted at the OSTI site for speeches. It's titled "The Science Knowledge Imperative: Making Non-Googleable Science Findable." The gist is that, with very small investments, enormous collections of science from government agencies and countries around the world have been virtually integrated...

Related Topics: Computers in Libraries, federated search, Walter Warnick


Growing Up With Federated Search

[Note: This article first appeared in the Federated Search Blog.]

This is the story of how one organization of the Federal government came to recognize the potential of federated search and then set out to deploy it and encourage its maturation.

Along the way, considerable progress has been made. More science is freely findable on the web today than has ever before been available to the public. Yet, much more progress remains to be made.

Before the Web

Before the web, the Office of Scientific and Technical Information (OSTI) of the Department of Energy used the technology then available to maximize communication about the results of the Department's research and development program. For example, OSTI created microfiche and sent it to hundreds of depository libraries. It also partnered with on-line vendors like Dialog. Hard copies were made available via a partnership with the National Technical Information Service.

Enter the Web

With the advent of the web, it quickly became clear that the new medium offered tremendous potential to communicate science. Thus, OSTI set out to develop cutting-edge web tools to share e-prints, technical reports, conference proceedings, and other forms of scientific and technical information (STI). Because each form of STI comes from a distinct source, each form follows a distinct pathway which needed to be accommodated, which naturally led to a separate information product for each form.

The Need to Integrate Web Applications

Within a couple of years, OSTI had developed a suite of web-based databases and was also linking to similar databases offered by other agencies. It was apparent...

Related Topics: federated search


On Credibility of Search Results

On March 2nd I wrote an article for the Federated Search Blog: On credibility of search results.

The article asserts that a federated search engine is only as good as the quality of the content to which it provides access. While the major consumer-oriented search engines may provide more search results overall, it is left to the user of the search application to sift through the search results to identify which content represents credible scientific and technical information.

OSTI doesn't suffer from this quality issue. As steward for DOE research output, OSTI only disseminates the most credible information. Below are just a few examples of OSTI's commitment to providing only vetted quality information via its federated search applications.

Science Accelerator provides access to only credible information via federated search; it empowers researchers and the science-attentive citizen to search important information resources of the U.S. Department of Energy's (DOE's) scientific and technical information.

Beyond DOE, Science.gov delivers 200 million pages of authoritative U.S. government science information from 14 federal agencies that participate in the Science.gov Alliance.

At the global level, WorldWideScience.org serves as a worldwide science portal to the most credible information from international government agencies and from organizations sanctioned by their governments.

As OSTI grows and the scientific and technical information we provide expands, the quality of information we provide will always remain high.

Sol Lederman
OSTI Consultant

Related Topics: federated search, Science Accelerator, Science.gov, WorldWideScience.org (WWS)


Pushing the Limits is Key to OSTI's Success

by Dr. Walt Warnick 09 Oct, 2008 in Technology

by Walt Warnick and Sol Lederman

While federated search is a core technology that OSTI employs to tackle challenges of sharing knowledge, the technology isn't perfect.  OSTI aggressively uses federated search because it does what no other search technology can do--it inexpensively makes dozens of non-Googleable databases searchable via a single query.  Two nagging limitations of federated search are that it can take 30 seconds to execute a search--which seems slow in the digital age--and that hit lists are not exhaustive.
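
The single-query idea, and both limitations, can be illustrated with a minimal Python sketch. It is not OSTI's implementation: the source names, the in-memory "databases", and the functions make_source and federated_search are hypothetical stand-ins, and real sources would be queried over the network.

```python
from concurrent.futures import ThreadPoolExecutor, wait
import time


def make_source(name, records, delay=0.0):
    """Build a hypothetical searchable source backed by an in-memory list."""
    def search(query):
        time.sleep(delay)  # simulate network and database latency
        return [(name, r) for r in records if query.lower() in r.lower()]
    return search


def federated_search(query, sources, timeout=1.0):
    """Send one query to every source in parallel and merge the hits.

    Sources that have not answered within `timeout` seconds are skipped,
    so the merged hit list may be incomplete.
    """
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = [pool.submit(search, query) for search in sources]
    done, not_done = wait(futures, timeout=timeout)
    hits = []
    for future in futures:  # iterate in submission order for stable output
        if future in done and future.exception() is None:
            hits.extend(future.result())
    pool.shutdown(wait=False)  # do not block on sources that are still running
    return hits, len(not_done)


# Hypothetical sources; the last one is too slow for the timeout used below.
sources = [
    make_source("reports", ["Solar cell efficiency", "Fusion energy outlook"]),
    make_source("eprints", ["Prime numbers and energy levels"]),
    make_source("slow_db", ["Energy storage review"], delay=1.0),
]

hits, skipped = federated_search("energy", sources, timeout=0.2)
for source_name, title in hits:
    print(source_name, "|", title)
print("sources skipped:", skipped)
```

The timeout is the trade-off in miniature: waiting longer gives a more complete hit list but a slower search, while a short wait returns quickly and silently drops whatever the slow sources would have contributed.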

To drive progress, we at OSTI are constantly striving to design affordable information systems that work as well as we can make them.  We call this the "art of the possible." At the same time, we confidently recognize that tomorrow's information technology will make our systems work better.  By deploying first-of-a-kind systems, we not only advance our mission, we also call attention to the need for new technology to address our limitations.  OSTI thus makes an important contribution to technological progress, even if it is not OSTI itself that develops tomorrow's technology.  By pushing the state-of-the-art today, we highlight needs and hasten the arrival of tomorrow's information technology.

History shows that an unwillingness to be deterred by the limitations of the day leads to ultimate success. Consider the case of the first general purpose electronic computer, ENIAC, unveiled in 1946:

ENIAC contained 17,468 vacuum tubes, 7,200 crystal diodes, 1,500 relays, 70,000 resistors, 10,000 ...

Related Topics: eniac, federated search


The Science Knowledge Imperative: Making non-Googleable Science Findable

Just as science progresses only if knowledge is shared, accelerating the sharing of knowledge accelerates science. All of us engaged in disseminating science knowledge have the opportunity and obligation to do our jobs better, for to do so accelerates science itself. 

To this end, I propose a grand challenge--to make more science available to, and searchable by, more people than ever before. A momentous milestone will be achieved once we give everyone with web access the ability to search, with unparalleled precision, a billion pages of authoritative science. Already, considerable progress has been made.

My organization, the U.S. Department of Energy (DOE) Office of Scientific and Technical Information (OSTI), is responsible for the scientific and technical information operations of the Department. Over the last 11 years, OSTI has become entirely web-based. Of course, we are just one among many entities that connect people to knowledge using the web. Most notably, Google, Yahoo!, and other conventional search engine providers do this, too.

Google and other conventional search engines do for the web what publishers have long done for books--they create an index so that customers can quickly find information. Web users value this service so highly that search companies have become phenomenally successful enterprises. 

However, an important misunderstanding has sprung up about Google and the others: the false presumption, especially among young people, that most useful information is available via conventional search engines such as Google and Yahoo!

In fact, much of the information on the web is inherently unavailable to Google and Yahoo! This key limitation would come as a surprise to...

Related Topics: crawling, E-Print Network (EPN), federated search, google, science, Science.gov, WorldWideScience.org (WWS)


WorldWideScience - The One-Stop Global Gateway to National Science Portals

I would like to share news of groundbreaking proportions on the subject of accelerating scientific progress. On June 12, 2008, in Seoul, Korea, OSTI, along with national and international partners, formally established the WorldWideScience Alliance, a multilateral governance structure for the global science gateway WorldWideScience.org (WWS).

First, let me provide a brief history.  As many of you know, over its 60+ year history, OSTI has built very large collections of energy-related scientific and technical information, emanating primarily from the work of DOE and its predecessor agencies.  We have made these collections available through our own sophisticated web products, and their popularity and use among scientists and science-attentive citizens is well documented - with 80 million transactions per year.

In a similar way, other U.S. federal science agencies and, indeed, other STI organizations around the world have built their own databases and other web products to provide electronic access to their own R&D results.  While these efforts address individual STI organizations' mandates to provide public access to their R&D information, such decentralized efforts have left the typical scientist/citizen in a dilemma - a dilemma which, we believe, actually impedes the rate of scientific progress.

The dilemma is that no single scientist can be expected to be aware of the hundreds of high-quality STI sources on the web.  Moreover, even if a person were aware of all of these sources, he or she simply wouldn't have the time to search them one-by-one to find the scientific knowledge that will help accelerate his or her own efforts.  And, finally, this scientist will not be able to find the...

Related Topics: federated search, icsti, kisti, WorldWideScience.org (WWS)


Navigating Technological Transformation

by Dr. Walt Warnick 07 Apr, 2008 in Technology

Today, all of OSTI's information products are on the web. This is in sharp contrast to the situation as recently as the mid-1990s, when OSTI had no products on the web.

First becoming popular in 1994, the web quickly emerged as a transformational technology, and its potential for reshaping OSTI was apparent. Recognizing the opportunity to advance the OSTI mission, OSTI set out to capitalize on it as quickly as resources would allow by producing web applications to disseminate all manner of scientific and technical information (STI). A steady progression of new OSTI products addressed the various forms of STI: technical reports, e-prints, conference proceedings, accomplishments, patents, and project descriptions. To make it easy for users who want to search through all these products at once, we introduced the DOE Science Accelerator, which is powered by our special web architecture called federated search. Reaching out beyond DOE, we initiated a collaboration with other agencies to allow users to search their R&D results along with DOE's; thus emerged Science.gov. Most recently, we took collaboration worldwide by federating the best information sources from governments around the world in WorldWideScience.org, which makes searchable about the same quantity of science as does Google.

Over the years, OSTI has upgraded each of its products, so that, today, they offer more to users than ever before. Such upgrades are made possible...

Related Topics: federated search, osti


Sophisticated Yet Simple - The Technology Behind OSTI's E-print Network: Part 3

by Sol Lederman 21 Mar, 2008 in Technology

This is the third, and final, article in a series. The first article provided an overview of the E-print Network. The second article discussed the special harvested component of the E-print Network in depth. This article provides a tour of the E-print collections which are federated. Hopefully, once you finish reading this article and this series, you will appreciate the innovation and hard work that has gone into producing the premier federated search application for searching E-prints.

The E-print Network can simultaneously search 52 databases plus the special harvest collection, discussed in Part 2, from a single query. That single search has the effect of searching approximately 4 million documents from the federated sources plus another 1.3 million documents from the harvested collection for a total of roughly 5.3 million documents. This search executes in real time. A user can select all databases to search, individual databases, categories of databases, or combinations of individual databases and categories. The databases are divided into eight categories:

  1. Biology
  2. Computer Technologies & Information Sciences
  3. Environmental Sciences and Ecology
  4. Institutional Repositories and Multidisciplinary Collections
  5. Mathematics
  6. Nonlinear Sciences
  7. Physics
  8. Renewable Energy
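
The selection options described above (all databases, individual databases, whole categories, or a mix) can be sketched in Python as follows. This is only an illustration under stated assumptions: three of the category names are taken from the list above, while the database names, the CATALOG structure, and the function resolve_selection are hypothetical and do not describe how the E-print Network is actually built.

```python
# Category names come from the list above; the database names are
# hypothetical placeholders, not the E-print Network's actual sources.
CATALOG = {
    "Mathematics": ["math_db_a", "math_db_b"],
    "Physics": ["physics_db_a", "physics_db_b", "physics_db_c"],
    "Biology": ["bio_db_a"],
}


def resolve_selection(categories=(), databases=()):
    """Return the databases to search, in catalog order, without duplicates.

    With no arguments, every database in the catalog is selected.
    """
    known = [db for dbs in CATALOG.values() for db in dbs]
    if not categories and not databases:
        return known
    unknown = [c for c in categories if c not in CATALOG]
    unknown += [d for d in databases if d not in known]
    if unknown:
        raise ValueError(f"unknown selection: {unknown}")
    chosen = {db for c in categories for db in CATALOG[c]} | set(databases)
    return [db for db in known if db in chosen]


# One whole category plus one individual database from another category.
print(resolve_selection(categories=["Mathematics"], databases=["physics_db_b"]))
```

Resolving every selection to a flat list of databases keeps the search step simple: whatever combination the user picks, the query is sent to exactly the databases in that list.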

The relationship between categories and databases can be seen on the...

Related Topics: E-Print Network (EPN), federated search


Federated Search - The Wave of the Future?: Part 2

by Dr. Walt Warnick 13 Mar, 2008 in Technology

by Walt Warnick and Sol Lederman

This is the second in a three-part series of articles about the deficiencies of web crawling and indexing, the superiority of federated search to the serious researcher, and the value of OSTI federated search applications in advancing science. Part 1 identified a number of serious limitations of Google and the other crawlers. This article shows how federated search overcomes these limitations. The final article in the series highlights a number of federated search applications and databases that OSTI makes available to the public.

In Part 1, we explained that Google, being a surface web crawler, cannot access the deep web, which consists of content that resides in databases. We also noted that the deep web is several hundred times larger than the surface web and that a large percentage of the highly sought-after scientific and technical information resides in the deep web. We also explained that there is no way to determine the quality of any particular document in the surface web. Any web citizen can post a document to the web and it will likely be indexed.

Federated search applications overcome the two aforementioned limitations of surface crawlers - (1) limited access to content, and (2) the difficulty in determining its quality. Limited access is overcome by the federated search engine's specialized knowledge of how to query a database and how to retrieve its documents. The quality concern is overcome by the complementary efforts of database owners and creators of federated search applications. First, databases that are made available to federated search applications are managed by owners, or organizations, who have criteria for...

Related Topics: doe, federated search, osti, web crawling