by Sol Lederman on Tue, February 19, 2008
Federated search is at the heart of OSTI's ability to realize its mission. OSTI provides a simple description of what federated search is and how it works in the OSTI environment. The best way to experience the value of federated search at OSTI is to try several of OSTI's flagship applications: Science Accelerator, Science.gov, and WorldWideScience.
These, like all federated search applications, search databases "live", which means there is no lag between when a collection is updated by its owner and when the new content can be searched. Science Accelerator provides searchable access to a number of science databases that OSTI manages. Its aim is to accelerate science discovery by greatly reducing the time and effort required for researchers to find relevant science information. Science.gov was OSTI's breakthrough federated search product; the first version was launched in December 2002. Science.gov provides access to more than 50 million pages of science information from 17 scientific and technical organizations via the collaboration of 13 federal agencies. WorldWideScience is a global science gateway to national and international scientific databases.
The technology used to mine content from the deep web is called "federated search." While federated search is not the only search technology OSTI employs, it serves the critical role of facilitating the diffusion of knowledge and accelerating science.
OSTI stewards a very large number of scientific and technical documents, in numerous collections, on behalf of the US Department of Energy. The vast majority of this high-quality content is not searchable from Google, Yahoo!, or the other popular search engines. That is because most of the content that OSTI makes available to scientists, researchers, and the public lives in the "deep web," also called the "invisible web." It takes specialized search engines to search this deep web content. Strictly speaking, federated search refers to querying multiple document databases simultaneously, not to searching the deep web. In practice, however, most federated search engines specialize in accessing deep web content, although they may also aggregate, or federate, content from sources other than the deep web.
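To make "querying multiple document databases simultaneously" concrete, here is a minimal sketch in Python. It is an illustration only, not how any OSTI application is built: the `SOURCES` dictionary, its contents, and the function names are all made up for the example, and the "databases" are just in-memory lists of titles.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for independent document databases.
SOURCES = {
    "energy_reports": ["Solar cell efficiency", "Wind turbine design"],
    "physics_preprints": ["Solar neutrino flux", "Quantum dots"],
}


def search_source(name, query):
    """Return (source, title) pairs whose title contains the query (case-insensitive)."""
    q = query.lower()
    return [(name, title) for title in SOURCES[name] if q in title.lower()]


def federated_search(query):
    """Send the same query to every source at once and merge the hits."""
    names = sorted(SOURCES)  # fixed order so merged results are predictable
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # pool.map runs the per-source searches concurrently and
        # yields their results in the order of `names`.
        per_source = pool.map(lambda n: search_source(n, query), names)
        return [hit for hits in per_source for hit in hits]


if __name__ == "__main__":
    for source, title in federated_search("solar"):
        print(source, "->", title)
```

The point of the sketch is that the query goes out to each source at search time, so whatever a source holds at that moment is what comes back. A real federated search engine would talk to remote search interfaces and rank the merged results, which this example does not attempt.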
Federated search technology is of particular strategic value to OSTI in that it does not place any requirements or burdens on owners of databases. This means that when an agreement is made with a scientific organization to make its content searchable by one or more OSTI applications, setting up access to the organization's content is a rapid and straightforward process. If the organization's content is already searchable, via the web or some other mechanism, then the organization has no responsibility other than to keep its database accessible, a responsibility it already has.
Another great value of federated search is that databases can be aggregated into federations of federations. This means that a federated search application can act as a single database to another federated search application. As an example, the federated search application WorldWideScience searches a number of scientific databases. One of these is Science.gov, itself a federated search application. Having layers of federation provides two major benefits to OSTI. First, it greatly extends the reach of a single application, from several dozen databases to hundreds of databases, in real time. This ability to scale is critical to OSTI's drive to accelerate the diffusion of science. Second, multi-layered federation allows collections of content databases to be managed in a decentralized way. While it would be too onerous a task for a single organization to manage the availability of hundreds of databases, it is quite manageable for several organizations to each manage access to a smaller set of databases. Each organization provides access to the databases it stewards through its own federated search application, which it then offers as a "feed" to the larger application.
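The idea that a federation can look like a single database to another federation can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of how WorldWideScience or Science.gov are implemented: the class names, database names, and titles are all hypothetical, and the members are searched one after another for simplicity rather than simultaneously.

```python
class Database:
    """A single searchable collection (hypothetical, held in memory)."""

    def __init__(self, name, titles):
        self.name = name
        self.titles = titles

    def search(self, query):
        q = query.lower()
        return [t for t in self.titles if q in t.lower()]


class Federation:
    """Searches its members and merges their results.

    Because it exposes the same search() method as Database,
    a Federation can itself be a member of another Federation.
    """

    def __init__(self, name, members):
        self.name = name
        self.members = members

    def search(self, query):
        results = []
        for member in self.members:
            results.extend(member.search(query))
        return results


# Two databases managed by one organization behind its own gateway.
doe = Database("doe_reports", ["Fusion energy outlook", "Battery storage"])
nasa = Database("nasa_reports", ["Fusion propulsion concepts"])
national = Federation("national_gateway", [doe, nasa])

# A larger gateway treats the national gateway as just one more source.
foreign = Database("foreign_archive", ["Fusion reactor materials"])
world = Federation("world_gateway", [national, foreign])

if __name__ == "__main__":
    print(world.search("fusion"))
```

The larger gateway never needs to know how many databases sit behind `national`; it only needs one working search interface. That is what lets each organization manage its own small set of sources.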
In my five years of involvement with OSTI, supporting their federated search products in various roles, I have never ceased to be amazed at the dedication, hard work, innovation, and vision that go into facilitating access to science information to serve both science and the public.
Sol Lederman Consultant to OSTI