by Walt Warnick and Sol Lederman
This is the final article in a series about the limitations of the crawl-and-index approach to searching scientific content and the advantages of federated search. Part 1 identified a number of issues with Google and the other crawlers, and showed why researchers and the science-attentive citizenry don't rely on "Googleable" content to meet their needs for quality scientific and technical information. Part 2 explained how federated search, by providing access to "non-Googleable" content, overcomes the Google limitations. This article highlights three important applications, developed and maintained by OSTI, that demonstrate how federated search is going beyond crawling to advance science.
The Office of Scientific and Technical Information (OSTI) makes R&D findings available and useful to U.S. Department of Energy (DOE) researchers and the American people. It serves this critical role under the auspices of the Office of Science, which is part of DOE. In December 2002 OSTI launched the first generation of Science.gov to make U.S. government scientific and technical information available to the public. Science.gov pioneered the use of federated search within the federal government and has served as a model for a number of other applications, most notably Science Accelerator and WorldWideScience.org. Together, these three applications serve as gateways to quality, authoritative scientific information.
Science.gov provides access to over 50 million pages of U.S. government science information and research results. This content is aggregated from the research output of 17 scientific and technical organizations from the 13 federal agencies that participate in the Science.gov Alliance. Science.gov offers a simple search form, along with advanced search capabilities for those who want to perform more refined searches. Sophisticated ranking of search results ensures that the most relevant results are presented first. In February 2005 an alert capability was introduced, allowing Science.gov users to create profiles of queries of interest; the application performs searches on the users' behalf and sends email notifications when their queries find new documents.
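The alert capability described above can be pictured with a short Python sketch. It is not the Science.gov implementation; the names `AlertProfile`, `run_alert`, `search`, and `notify` are hypothetical, and the sketch assumes a search function that returns document identifiers for a query and a notify function that stands in for sending email.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class AlertProfile:
    """A saved query plus the documents already reported to its owner (hypothetical)."""
    email: str
    query: str
    seen: Set[str] = field(default_factory=set)


def run_alert(
    profile: AlertProfile,
    search: Callable[[str], List[str]],
    notify: Callable[[str, List[str]], None],
) -> List[str]:
    """Run the saved query and notify the owner about unseen documents only."""
    new_docs = [doc for doc in search(profile.query) if doc not in profile.seen]
    profile.seen.update(new_docs)
    if new_docs:
        # Assumption: notify() stands in for an email notification.
        notify(profile.email, new_docs)
    return new_docs


# Example with an in-memory stand-in for the search back end.
index = {"fusion": ["doc-1"]}
sent = []
profile = AlertProfile(email="reader@example.org", query="fusion")

run_alert(profile, lambda q: index.get(q, []), lambda e, d: sent.append((e, d)))
index["fusion"].append("doc-2")  # a new document appears in a source
run_alert(profile, lambda q: index.get(q, []), lambda e, d: sent.append((e, d)))
```

In this sketch the first run reports every matching document, and later runs report only documents that were not present before. A real service might instead record a baseline when the profile is created.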
In contrast to searches performed by Google, searches of Science.gov tend to return scholarly information, as opposed to lay information.
OSTI built on the success of Science.gov to create a number of other high-profile federated search applications. In April 2007 OSTI introduced Science Accelerator, which uses the proven Science.gov architecture. Science Accelerator searches nine key DOE R&D resources. What is particularly interesting about Science Accelerator is that a number of those resources are themselves federated search applications. Thus, Science Accelerator demonstrates the feasibility of building federated search applications hierarchically, where one searchable database is aggregated from multiple searchable databases, each of which can be decomposed further into searchable databases, and so on. This hierarchical construction will allow Science Accelerator to scale to search at least 1,000 databases in parallel in the foreseeable future. This will have the remarkable effect of enabling users to search all web-accessible collections of scientific knowledge related to the DOE mission from a single search form. A user alert capability for Science Accelerator is planned.
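The hierarchical construction can be illustrated with a minimal Python sketch. It makes no claim about how Science Accelerator is built: the classes `LeafDatabase` and `FederatedSearch`, the term-overlap score, and the sample titles are all assumptions made for illustration. The point it shows is that a federated search application exposes the same `search` interface as a single database, so it can itself be one of the sources of a larger application.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Result:
    title: str
    source: str
    score: float  # hypothetical relevance score, higher is better


class Searchable(Protocol):
    def search(self, query: str) -> List[Result]: ...


class LeafDatabase:
    """A single searchable collection (an in-memory stand-in for a real database)."""

    def __init__(self, name: str, titles: List[str]) -> None:
        self.name = name
        self.titles = titles

    def search(self, query: str) -> List[Result]:
        terms = query.lower().split()
        results = []
        for title in self.titles:
            words = title.lower().split()
            hits = sum(1 for term in terms if term in words)
            if hits:
                # Assumed scoring: fraction of query terms found in the title.
                results.append(Result(title, self.name, hits / len(terms)))
        return results


class FederatedSearch:
    """Sends a query to every child source in parallel and merges the results.

    A child may be a LeafDatabase or another FederatedSearch, which is what
    makes the construction hierarchical.
    """

    def __init__(self, name: str, sources: List[Searchable]) -> None:
        self.name = name
        self.sources = sources

    def search(self, query: str) -> List[Result]:
        if not self.sources:
            return []
        with ThreadPoolExecutor(max_workers=len(self.sources)) as pool:
            batches = list(pool.map(lambda s: s.search(query), self.sources))
        merged = [result for batch in batches for result in batch]
        # Rank by score, breaking ties by title so the order is deterministic.
        return sorted(merged, key=lambda r: (-r.score, r.title))


# A two-level hierarchy: "top" searches "energy" (itself federated) and "patents".
lab_a = LeafDatabase("lab_a", ["Fusion energy report", "Solar cell efficiency"])
lab_b = LeafDatabase("lab_b", ["Fusion reactor materials"])
energy = FederatedSearch("energy", [lab_a, lab_b])
patents = LeafDatabase("patents", ["Battery patent", "Fusion energy patent"])
top = FederatedSearch("top", [energy, patents])

hits = top.search("fusion energy")
```

Because the top-level application only sees the `search` interface, adding another layer or another source does not change how a query is issued. A production system would also need per-source timeouts and a more careful merge of relevance scores from different sources, which this sketch omits.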
In June 2007, just two months after the launch of Science Accelerator, WorldWideScience.org was opened to the public. While Science.gov and Science Accelerator were created to provide access to U.S. government and U.S. DOE documents, respectively, WorldWideScience.org introduced federated searching across science sources on a global scale. As a global science gateway, WorldWideScience.org currently allows searching of 29 databases from 43 cooperating countries. The databases offer more than 200 million pages, all of which are searched by each query of WorldWideScience.org in the default mode. The U.S. contribution to WorldWideScience.org is Science.gov. In some ways similar to the Science.gov Alliance, WorldWideScience.org relies upon the cooperation of its members, in this case participating nations, to make their best research information available to the public. Additionally, because they are built on federated search, Science.gov and WorldWideScience.org make it easy for database owners to join: owners need do nothing but handle the increased traffic. A user alert capability for WorldWideScience.org is planned.
What does the future hold for federated search, and for crawling? From the perspective of the serious researcher and the science-aware public, Google and the other crawlers can't provide access to the bulk of the scientific documents that are known to be of high quality. Most of this content is non-Googleable. Additionally, the deep web content that Science.gov, Science Accelerator, and WorldWideScience.org search can be updated instantaneously. Given the heavy use of the three applications discussed, OSTI has demonstrated that an effective approach to accelerating science is to build federated search applications that provide one-stop access to authoritative science information. Google can't separate the authoritative content from the non-useful content; this is by design.
We are aware that other architectures exist by which deep web content could be searched globally. Most, if not all, require information owners or some other party to tag every single document with metadata to enable precision searching of the owner's content. This approach places a major burden on the content owner, which creates barriers to widespread adoption and threatens to slow the dissemination of critical science information. Because federated search has such key advantages over alternative search architectures, it is reasonable to expect that federated search applications will become more numerous and that federated search traffic will increase.
The model that Science.gov introduced, and that Science Accelerator and WorldWideScience.org have propagated, has proven itself. While there will be a place for crawling and indexing as far as the eye can see, federated search may one day grow to become the dominant search architecture. As OSTI increases the number and scope of federated search applications it deploys and maintains, the limitations of Google will only become more apparent, the superiority of federated search will be more evident to more people, and federated search will help to advance science.
Walter L. Warnick, Ph.D.
Consultant to OSTI