by Dr. Walt Warnick on Mon, May 10, 2010
Discovery services have begun to appear in the search landscape. They provide access to documents from publishers with which they have relationships by indexing the publishers’ metadata and/or full text. Discovery services are marketed to libraries whose patrons appreciate near-instantaneous search results and whose staff are willing to restrict access to the sources available from the service (and, optionally, the library's own holdings). While these services tout themselves as improvements on federated search, the reality is that there is no alternative to federated search for a number of important applications.
WorldWideScience.org is a global gateway to science. The federated search application was conceived and developed at OSTI, and we host it. The portal performs live federated search of 70 databases from 66 countries. Participating members provide access to their national research databases. For a number of reasons, this important gateway to millions of research documents does not lend itself to the discovery service model.
WorldWideScience.org content is free to the public. Several difficult technical hurdles make it highly impractical to index content from member databases. The first challenge is that most databases will not provide a harvesting mechanism such as OAI-PMH. Without such a mechanism there is no method of predictably harvesting the entire contents of a database. From OSTI's perspective, it is not acceptable to provide access to only a subset of a scientific collection. Federated search completely avoids this problem by having the source's search engine query the entirety of its contents.
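To make concrete what a harvesting mechanism involves, here is a minimal sketch of an OAI-PMH harvest loop. It assumes a source that answers `ListRecords` requests and pages its results with a `resumptionToken`, as the protocol describes. The base URL, the `fake_fetch` function, and the sample XML pages are hypothetical stand-ins so the sketch runs without a network; a real harvester would fetch each URL over HTTP and would also need error handling.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def build_request(base_url, resumption_token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords URL; a resumption token replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def harvest(base_url, fetch):
    """Collect every record identifier, following resumption tokens to the end.

    `fetch` is any callable that takes a URL and returns the response XML as text.
    """
    identifiers = []
    token = None
    while True:
        root = ET.fromstring(fetch(build_request(base_url, token)))
        for header in root.iter(OAI_NS + "header"):
            identifiers.append(header.findtext(OAI_NS + "identifier"))
        token_el = next(root.iter(OAI_NS + "resumptionToken"), None)
        token = (token_el.text or "").strip() if token_el is not None else ""
        if not token:  # a missing or empty token means the list is complete
            return identifiers

# Hypothetical two-page response standing in for a real repository.
PAGE_1 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

PAGE_2 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:3</identifier></header></record>
    <resumptionToken/>
  </ListRecords>
</OAI-PMH>"""

def fake_fetch(url):
    """Stand-in for an HTTP GET: serve page 2 only when its token is requested."""
    return PAGE_2 if "resumptionToken=page2" in url else PAGE_1

if __name__ == "__main__":
    print(harvest("http://example.org/oai", fake_fetch))
```

The loop is only possible because the source cooperates by publishing a complete, pageable list of its records. A database that offers nothing but a search box gives a harvester no equivalent way to know when it has seen everything.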
The second major challenge is that metadata does not exist for documents in many of the databases in WorldWideScience.org. Discovery services rely upon metadata to "homogenize" information about the documents that they place in their unified indexes.
A third challenge is that WorldWideScience.org will soon be multilingual. While discovery services could pre-translate contents, doing so would be impractical because the volumes are huge and constantly expanding.
A fourth challenge to indexing all of the content from WorldWideScience.org is that the science portal federates portals which are themselves federated search applications. Such a portal has no index of its own to harvest; it simply passes queries on to its sources. Taken together, these challenges make indexing and packaging the contents of WorldWideScience.org so expensive, difficult, and time-consuming that no organization is likely to do it.
The onerous technical hurdles that would need to be overcome to make content such as that in WorldWideScience.org searchable by a discovery service illuminate the case for federated search. In the federated search model, content providers need only provide a search interface to their database, which they are already providing to their users. Ideally, the search interface is one that lends itself to machine search and retrieval. But even if it is not, in most cases, if a human can search it, a federated search application can be programmed to search it too. Federated search also does not require metadata. WorldWideScience.org serves its content owners by eliminating all barriers to participation. Even language translation is not a burden to the database owners. If the member nations sanction a particular database, then the burden of including that database is taken on solely by the vendor that developed and maintains the federated search engine, Deep Web Technologies.
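The federated model described above can be illustrated with a minimal sketch: send the user's query to every source's own search interface at the same time, then merge whatever comes back. This is not the Deep Web Technologies implementation. The source names, the three `search_*` functions, and the `score` field are hypothetical, and ranking by a per-source score is a simplifying assumption; a real application would wrap each source's actual search interface in a connector.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def federated_search(query, sources):
    """Query every source live and merge the hits.

    `sources` maps a source name to a callable that takes the query and
    returns a list of dicts with "title" and "score" keys (hypothetical shape).
    Returns (merged_hits, names_of_sources_that_failed).
    """
    merged, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {pool.submit(search, query): name
                   for name, search in sources.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                for hit in future.result():
                    merged.append({**hit, "source": name})
            except Exception:
                # One unavailable source should not sink the whole search.
                failed.append(name)
    # Assumed ranking: highest score first, regardless of source.
    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged, failed

# Hypothetical connectors standing in for real database search interfaces.
def search_db_a(query):
    return [{"title": f"{query} review (database A)", "score": 0.9}]

def search_db_b(query):
    return [{"title": f"{query} dataset (database B)", "score": 0.7}]

def search_db_down(query):
    raise ConnectionError("source unavailable")

if __name__ == "__main__":
    hits, failed = federated_search(
        "fusion", {"a": search_db_a, "b": search_db_b, "down": search_db_down}
    )
    for hit in hits:
        print(hit["source"], hit["title"])
    print("failed:", failed)
```

Note that nothing is copied or indexed ahead of time. Each source's own search engine answers the query against its full, current contents, and the only work required on the federating side is one connector per source.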
Another advantage of federated search is that federated search applications can be easily integrated with other applications. For example, ScienceResearch.com provides access to a mix of proprietary and open content, including WorldWideScience.org. Through our federated search approach, the WorldWideScience.org Alliance maintains autonomy while extending the reach of its materials. Best of all, we do all of this without burdening anyone. In this way we advance our mission of accelerating science.
But don’t get us wrong. We at OSTI would welcome a discovery service that seeks to make DOE material more accessible. OSTI systems are already set up to facilitate such a collaboration. However, the technology of discovery services is less suitable for certain important purposes, such as WorldWideScience.org, that federated search now fulfills.