Discovery services have begun to appear in the search landscape. They provide access to documents from publishers with which they have relationships by indexing the publishers’ metadata and/or full text. Discovery services are marketed to libraries where patrons appreciate near-instantaneous search results and where library staff is willing to restrict access to sources available from the service (and, optionally, the library's own holdings). While these services tout themselves as improvements on federated search, the reality is that there is no alternative to federated search for a number of important applications.

 

WorldWideScience.org is a global gateway to science. The federated search application was conceived and developed at OSTI and hosted by us. The portal performs live federated search of 70 databases from 66 countries. Participating members provide access to their national research databases. For a number of reasons this important gateway to millions of research documents does not lend itself to the discovery service model.

WorldWideScience.org content is free to the public.  Several difficult technical hurdles make it highly impractical to index content from member databases. The first challenge is that most databases will not provide a harvesting mechanism such as OAI-PMH. Without such a mechanism there is no method of predictably harvesting the entire contents of a database. From OSTI's perspective, it is not acceptable to provide access to only a subset of a scientific collection. Federated search completely avoids this problem by having the source's search engine query the entirety of its contents.
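For readers unfamiliar with OAI-PMH, a harvest against a source that does support it might look roughly like the sketch below. It is only an illustration: the harvest function, the injected fetch callable, and the example base URL are our own assumptions, not part of any WorldWideScience.org system. The protocol details it relies on (the ListRecords verb, the metadataPrefix argument, and the resumptionToken used to page through a collection) come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(base_url: str,
            fetch: Callable[[str], str],
            metadata_prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> a repository exposes through ListRecords.

    `fetch` is a hypothetical callable that takes a URL and returns the
    response body as text; it is injected so the sketch runs without a network.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        yield from root.iter(OAI_NS + "record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            return
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

The loop ends only when the repository stops returning a resumption token, which is what makes a complete harvest predictable. A database that offers nothing comparable gives a harvester no such guarantee.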

The second major challenge is that metadata does not exist for documents in many of the databases in WorldWideScience.org. Discovery services rely upon metadata to "homogenize" information about documents that they place in their unified indexes.

A third challenge is that WorldWideScience.org will soon be multilingual. While discovery services could pre-translate contents, doing so would be impractical because the volumes are huge and constantly expanding.

A fourth challenge to indexing all of the content from WorldWideScience.org is that the science portal federates portals which are themselves federated search applications. Together, these challenges make indexing and packaging the contents of WorldWideScience.org so expensive, difficult, and time-consuming that no organization is likely to do it.

The onerous technical hurdles that would need to be overcome to make content such as that in WorldWideScience.org searchable by a discovery service illuminate the case for federated search. In the federated search model, content providers need only provide a search interface to their database, which they are already providing to their users. Ideally, the search interface is one that lends itself to machine search and retrieval. But even if it is not, in most cases, if a human can search it, a federated search application can be programmed to search it also. Also, federated search does not expect metadata. WorldWideScience.org serves its content owners by eliminating all barriers to participation. Even language translation is not a burden to the database owners. If the member nations sanction a particular database then the burden of inclusion of that database is taken on solely by the vendor that developed and maintains the federated search engine, Deep Web Technologies.
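The federated model described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the Deep Web Technologies engine: the make_source and federated_search names are hypothetical, and each "source" is an in-memory stand-in for a remote database's own search interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A source is any callable that takes a query and returns result dicts.
# In practice each one would wrap a member database's live search interface.
Source = Callable[[str], List[dict]]


def make_source(name: str, titles: List[str]) -> Source:
    """Build a toy source that matches the query against its titles."""
    def search(query: str) -> List[dict]:
        q = query.lower()
        return [{"source": name, "title": t} for t in titles if q in t.lower()]
    return search


def federated_search(query: str, sources: Dict[str, Source]) -> List[dict]:
    """Send the query to every source in parallel and merge what comes back."""
    merged: List[dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                merged.extend(future.result())
            except Exception:
                # A failing source contributes no results; the others still do.
                continue
    return merged
```

Nothing is indexed ahead of time and no metadata is required from the sources. Each source's own search engine decides what matches, and the federating application only dispatches the query and merges the answers.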

Another advantage of federated search is that applications can be easily integrated with other applications. For example, ScienceResearch.com provides access to a mix of proprietary and open content, including WorldWideScience.org. Through our federated search approach, the WorldWideScience.org Alliance maintains autonomy while extending the reach of its materials. Best of all, we do all of this without burdening anyone. In this way we advance our mission of accelerating science.

But don’t get us wrong. We at OSTI would welcome a discovery service that seeks to make DOE material more accessible. OSTI systems are already set up to facilitate such a collaboration. However, the technology of discovery services is less suitable for certain important purposes, like WorldWideScience.org, now fulfilled by federated search.

Walt Warnick
OSTI Director

Sol Lederman
OSTI Consultant

 
