by Mark Martin on Thu, November 05, 2009
As with most things, not all federated search products are created equal. Recently, I ran across a situation where federated search was derided for lacking precision search and relevancy ranking. As is often the case, this derision is founded on a narrow view of federated search: that it can only search data stores generically and cannot rank results for relevance across the resources being searched. At OSTI we see these issues as challenges that federated search faces, not as the reality it must operate in. Recently, I pointed out that OSTI has been at the forefront of the development of federated search for over a decade. During that time, working in close partnership with Deep Web Technologies, we have made significant advances in our federated search technology to address the issues behind that narrow view.
Precision search has been a focus of the development of OSTI’s federated search technology since the original EnergyPortal Search deployment in 1999. Because OSTI is an information acquisition and dissemination organization, precision search is of paramount importance to us. This requirement is why the federated search technology deployed here at OSTI provides fielded advanced search. Science.gov, launched in 2002, can search not only the entire record but also individual fields: titles, authors, and publication dates. This feature relies on sophisticated data source connection technology that takes into account the unique search interface provided by each individual source in Science.gov.
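To make the idea of per-source connectors concrete, here is a minimal sketch in Python. It is not OSTI's or Deep Web Technologies' implementation; the class names, the two example sources, and their parameter names (`ti`, `au`, `creator`, `date`, and so on) are all hypothetical. It only illustrates how one common fielded query might be translated into the different query strings that individual sources expect.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class FieldedQuery:
    """A source-independent fielded query (hypothetical structure)."""
    full_text: Optional[str] = None
    title: Optional[str] = None
    author: Optional[str] = None
    year_from: Optional[int] = None
    year_to: Optional[int] = None


class SourceConnector:
    """Maps the common fields onto one source's own parameter names."""

    def __init__(self, name, field_map, range_style="split"):
        self.name = name
        self.field_map = field_map      # common field -> source parameter
        self.range_style = range_style  # "split" (two params) or "range" (one)

    def translate(self, query):
        params = {}
        for field in ("full_text", "title", "author"):
            value = getattr(query, field)
            if value and field in self.field_map:
                params[self.field_map[field]] = value
        if query.year_from or query.year_to:
            if self.range_style == "range":
                # This source takes a single "start-end" date parameter.
                start, end = query.year_from or "", query.year_to or ""
                params[self.field_map["year"]] = f"{start}-{end}"
            else:
                # This source takes separate start and end parameters.
                if query.year_from:
                    params[self.field_map["year_from"]] = str(query.year_from)
                if query.year_to:
                    params[self.field_map["year_to"]] = str(query.year_to)
        return urlencode(params)


# Two made-up sources with different search interfaces.
source_a = SourceConnector(
    "source_a",
    {"full_text": "q", "title": "ti", "author": "au",
     "year_from": "from", "year_to": "to"},
)
source_b = SourceConnector(
    "source_b",
    {"full_text": "query", "title": "title", "author": "creator", "year": "date"},
    range_style="range",
)

query = FieldedQuery(title="carbon capture", author="Smith",
                     year_from=2005, year_to=2009)
print(source_a.translate(query))
print(source_b.translate(query))
```

The same user query produces a different query string for each source, which is the essence of a fielded search across sources that do not share a common interface.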
Much like precision search, relevancy ranking is critical to our mission here at OSTI. As such, we focus heavily on the technical implementation of relevancy ranking in our products. We provide this capability not only in our database-driven products, where it is a simpler task, but also in our federated search products, where the challenge is greater because the ranking has to be performed in real time. This difficulty can be overcome, though. We have accomplished just that in Science.gov, which provides relevance ranking across all data returned from the underlying data sources in a single coherent result set. Take a look for yourself. Check out the featured search link; as I write this it is “carbon capture storage.” After you initiate this search, Science.gov connects to all the underlying data resources; retrieves the results from those resources; creates and displays an initial result set from the resources that return very quickly; prompts the user to include the remaining results once the search against all data sources has completed; re-ranks for relevance if those results are included; and yields a single coherent relevancy-ranked result set. You can find evidence for this by viewing the articles themselves and noting that data from different sources is scattered throughout the result set.
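The merge-and-re-rank step described above can be sketched roughly as follows. This is an illustration only: the scoring function is a deliberately simple term-count heuristic chosen for the example, not the ranking Science.gov actually uses, and the source names and records are invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, record):
    """Toy relevance score: query-term matches, with title matches weighted 3x."""
    terms = tokenize(query)
    title = Counter(tokenize(record["title"]))
    body = Counter(tokenize(record.get("snippet", "")))
    return sum(3 * title[t] + body[t] for t in terms)


def merge_and_rank(query, results_by_source):
    """Score every record with the same function, then sort into one list."""
    merged = []
    for source, records in results_by_source.items():
        for record in records:
            merged.append({**record, "source": source,
                           "score": score(query, record)})
    merged.sort(key=lambda r: r["score"], reverse=True)
    return merged


query = "carbon capture storage"

# Results from a source that answered quickly (invented data).
fast = {
    "source_a": [
        {"title": "Carbon capture and storage overview",
         "snippet": "storage of carbon in geologic formations"},
        {"title": "Wind turbine design", "snippet": "blade aerodynamics"},
    ],
}
# Results that arrived later from slower sources (invented data).
late = {
    "source_b": [{"title": "Carbon storage in saline aquifers",
                  "snippet": "capture and storage costs"}],
    "source_c": [{"title": "Solar cell efficiency",
                  "snippet": "carbon nanotubes"}],
}

initial = merge_and_rank(query, fast)             # shown to the user first
final = merge_and_rank(query, {**fast, **late})   # re-ranked with late results
for r in final:
    print(r["score"], r["source"], r["title"])
```

Because every record is scored by the same function regardless of where it came from, the final list interleaves sources by relevance rather than grouping results by source.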
While it was not part of the original criticism that started this train of thought, OSTI has also spent considerable effort increasing the total number of sources that our federated products search. The challenge of searching multiple sources of data in a timely fashion has been around since we first jumped into the federated search pool. A decade ago we worried about how we were going to search five sources and get data back to users before they gave up. Needless to say, we were successful in getting past that barrier. Today, WorldWideScience.org searches over sixty databases in real time. Tomorrow, we hope to push the limits of the technology even further, scaling to thousands of data resources searched in real time.
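One common way to keep a multi-source search responsive is to query all sources concurrently and return whatever has arrived by a deadline. The sketch below shows that general pattern using Python's standard `concurrent.futures` module. It is an assumption for illustration, not a description of how WorldWideScience.org is built; the simulated sources, their delays, and the function names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def make_source(name, delay):
    """Build a fake source that answers after `delay` seconds (for illustration)."""
    def search(query):
        time.sleep(delay)
        return [f"{name}: result for {query}"]
    return search


def search_all(sources, query, deadline_seconds):
    """Query every source concurrently.

    Returns (results from sources that finished before the deadline,
             sorted names of sources that were still running).
    """
    executor = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {executor.submit(fn, query): name for name, fn in sources.items()}
    done, pending = wait(futures, timeout=deadline_seconds)
    results = {}
    for future in done:
        name = futures[future]
        try:
            results[name] = future.result()
        except Exception:
            # A failing source should not sink the whole search.
            results[name] = []
    # Do not block on slow sources; their threads finish in the background.
    executor.shutdown(wait=False)
    return results, sorted(futures[f] for f in pending)


sources = {
    "fast_a": make_source("fast_a", 0.01),
    "fast_b": make_source("fast_b", 0.02),
    "slow_c": make_source("slow_c", 1.0),
}
results, still_running = search_all(sources, "carbon capture storage",
                                    deadline_seconds=0.3)
print(sorted(results), still_running)
```

With this approach the total wait is bounded by the deadline rather than by the slowest source, so adding more sources does not by itself make the user wait longer for a first set of results.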
Broad statements about technology are typically not very accurate. Many times, people make sweeping claims, both positive and negative, that just do not apply to a field as a whole. At OSTI, we push the boundaries and try to expand the art of the possible. I think we have been most successful in achieving these ideals with our federated search technology, but don’t just take my word for it. Government Computer News recently cited Science.gov as one of ten sites taking online government to the next level.