OSTI is especially proud of its web integration work, in which we take multiple web pages, documents, and web databases and make them appear to the user as an integrated whole. Once OSTI has virtually integrated the sources, the virtual collection becomes searchable via a single query. Because information on the web appears in a variety of formats, from HTML web pages to PDF documents to searchable databases, OSTI has developed and uses a suite of integration approaches to make that content searchable via a single query.
OSTI has two goals that make it critical for us to understand multiple solutions for integrating science content on the web. First, we make DOE science information widely available and searchable by appropriate audiences wherever they may be; and second, we make science information from around the world searchable by DOE researchers. Since migrating to a fully electronic operation in the late 1990s, OSTI has met these goals by deploying various search architectures for integrating content via the web.
Within the information science circles that we engage in, we are well known for our pioneering work with the integration technology known as federated search. However, there are other, possibly lesser known, technologies that we employ to integrate web content.
To integrate information sources which are not interoperable, we see three categories of solutions: 1) you can create a data warehouse where you copy the information items, standardize metadata, and host them on your own servers; 2) you can create a discovery service wherein you index source items without copying them and then host the index on your server (this technology is similar to that used by the major search engines except that you carefully direct the indexing tools, i.e., the crawler, so that only pre-selected material is indexed); and 3) you can use federated search to take advantage of existing search interfaces...
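To make the third category concrete, here is a minimal Python sketch of the general federated search idea: one query is sent to several existing search interfaces at the same time, and the results are merged into a single list. It is an illustration only, not OSTI's implementation. The functions `search_reports` and `search_journals` are hypothetical stand-ins for remote search interfaces (a real source would be queried over the network), and the title-based duplicate removal is an assumed, deliberately simple merging rule.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for two independent search interfaces.
# Each one searches only its own small, hard-coded collection.
def search_reports(query):
    records = ["Solar cell efficiency report", "Wind turbine design study"]
    return [r for r in records if query.lower() in r.lower()]


def search_journals(query):
    records = ["Advances in solar energy storage", "Solar cell efficiency report"]
    return [r for r in records if query.lower() in r.lower()]


def federated_search(query, sources):
    """Send one query to every source in parallel and merge the results."""
    # Nothing is copied or indexed ahead of time: each source is asked
    # at query time through its own search function.
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        result_lists = list(pool.map(lambda source: source(query), sources))

    # Merge in source order, dropping titles already seen
    # (assumed rule: case-insensitive title match means duplicate).
    merged, seen = [], set()
    for results in result_lists:
        for title in results:
            key = title.lower()
            if key not in seen:
                seen.add(key)
                merged.append(title)
    return merged


hits = federated_search("solar", [search_reports, search_journals])
print(hits)
```

The contrast with the first two categories is that a data warehouse or a discovery service would build its own copy or index before any query arrives, whereas this sketch relies entirely on what each source's search function returns at query time.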