Accelerating Science Discovery - Join the Discussion

OSTIblog Articles in the information Topic

A big anniversary for an even bigger collaboration!


Ten years ago this month was launched! The cross-agency portal was created to break down the stovepipes of science information, because it is difficult to know which federal agency holds what information. Thanks to longtime relationships among the agency senior information managers of CENDI as well as a partnership with, and with the efforts of many, many supporters, a unique, grassroots project was undertaken, and it still provides an important service today. A special thanks to our Alliance co-chairs during these years: Eleanor Frierson, NAL/USDA (retired); Tom Lahr, NBII/USGS (retired); Cindy Etkin, GPO; Tina Gheen, LOC; Annie Simpson, USGS.

Some interesting facts:

  • Number of Websites, 2002 – 90
  • Number of Websites, 2012 – 2100+
  • Number of Large Databases, 2002 – 18
  • Number of Large Databases, 2012 – 59
  • Number of Pages Searched, 2002 – under 45 million
  • Number of Pages Searched, 2012 – 200 million+
  • Number of Participating Agencies, 2002 – 10
  • Number of Participating Agencies, 2012 – 13
  • Number of Page Views, Fiscal Year 2003 – 751,180
  • Number of Page Views, Fiscal Year 2012 – 34 million+

Here is to the next ten years! 

Related Topics: anniversary, CENDI, information, partnership, science


How to Integrate Anything on the Web

by Dr. Walt Warnick 03 Aug, 2011 in Technology


OSTI is especially proud of its web integration work, in which we take multiple web pages, documents, and web databases and make them appear to the user as an integrated whole. Once OSTI has virtually integrated the sources, the virtual collection becomes searchable via a single query. Because information on the web appears in a variety of formats, from HTML web pages to PDF documents to searchable databases, OSTI has developed and uses a suite of integration approaches to make all of it searchable via a single query.

OSTI has two goals that make it critical for us to understand multiple solutions for integrating science content on the web. First, we make DOE science information widely available and searchable by appropriate audiences wherever they may be; and second, we make science information from around the world searchable by DOE researchers. Since migrating to a fully electronic operation in the late 1990s, OSTI has met these goals by deploying various search architectures for integrating content via the web.

Within the information science circles in which we are active, we are well known for our pioneering work with the integration technology known as federated search. However, we also employ other, possibly lesser-known technologies to integrate web content.

To integrate information sources that are not interoperable, we see three categories of solutions: 1) you can create a data warehouse, where you copy the information items, standardize the metadata, and host them on your own servers; 2) you can create a discovery service, wherein you index source items without copying them and then host the index on your server (this technology is similar to that used by the major search engines, except that you carefully direct the indexing tool, i.e., the crawler, so that only pre-selected material is indexed); and 3) you can use federated search to take advantage of existing search interfaces...
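
To make the difference between the second and third categories concrete, here is a minimal Python sketch. It is an illustration only, not OSTI's implementation: the source names, the titles, and the functions build_index, index_search, search_source, and federated_search are all hypothetical, and small in-memory lists stand in for remote collections and their search interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources: each list stands in for a separate remote collection.
SOURCES = {
    "reports": ["Solar cell efficiency report", "Wind turbine design notes"],
    "datasets": ["Solar irradiance dataset", "Ocean temperature dataset"],
    "patents": ["Battery electrode patent"],
}


def build_index(sources):
    """Discovery-service style: index the items ahead of time, host only the index."""
    index = {}
    for name, titles in sources.items():
        for title in titles:
            for word in title.lower().split():
                index.setdefault(word, []).append((name, title))
    return index


def index_search(index, word):
    """Answer a one-word query from the prebuilt index, without touching the sources."""
    return index.get(word.lower(), [])


def search_source(name, query):
    """Stand-in for one source's own search interface (assumed: substring match)."""
    q = query.lower()
    return [(name, title) for title in SOURCES[name] if q in title.lower()]


def federated_search(query):
    """Federated style: send the query to every source at query time, then merge."""
    names = sorted(SOURCES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_source = list(pool.map(lambda n: search_source(n, query), names))
    return [hit for hits in per_source for hit in hits]


index_hits = index_search(build_index(SOURCES), "solar")
federated_hits = federated_search("solar")
print(sorted(index_hits) == sorted(federated_hits))
```

In this toy setup both approaches return the same two "solar" items. The difference lies in where the work happens: the index is built before any query arrives, while the federated search contacts each source when the query is issued.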

Related Topics: data warehouse, federated search, information, integration, r&d, science, scientific, technical