Accelerating Science Discovery - Join the Discussion

OSTIblog Articles in the federated search Topic

Refreshed National Library of Energy (Beta) Takes on Expanded Role in Disseminating Department of Energy Scientific and Technical Information

by Lynn Davis 29 May, 2014

The National Library of Energy Beta (NLEBeta), a gateway to information across the U.S. Department of Energy (DOE), is taking on an expanded role in providing access to DOE scientific and technical information (STI) with the retirement of the federated search product Science Accelerator. In addition, the NLEBeta, launched in February 2013, has a redesigned home page and new features that make it easier to use than ever.

Developed by DOE’s Office of Scientific and Technical Information (OSTI), the NLEBeta search tool makes it easy for American citizens to find and access information about the Department from across the DOE complex nationwide, without knowing DOE’s organizational structure.    

The NLEBeta integrates and makes searchable disparate and decentralized information collections across DOE. Users can search hundreds of webpages and 18 databases (a total of 25 million pages) hosted by DOE; all DOE program offices; the National Nuclear Security Administration; the Energy Information Administration; all DOE staff offices; all DOE field/site offices; and all DOE National Laboratories and technology centers. DOE’s program offices include the Advanced Research Projects Agency-Energy (ARPA-E), Energy Efficiency and Renewable Energy, Environmental Management, Fossil Energy, Nuclear Energy, and Science.

The NLEBeta makes it possible to search all this information via a single search box.  Using federated search and indexing technology, the NLEBeta retrieves relevance-ranked individual site results with links to the sites or databases where the original content can be viewed.


Related Topics: Advanced Research Projects Agency-Energy, DOE field offices, DOE staff offices, Energy Information Administration, federated search, national laboratories, National Library of Energy (NLE) - Beta, National Nuclear Security Administration, Office of Energy Efficiency and Renewable Energy, Office of Environmental Management, Office of Fossil Energy, Office of Nuclear Energy, Office of Science


How to Integrate Anything on the Web

by Dr. Walt Warnick 03 Aug, 2011 in Technology


OSTI is especially proud of its web integration work, whereby we take multiple web pages, documents, and web databases and make them appear to the user as if they were an integrated whole. Once the sources are virtually integrated by OSTI, the virtual collection becomes searchable via a single query. Because information on the web appears in a variety of formats, from HTML web pages to PDF documents to searchable databases, OSTI has developed and uses a suite of integration approaches to make them searchable via a single query.

OSTI has two goals that make it critical for us to understand multiple solutions for integrating science content on the web. First, we make DOE science information widely available and searchable by appropriate audiences wherever they may be; and second, we make science information from around the world searchable by DOE researchers. Since migrating to a fully electronic operation in the late 1990s, OSTI has met these goals by deploying various search architectures for integrating content via the web.

Within the information science circles that we engage in, we are well known for our pioneering work with the integration technology known as federated search. However, there are other, possibly lesser-known technologies that we employ to integrate web content.

To integrate information sources which are not interoperable, we see three categories of solutions: 1) you can create a data warehouse where you copy the information items, standardize metadata, and host them on your own servers; 2) you can create a discovery service wherein you index source items without copying them and then host the index on your server (this technology is similar to that used by the major search engines except that you carefully direct the indexing tools, i.e., the crawler, so that only pre-selected material is indexed); and 3) you can use federated search to take advantage of existing search interfaces...

Related Topics: data warehouse, federated search, information, integration, r&d, science, scientific, technical


The Importance of Small Business Innovation Research Funding

by Dr. Walt Warnick 09 Mar, 2011 in Technology


The Small Business Innovation Research (SBIR) and Small Business Technology Transfer (STTR) programs were established to provide funding to stimulate technological innovation in small businesses to meet federal agency research and development needs. Under SBIR, federal agencies with large R&D budgets set aside a small fraction of their funding for competitions exclusively among small businesses. Each year, the DOE Office of Science sets aside 2.8% of its research budget for SBIR (2.5%) and STTR (0.3%) awards. Small businesses that win SBIR awards keep the rights to any technology developed and are encouraged to commercialize the technology.


Established in 1947, the DOE Office of Scientific and Technical Information (OSTI) fulfills the agency’s responsibilities to collect, preserve and disseminate scientific and technical information (STI) emanating from DOE R&D activities.  OSTI’s mission is to advance science and sustain creativity by making R&D findings available to DOE and other researchers and the public.  OSTI is founded on the principle that science progresses only if knowledge is shared; furthermore, OSTI is animated by the concept, now widely accepted, that accelerating the sharing of knowledge accelerates the advancement of science.  SBIR projects have been integral to OSTI’s success in speeding access to scientific knowledge to speed discovery, innovation and economic progress.


Since 2003, the Office of Science’s SBIR office has had a policy of funding knowledge technology SBIR projects under OSTI guidance. These projects have produced technologies that today significantly benefit SC, DOE and science community researchers across the country and around the world. OSTI-managed SBIR projects have enabled OSTI to promote essential ongoing innovation in its products and services, which has enhanced its performance of its statutory mandate. (Please see below for a list of technologies developed by OSTI-managed SBIR...

Related Topics: federated search, multilingual, r&d, relevance ranking, SBIR, sttr, translations


Enormous STI Content Made Easily Searchable by OSTI


We have integrated about ten OSTI products dealing with technical reports, e-prints, patents, conference proceedings, project summaries, etc., so that they are all searchable via a single query. The integrated product allows users to search without first having to decide which OSTI product is likely to have the content they seek. This product is Science Accelerator.

We have integrated comparable offerings from about 14 other agencies so that all the virtually combined offerings can be searched via a single query. Science.gov allows users to search without first having to decide which agency offers which content. The DOE contribution to Science.gov is Science Accelerator.

We have integrated comparable offerings from about 70 other countries so that all the offerings can be searched via a single query. The US contribution to WorldWideScience.org is Science.gov. WorldWideScience.org allows users to search without first having to decide which country offers which content. The virtual collection is enormous, being comparable in size to science made searchable via Google. Our tests suggest, however, that well over 90% of the content of WorldWideScience is non-Googleable.

Until June 11, 2010, the content accessible via WorldWideScience had English titles and other bibliographic information. On June 11, 2010, WorldWideScience became multilingual. A beta application was launched which enables speakers of English to search databases posted on behalf of the Russian government for speakers of Russian. The same is true for Chinese and seven other languages. In turn, speakers of these other languages can search the English offerings of WorldWideScience. The translation capabilities are provided by a collaboration with Microsoft.

Microsoft has posted a blog about Multilingual WWS by Tony Hey, their...

Related Topics: federated search, Science Accelerator, Science.gov, WorldWideScience.org (WWS)


No Alternative to Federated Search

by Dr. Walt Warnick 09 May, 2010 in Technology

Discovery services have begun to appear in the search landscape. Discovery services provide access to documents from publishers with which they have relationships by indexing the publishers’ metadata and/or full text. Discovery services are marketed to libraries where patrons appreciate near-instantaneous search results and where library staff is willing to restrict access to sources available from the service (and optionally the library's own holdings). While these services tout themselves as improvements to federated search, the reality is that there is no alternative to federated search for a number of important applications.

WorldWideScience.org is a global gateway to science. The federated search application was conceived and developed at OSTI and is hosted by us. The portal performs live federated search of 70 databases from 66 countries. Participating members provide access to their national research databases. For a number of reasons this important gateway to millions of research documents does not lend itself to the discovery service model. WorldWideScience.org content is free to the public.

Several difficult technical hurdles make it highly impractical to index content from member databases. The first challenge is that most databases will not provide a harvesting mechanism such as OAI-PMH. Without such a mechanism there is no method of predictably harvesting the entire contents of a database. From OSTI's perspective, it is not acceptable to provide access to only a subset of a scientific collection. Federated search completely avoids this problem by having the source's search engine query the entirety of its contents.

The second major challenge is that metadata does not exist for documents in many of the databases in WorldWideScience.org. Discovery services rely upon metadata to "homogenize" information about documents that they place in their...

Related Topics: federated search, WorldWideScience.org (WWS)


Federated Search: Closing in on the Speed Gap

by Dr. Walt Warnick 02 Apr, 2010 in Technology


Many casual users of federated search criticize the technology for being slow to retrieve results. Serious researchers recognize the unique ability of federated search engines to mine the deep Web for quality science information that Google cannot find. These users recognize that there is no practical alternative to federated search for the best information. Still, everyone wants everything faster, and those users who are willing to trade quality for quickness focus on how federated search doesn't return results in "Google time."

OSTI begins to address the speed issue by displaying some results as soon as they are available. However, this approach causes results to be delivered in two sequential sets, which many users find less than ideal.

The good news for federated search users is that speed is not an insurmountable issue because technology is closing in on the speed gap. The major bottlenecks to lightning-fast federated search performance are related to networks, applications, server hardware, and storage. A systematic program to increase the speed of federated search would begin with a much-needed serious assessment of the relative size of the bottlenecks. Lacking such an assessment, we consider how each bottleneck can be mitigated. The good news is that the inexorable advance of technology is steadily speeding up federated search.

Networks move search result metadata and document full text to searchers. Network latency and overall network speed directly impact response times for searching. Network bottlenecks can appear anywhere in the route from the content provider's search engine to the user's browser. Fortunately, networks are getting faster. Network giant Cisco Systems just announced its new CRS-3 Carrier Routing System. Cisco boasts that the new system can download the entire printed collection of the Library of Congress in just over one second, stream every motion picture ever created in less than four minutes, and allow every man, woman, and...
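The idea mentioned in this excerpt, displaying some results as soon as they are available, can be illustrated with a small Python sketch. This is not OSTI's implementation; the source names, the simulated delays, and the functions `query_source` and `incremental_search` are all hypothetical stand-ins for real remote search interfaces.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical sources with simulated response delays in seconds (assumption).
SOURCE_DELAYS = {"fast_source": 0.01, "medium_source": 0.1, "slow_source": 0.3}


def query_source(name, query):
    """Stand-in for a remote search; sleeps to imitate network and server latency."""
    time.sleep(SOURCE_DELAYS[name])
    return name, [f"{query} result from {name}"]


def incremental_search(query):
    """Yield each source's results as soon as that source responds."""
    with ThreadPoolExecutor(max_workers=len(SOURCE_DELAYS)) as pool:
        futures = [pool.submit(query_source, name, query) for name in SOURCE_DELAYS]
        # as_completed hands back futures in the order they finish,
        # so quick sources are shown without waiting for slow ones.
        for future in as_completed(futures):
            yield future.result()


arrival_order = []
for name, hits in incremental_search("neutron science"):
    arrival_order.append(name)
    print(name, hits)
```

With this design the user waits only for the fastest source before seeing anything, at the cost of results arriving in more than one batch, which is the trade-off the excerpt describes.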

Related Topics: federated search


Federated Search: An Often Overlooked Example of e-Science

A term of art now catching on is “e-Science.”  According to Wikipedia, “The term e-Science (or eScience) is used to describe computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology in 1999 and was used to describe a large funding initiative starting in November 2000. Examples of the kind of science include social simulations, particle physics, earth sciences and bio-informatics.”

Our “federated search” is not what most folks mean by e-Science, but “federated search” nevertheless fits the definition. Indeed, it seems quite reasonable to think of federated search as the text-based manifestation of e-Science.

As the Wikipedia article notes, the technology that enables e-Science is “grid computing.” Several years ago, the DOE Office of Advanced Scientific Computing Research had a Small Business Innovation Research...

Related Topics: e-science, federated search


The X-Portal Vision of the Future

by Peter Lincoln 05 Oct, 2009 in Technology

As OSTI Director Walt Warnick likes to say, today's Web is like the Model T Ford -- revolutionary but ready for vast improvement. This is especially true when it comes to making the Web work for science and technology. In that spirit I want to describe a new kind of Web Portal, one which has yet to be built. It is called the X-Portal.

An X-Portal provides comprehensive coverage for a specific science or technology community, where X refers to that community. In other words, an X-Portal for biofuels is a comprehensive biofuels portal. X = Neutron Science gives a comprehensive neutron science portal, and so on. There can be as many X-Portals as there are communities, but each has a similar design.

The need for X-Portals

The need for X-Portals is based on the fact that today's search engines and portals typically provide less than 5% coverage of any given science community. Today's Web portals and search engines, while revolutionary, are technologically immature and far from comprehensive. As a result they do relatively little to overcome the cognitive barrier of findability. One can usually find something relevant, but it is seldom the best thing out there. With 5% coverage the odds are 19 to 1 against finding the best accessible content. Moreover, if the coverage extends to a large number of other communities, as with Google, even that 5% may be swamped by hits on other communities.
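The "19 to 1" figure follows from simple arithmetic, sketched below in Python. The sketch assumes (this is not stated in the post) that the best item is equally likely to sit anywhere in the accessible content, so the chance a portal holds it equals its coverage; `odds_against` is a hypothetical helper name.

```python
from fractions import Fraction


def odds_against(coverage):
    """Return the odds against a hit, i.e. (1 - coverage) / coverage."""
    return (1 - coverage) / coverage


coverage = Fraction(5, 100)  # the 5% coverage figure from the text
print(f"{odds_against(coverage)} to 1 against")
```

Under the same assumption, raising coverage to 50% would bring the odds down to even.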

There are two principal reasons for these deficiencies. First, today's portals try to be too broad, so they wind up being shallow. This means they only capture a small fragment of any given technical community. Second, because they are so broad, they cannot make use of the emerging technologies of federation, semantic analysis, mapping and visualization. These new technologies require a certain amount of analytical effort that is specific to each community. When the content is too broad, these technologies are prohibitively difficult to apply.


Related Topics: federated search, WorldWideScience.org (WWS), x-portal


OSTI's Pioneering Technology Efforts

by Mark Martin 24 Jul, 2009 in Technology

A typical misconception I face when I tell people that I work within the government is that they think my job, even though it is in the technology arena, must move at a snail's pace relative to the commercial sector. This preconceived notion that our government crawls along relative to technology adoption and innovation - at least in my experience - is way off the mark.

Here at OSTI we can cite several examples where we have been on the bleeding edge of technological development. Not only have we been on the bleeding edge, in some cases we have been on that bleeding edge in cooperation with some of the largest, most innovative technology companies in the world.

For example, OSTI has been a pioneering force in federated search technology since the late 1990s. Federated search, for those of you new to the term, is the simultaneous search of multiple online databases or web resources from a single query. The Wikipedia article on federated search is an excellent resource for more information on exactly how federated search works.
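As a rough illustration of that definition, the Python sketch below sends one query to several sources at the same time and merges the hits. It is a toy under stated assumptions, not a description of any OSTI product: the in-memory `SOURCES`, the substring matching in `search_source`, and the title-based ordering are all hypothetical simplifications of what a real system would do over the network with relevance ranking.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "databases"; a real deployment would call each
# source's own search interface over the network instead.
SOURCES = {
    "reports": ["Neutron scattering report", "Biofuels pilot plant report"],
    "patents": ["Biofuels catalyst patent", "Solar cell patent"],
    "eprints": ["Neutron star e-print", "Algae biofuels e-print"],
}


def search_source(name, query):
    """Stand-in for one source's native search: case-insensitive substring match."""
    needle = query.lower()
    return [(name, title) for title in SOURCES[name] if needle in title.lower()]


def federated_search(query):
    """Send one query to every source at the same time and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        per_source = pool.map(lambda name: search_source(name, query), SOURCES)
        merged = [hit for hits in per_source for hit in hits]
    # Sort by title for a stable order; a real system would rank by relevance.
    return sorted(merged, key=lambda hit: hit[1])


results = federated_search("biofuels")
for source, title in results:
    print(f"{source}: {title}")
```

The key point is that nothing is copied or indexed ahead of time: each source answers the query with its own search engine, and only the result lists are combined.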

Before the term "federated search" had been coined, OSTI was implementing pioneering technology that would come to be known as federated search. In April 1999, OSTI launched EnergyPortal Search, a product now encompassed in EnergyFiles. EnergyPortal Search was the first federated search application deployed by OSTI and the first product of its kind in the government. In December 2002, OSTI launched Science.gov, the first-ever search capability across major science agencies. In June 2007, OSTI introduced the concept of WorldWideScience.org, which searches across national and international...

Related Topics: Energy Files, federated search, milestones, Science.gov, WorldWideScience.org (WWS)


Aspirations for Connecting Researchers in New Media

For several years I've been responsible for organizing OSTI staff to capitalize on the benefits of web and mobile web innovations. One of my important goals is to help OSTI become a leader in connecting scientists in the second generation of the World Wide Web: Web 2.0. Connecting scientists supports our director's vision of Global Science Discovery. (More on this vision later.) Web 2.0 has enabled new types of media that are capable of accomplishing his ideals for knowledge diffusion, increasing contact rates between scientists, and accelerating science. After years of grassroots research I assembled OSTI's Web 2.0 Team to seed new Web innovation and exchange Web 2.0 accomplishments. As we progress in the coming months, I hope to encourage my teammates and others to share more Web 2.0 accomplishments on the OSTIblog.

Outside of science, the Web already accelerates commerce, entertainment, social issues, and politics. In theory, new Web 2.0 media spaces such as Twitter, LinkedIn, YouTube, Facebook, Google, Blogger, WordPress, Flickr, Feedburner, etc. have useful features for attracting and connecting thousands of science work groups. A key factor is that these new sites make services and content available on Web-enabled devices like cellphones, iPods, and eBooks. This combination of hardware and web software can support researchers' core information needs and practices: finding and monitoring science information, directing staff, and circulating information with peers and officials. So, it's not a huge leap to see the possibilities of new media connecting...

Related Topics: doe research, federated search, new media, web 2.0