Accelerating Science Discovery - Join the Discussion

OSTIblog Articles in the E-Print Network (EPN) Topic

OSTI by the numbers

by Tim Byrne 02 Nov, 2012 in Products and Content

For those of you who like numbers, I thought I would give you a few numbers about some of OSTI’s databases and search products. 

  • The DOE Information Bridge now has over 300,000 full-text STI reports. While most of these are post 1991, there are over 84,000 reports published prior to 1990.
  • The Energy Citations Database contains over 2.4 million citations and they are not just technical reports. ECD has over 1.4 million journal articles.
  • DOepatents has over 27,000 patents resulting from DOE-sponsored research and development.
  • The E-Print Network searches over 5.5 million e-prints, over 35,000 websites, over 3100 scholarly societies, and over 50 databases.
  • The Energy Science and Technology Software Center (ESTSC) distributes over 1300 software packages.
  • ScienceCinema has over 2600 science and technology related videos for your viewing pleasure.
  • Science.gov searches 55 databases from 13 federal agencies.
  • WorldWideScience.org lets you search 83 collections from over 70 countries in 9 different languages.

Related Topics: DOepatents, E-Print Network (EPN), Energy Citations Database (ECD), Energy Science and Technology Software Center (ESTSC), Information Bridge (IB), Science.gov, ScienceCinema, WorldWideScience.org (WWS)


Get scientific e-prints



The E-print Network provides a vast, integrated network of electronic scientific and technical information created by scientists and research engineers active in their respective fields, all full-text searchable. Documents such as these are the means by which today’s scientists and researchers communicate their recent findings to their colleagues and propose new ideas of how the world works to their peers for their collective judgment. Documents of this sort go on to form the central body of scientific information. While the E-print Network is intended for use by scientists, engineers, and students at advanced levels, it is freely available to all users.

The gateway provides access to over 35,000 websites and numerous research databases worldwide containing over 5.5 million e-prints in basic and applied sciences in areas such as physics, computer and information technologies, biology and life sciences, environmental sciences, materials science, chemistry, nuclear sciences and engineering, energy research, and other disciplines of interest to DOE.

Related Topics: colleagues, documents, E-Print Network (EPN), e-prints, full text, physics, researchers, science, scientists, searchable


Making Scientific Databases Work Together—For You (psst . . . that's "search interoperability")

by Dr. Walt Warnick 13 Feb, 2012 in Technology



Sometimes something complex can work so seamlessly that it’s easy to miss. We think that’s the case with our solution for achieving search interoperability.

As you may know, “search interoperability” is just a fancy way of saying that lots of scientific databases scattered far and wide can be made to work together so that your job as a seeker of science information is easy. You can go to one search box, say, type in your search term, and get results from over a hundred important repositories and a couple of thousand scientific websites – with one click.

And you know that this is a good thing, because as a practical matter, you cannot be expected to conjure in advance which database might hold the information you seek. Nor can you be expected to search dozens of sources one-by-one.

That would be an onerous task. Also, as an experienced seeker of quality science information, you are well aware that commercial search engines (read, Google, Bing, etc.) sometimes cannot mine the deep web for you, thus missing R&D results residing there (see Federated Search - The Wave of the Future?).

So achieving search interoperability with OSTI’s federated search tools, such as Science.gov, WorldWideScience.org, and the E-print Network, has been an important development, though by no means easily accomplished. There are myriad obstacles that can block information exchanges between systems. (To learn more about the broad topic of interoperability and obstacles to exchanging information, see the Wikipedia article on interoperability.)

Specific to our world of scientific and technical information, the challenge of interoperability basically stems from the simple fact...

Related Topics: databases, E-Print Network (EPN), federation, interoperability, science, Science.gov, WorldWideScience.org (WWS)


The Science Knowledge Imperative: Making non-Googleable Science Findable

Just as science progresses only if knowledge is shared, accelerating the sharing of knowledge accelerates science. All of us engaged in disseminating science knowledge have the opportunity and obligation to do our jobs better, for to do so accelerates science itself. 

To this end, I propose a grand challenge--to make more science available to, and searchable by, more people than ever before. A momentous milestone will be achieved once we give everyone with web access the ability to search with unparalleled precision a billion pages of authoritative science. Already, considerable progress has been made.

My organization, the U.S. Department of Energy (DOE) Office of Scientific and Technical Information (OSTI), is responsible for the scientific and technical information operations of the Department. Over the last 11 years, OSTI has become entirely web-based. Of course, we are just one among many entities that connect people to knowledge using the web. Most notably, Google, Yahoo!, and other conventional search engine providers do this, too.

Google and other conventional search engines do for the web what publishers have long done for books--they create an index so that customers can quickly find information. Web users value this service so highly that search companies have become phenomenally successful enterprises. 

However, an important misunderstanding has sprung up about Google and the others: the false presumption, especially among young people, that most useful information is available via conventional search engines such as Google and Yahoo!

In fact, much of the information on the web is inherently unavailable to Google and Yahoo! This key limitation would come as a surprise to...

Related Topics: crawling, E-Print Network (EPN), federated search, google, science, Science.gov, WorldWideScience.org (WWS)


Forms of STI - pt. 3

by Tim Byrne 23 Jun, 2008 in Products and Content

In the first two parts to this post (Forms of STI and Forms of STI - pt. 2), I talked about how there are different forms of scientific and technical information and each is published and disseminated in its own way. OSTI has different search tools to access the different types of STI. I also discussed technical reports, journal literature, conference proceedings and papers, and e-prints. After defining each of these types of STI, I described the OSTI products that search each. This post will finish the discussion by covering patents, project summaries, and theses/dissertations.


Patents allow the spread of information about technological inventions while protecting the property rights of the inventor. A patent issued by the U.S. Patent and Trademark Office excludes others from making, using, offering for sale, or selling the invention throughout the U.S. or importing the invention into the U.S. for a limited time in exchange for public disclosure of the invention when the patent is granted. This public disclosure is extremely important in furthering scientific research. Technology moves on, but information remains useful forever.

Thomas Jefferson, an inventor himself and appointed by George Washington to the first Patent Board, was, essentially, the first patent examiner.  He found that "the issue of patents for new discoveries has given a spring to invention beyond my conception." (As a graduate of the University of Virginia, I always like to work in a Jefferson quote in my writings.)

DOE and its predecessor agencies, ERDA and AEC, are responsible for creating a tremendous amount of new technology....

Related Topics: dissertations, DOE Research & Development (R&D) Accomplishments, DOE Research and Development (R&D) Project Summaries, E-Print Network (EPN), Energy Citations Database (ECD), Energy Files, Federal R&D Project Summaries, Information Bridge (IB), osti, patents, project summaries, sti, theses


Forms of STI - pt. 2

by Tim Byrne 20 Jun, 2008 in Products and Content

In the first part to this post, Forms of STI, I talked about how there are different forms of scientific and technical information and how each is published and disseminated in its own way. OSTI has different search tools to access the different types of STI. In the last post I discussed technical reports. Now I will cover journal literature, conference proceedings and papers, and e-prints, defining each and pointing out the OSTI search tools that cover each.

Journal literature:

The publication of research in scientific journals started in the mid-seventeenth century. Before that and for some time after, scientific and technical information was circulated via letters, printed tracts and books. Journals became a preferred medium because journal publishers worked to achieve wider dissemination and faster publication. Today, however, even with the tremendous growth in scientific journals in the latter half of the twentieth century, publishing in scientific journals is most often not a speedy process. It can often take a year or more for an article to be published once it has been accepted by a journal. For this reason, many scientists and engineers also utilize other means to share their research. Options include technical reports, conference papers, pre-prints and a growing use of e-prints.

From 1948 to 1976, the Atomic Energy Commission published Nuclear Science Abstracts, providing comprehensive indexing of the international nuclear science literature, including journal literature on a worldwide basis.  This literature can now be found using Energy Citations Database.  ECD...

Related Topics: conference papers, conference proceedings, E-Print Network (EPN), e-prints, Energy Citations Database (ECD), Energy Files, Information Bridge (IB), journal articles, osti, Science Conference Proceedings, Science.gov, sti


Forms of STI

by Tim Byrne 19 Jun, 2008 in Products and Content

A comment I have heard on numerous occasions is that OSTI has too many databases and search tools and it is difficult to know which to use. Well, I am sure that a lot of people do find the variety of OSTI resources to be a bit confusing, but it really takes different types of databases and search tools to cover all the different types of scientific and technical information (STI). Scientific and technical information has many forms, such as journal articles, technical reports, patents and e-prints. Each has its own publication route, which requires its own method of acquisition.

A traditional library is built by compiling a collection of books and periodicals for use by library patrons.  In the electronic world, collections have expanded beyond the walls of the library.  OSTI is able to create two different types of electronic collections.  The first type is more like a traditional library in that OSTI compiles a collection of STI produced by or funded under the provenance of the Department of Energy on an OSTI computer.  OSTI controls what goes into these collections and in what format.  The OSTI databases that are of this sort include the full text documents in the Information Bridge and the bibliographic citations and summaries created for the Energy Citations Database, DOEpatents, and the DOE R&D Project Summaries.  The second type of electronic collection is a virtual collection of STI outside of DOE.  These collections contain STI that is of interest to DOE, but, for the most part, is not produced by DOE.  The citations and full text documents in these virtual collections reside on the Internet in servers all over the world.  OSTI has identified the locations of the STI and provides a means to search...

Related Topics: conference papers, conferences, DOE Research and Development (R&D) Project Summaries, DOepatents, E-Print Network (EPN), e-prints, Energy Citations Database (ECD), Information Bridge (IB), journal articles, patents, proceedings, project summaries, Science Accelerator, Science Conference Proceedings, Science.gov, sti, technical reports, theses, WorldWideScience.org (WWS)


Sophisticated Yet Simple - The Technology Behind OSTI's E-print Network: Part 3

by Sol Lederman 21 Mar, 2008 in Technology

This is the third, and final, article in a series. The first article provided an overview of the E-print Network. The second article discussed the special harvested component of the E-print Network in depth. This article provides a tour of the E-print collections which are federated. Hopefully, once you finish reading this article and this series, you will appreciate the innovation and hard work that has gone into producing the premier federated search application for searching E-prints.

The E-print Network can simultaneously search 52 databases plus the special harvest collection, discussed in Part 2, from a single query. That single search has the effect of searching approximately 4 million documents from the federated sources plus another 1.3 million documents from the harvested collection for a total of roughly 5.3 million documents. This search executes in real time. A user can select all databases to search, individual databases, categories of databases, or combinations of individual databases and categories. The databases are divided into eight categories:

  1. Biology
  2. Computer Technologies & Information Sciences
  3. Environmental Sciences and Ecology
  4. Institutional Repositories and Multidisciplinary Collections
  5. Mathematics
  6. Nonlinear Sciences
  7. Physics
  8. Renewable Energy

The relationship between categories and databases can be seen on the...
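
The selection rules described above (all databases, individual databases, categories, or a mix of both) can be sketched in a few lines of Python. This is only an illustration: the database names and the category-to-database mapping are hypothetical placeholders, not the E-print Network's actual catalog, and treating an empty selection as "search everything" is an assumption.

```python
# Hypothetical catalog: three of the eight categories, with made-up database names.
# A database may belong to more than one category.
CATEGORY_DATABASES = {
    "Mathematics": {"math_db_1", "multi_db_1"},
    "Physics": {"physics_db_1", "physics_db_2", "multi_db_1"},
    "Biology": {"bio_db_1"},
}


def resolve_selection(categories=(), databases=(), catalog=CATEGORY_DATABASES):
    """Turn a user's mix of categories and individual databases into one
    sorted, de-duplicated list of databases to query."""
    all_databases = set().union(*catalog.values())
    # Assumption: selecting nothing means "search all databases".
    if not categories and not databases:
        return sorted(all_databases)
    selected = set()
    for category in categories:
        selected |= catalog[category]
    for database in databases:
        if database not in all_databases:
            raise ValueError(f"unknown database: {database}")
        selected.add(database)
    return sorted(selected)
```

For example, `resolve_selection(categories=["Physics"], databases=["bio_db_1"])` combines one category with one individual database and lists the shared `multi_db_1` only once.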

Related Topics: E-Print Network (EPN), federated search


Sophisticated Yet Simple - The Technology Behind OSTI's E-print Network: Part 2

by Sol Lederman 05 Mar, 2008 in Technology

In Part 1 of this series I provided an overview of the technology that drives the E-print Network. In this article I will provide some detail about how the harvested collection, the "E-prints on Web Sites" component of the E-print Network, is constructed. In Part 3, I will discuss the technology of the portion of the E-print Network that relies on federated search of databases.

In Part 1 I explained that the E-print Network combines federated sources searched in real-time with harvested content. The harvested content, consisting of over 1.3 million e-prints, is found by directing a crawler to 28,000 web sites belonging to scientists, researchers, and members of the academic community. In OSTI terminology, harvesting is synonymous with conducting a directed crawl of web sites.

Before we look at the technology behind the harvesting, let's consider the question of why the content is harvested at all. Why not search the contributors' web sites in real-time in the same way that other collections are searched in real-time via federated search? There are several reasons for harvesting the content. First, a large number of e-prints are not found in databases. They are predominantly stored as document files in web server directories. Accessing files stored this way is the job of a web crawler, not that of a federated search engine. This is the case because a crawler, once it locates the index page for a set of e-prints, easily harvests all e-prints referenced in that index page. The second reason...
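
To make the index-page idea concrete, here is a minimal Python sketch of how a crawler might pull e-print links out of one index page. It is not OSTI's crawler: the class and function names are hypothetical, and the list of file suffixes that count as e-prints is an assumption made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: files with these suffixes are treated as e-print documents.
DOCUMENT_SUFFIXES = (".pdf", ".ps", ".ps.gz")


class EprintLinkParser(HTMLParser):
    """Collects absolute URLs of document files linked from one index page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only links that point at document files, resolved against the page URL.
        if href and href.lower().endswith(DOCUMENT_SUFFIXES):
            self.links.append(urljoin(self.base_url, href))


def harvest_links(index_html, base_url):
    """Return every e-print URL referenced by the given index page."""
    parser = EprintLinkParser(base_url)
    parser.feed(index_html)
    return parser.links
```

Given the HTML of a researcher's publications page and that page's URL, `harvest_links` returns the document URLs a crawler would then fetch and index; ordinary navigation links are ignored.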

Related Topics: doe, E-Print Network (EPN), federated search, osti


Sophisticated Yet Simple - The Technology Behind OSTI's E-print Network: Part 1

by Sol Lederman 26 Feb, 2008 in Technology

The E-print Network is one of OSTI's most popular and powerful research offerings yet few of its users know about the advanced technology that drives it and makes it simple to use. Professional researchers in basic and applied science are able to access over 5 million e-prints gathered from nearly 28,000 world-wide databases and web-sites. Numerous OSTI innovations ensure that the E-print Network's documents are of extremely high quality, are highly relevant to researchers, and are easy and quick to find. This is the first in a series of articles about the technology behind this very important component of the Science Accelerator. This article serves as an overview; subsequent articles will provide more technical information.

The E-print Network is a federated search application. It federates (aggregates) search results from over 50 content databases in a number of scientific disciplines from a single user query. The E-print Network, however, uses federated search in an innovative way: one of the databases it searches is a special collection formed by harvesting over 1.3 million E-prints from nearly 28,000 hand-picked web-sites. A custom-designed crawler is responsible for performing the harvesting and custom software is used to build an index of the 1.3 million E-prints so that they can be searched quickly together with the non-harvested databases. Most E-print Network users are unaware that the application is, in fact, a blend of federated search and Google-like crawling technologies. This marriage of the two technologies reflects OSTI's insight in realizing that e-prints not only reside in certain well...
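
The blend described above, where one query goes to several databases at once and the harvested collection is treated as just another source, can be illustrated with a short Python sketch. Everything here is a stand-in: the source names, the tiny in-memory document lists, and the substring matching are hypothetical and say nothing about how the E-print Network is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def make_source(name, documents):
    """Build a toy search function over an in-memory list of titles."""
    def search(query):
        q = query.lower()
        return [(name, title) for title in documents if q in title.lower()]
    return search


# Hypothetical sources: two "federated" databases plus the harvested index,
# which is queried in exactly the same way as the others.
SOURCES = {
    "federated_db_a": make_source("federated_db_a", ["Plasma physics notes", "Solar cell review"]),
    "federated_db_b": make_source("federated_db_b", ["Nonlinear plasma waves"]),
    "harvested_index": make_source("harvested_index", ["Plasma diagnostics e-print", "Graph theory preprint"]),
}


def federated_search(query, sources=SOURCES):
    """Send one query to every source in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        merged = []
        # Merge in a fixed (alphabetical) source order so results are reproducible.
        for name in sorted(futures):
            merged.extend(futures[name].result())
    return merged
```

Calling `federated_search("plasma")` returns hits from all three sources in a single list, each tagged with the source it came from. A real system would also need ranking, timeouts, and de-duplication, which this sketch leaves out.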

Related Topics: doe, E-Print Network (EPN), federated search, osti, Science Accelerator