Sophisticated Yet Simple - The Technology Behind OSTI's E-print Network: Part 3
This is the third and final article in a series. The first article provided an overview of the E-print Network. The second article discussed the special harvested component of the E-print Network in depth. This article provides a tour of the federated E-print collections. By the end of this article and this series, you should appreciate the innovation and hard work that have gone into producing the premier federated search application for searching E-prints.
The E-print Network can simultaneously search 52 databases, plus the special harvest collection discussed in Part 2, with a single query. That single search covers approximately 4 million documents from the federated sources and another 1.3 million documents from the harvested collection, for a total of roughly 5.3 million documents. The search executes in real time. A user can search all databases, individual databases, categories of databases, or combinations of individual databases and categories. The databases are divided into eight categories:
- Computer Technologies & Information Sciences
- Environmental Sciences and Ecology
- Institutional Repositories and Multidisciplinary Collections
- Nonlinear Sciences
- Renewable Energy
The relationship between categories and databases can be seen on the E-print Network search page. The collections description page provides detailed information about databases included in the E-print Network.
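The selection model described above, in which a user picks categories, individual databases, or a mix of both and then sends one query to all of them at once, can be sketched roughly as follows. This is only an illustration, not the E-print Network's actual implementation: the mapping `CATEGORIES`, the database names `db_a`, `db_b`, and `db_c`, and the functions `resolve_sources`, `federated_search`, and `search_one` are all hypothetical names introduced for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical category-to-database mapping. The database names are
# placeholders, not the real E-print Network sources.
CATEGORIES = {
    "Nonlinear Sciences": ["db_a", "db_b"],
    "Renewable Energy": ["db_c"],
}


def resolve_sources(categories, databases):
    """Combine selected categories and individual databases into one
    ordered list with duplicates removed."""
    selected = []
    for category in categories:
        selected.extend(CATEGORIES[category])
    selected.extend(databases)
    # dict.fromkeys keeps the first occurrence of each name, in order.
    return list(dict.fromkeys(selected))


def federated_search(query, sources, search_one):
    """Send the same query to every source concurrently and merge the hits.

    search_one(source, query) is an assumed callable that returns a list
    of results for a single source.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        per_source = pool.map(lambda source: search_one(source, query), sources)
        return [hit for hits in per_source for hit in hits]


if __name__ == "__main__":
    # Stand-in for a real per-database search call.
    def fake_search(source, query):
        return [f"{source}:{query}"]

    sources = resolve_sources(["Nonlinear Sciences"], ["db_c", "db_a"])
    print(federated_search("chaos", sources, fake_search))
```

Selecting a category and a database that already belongs to it does not query that database twice, because the combined list is de-duplicated before the query is sent.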
While most categories are self-explanatory, the "Institutional Repositories and Multidisciplinary Collections" category warrants an explanation. Of the 52 databases, 40 belong to seven categories that pertain to specific scientific disciplines. The remaining twelve databases are owned by universities or other institutions. Most of these databases span multiple disciplines, and a number of them cover the scholarly works of the host university's faculty and staff.
Sixteen of the 52 databases on the E-print Network search page are identified as "arXiv" sources. These sources consist of nearly 470,000 documents that are part of an e-print service in the fields of physics, mathematics, nonlinear science, computer science, quantitative biology, and statistics. The arXiv e-print service can also be searched directly from the arXiv website. arXiv is the brainchild of Paul Ginsparg, professor of Physics and Computing & Information Science at Cornell University. The project was started in 1991, when Ginsparg was a staff member at DOE's Los Alamos National Laboratory. Cornell University operates and partially funds arXiv, and the National Science Foundation provides additional funding. Wikipedia has two articles about Ginsparg and his work: "Paul Ginsparg" and "arXiv e-print archive."
The E-print Network, like a number of other OSTI federated search applications, provides an alert capability. Users can create and save queries and lists of sources to search. The E-print Network will run the query on the user's behalf every week and email the user a list of any previously unseen documents. This alert service keeps users apprised of new documents from Ginsparg's arXiv collections, the other E-print Network federated collections, and the special harvest collection.
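The core of such an alert, rerunning a saved query and reporting only documents the user has not been sent before, might look something like the sketch below. It is an assumption-laden illustration rather than OSTI's code: the `run_alert` function, the shape of `saved_query`, the `id` field on each document, and the `search` callable are all hypothetical, and the weekly scheduling and the email step are left out.

```python
def run_alert(saved_query, seen_ids, search):
    """Run a saved query and return only documents not reported before.

    saved_query: dict with "query" and "sources" keys (assumed shape).
    seen_ids:    set of document ids already sent to the user; updated in place.
    search:      assumed callable, search(query, sources) -> list of dicts
                 that each carry an "id" key.
    """
    new_docs = []
    for doc in search(saved_query["query"], saved_query["sources"]):
        if doc["id"] not in seen_ids:
            # Record the id immediately so a duplicate hit from another
            # source in the same run is not reported twice.
            seen_ids.add(doc["id"])
            new_docs.append(doc)
    return new_docs


if __name__ == "__main__":
    # Stand-in search that always returns the same two documents.
    def fake_search(query, sources):
        return [{"id": "doc-1", "title": "First"}, {"id": "doc-2", "title": "Second"}]

    alert = {"query": "solar cells", "sources": ["db_c"]}
    seen = {"doc-1"}
    print(run_alert(alert, seen, fake_search))  # first weekly run
    print(run_alert(alert, seen, fake_search))  # next run, nothing new
```

Keeping the set of seen ids per saved alert is what lets the second run come back empty when the sources have not changed.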
The E-print Network is a unique offering to the American people from OSTI on behalf of DOE. It provides exceptionally high-quality e-print documents to the serious researcher and to the public. It is just one of the ways OSTI has innovated to spread scientific information and to advance science.
Consultant to OSTI