PubSCIENCE: A Cutting-Edge Component for a
National Digital Library

Walter L. Warnick, Ph.D., Director
Office of Scientific and Technical Information
U.S. Department of Energy
 

Presented at NFAIS Annual Conference
February 21, 2000
Philadelphia, PA

Abstract:

Information Age technologies have radically changed the Department of Energy’s (DOE) Office of Scientific and Technical Information (OSTI) information services.  For over 50 years OSTI has been collecting, preserving, and disseminating scientific and technical information for DOE. By tapping into the Information Age, new products are being developed to place STI right at the desktop, ready for use by DOE scientists and others.  The progression to the electronic dissemination of this information has been under way for over 6 years, highlighted by the DOE Information Bridge with over 62,000 digital items and over 4 million searchable pages of full-text grey literature, and the recent unveiling of PubSCIENCE, which focuses on journal literature. Information Age technology allows agencies to envision a searchable and comprehensive collection of information  -- i.e., “National Digital Libraries” -- on topics of concern to each Executive Branch Agency and to make it available to the public.


Thank you for the opportunity to speak to NFAIS.  DOE's membership in NFAIS is valuable.

Background 

For 53 years, the Office of Scientific and Technical Information (OSTI) within the Department of Energy (and its predecessor agencies) has been managing DOE’s technical information program.  The origin of this information program was the Manhattan Project during World War II.   From the beginning the fundamental purpose of this program was to ensure that research results were reported and made available to the agency and the broader scientific community.

The mission of OSTI today is still the same:  to collect, preserve, disseminate, and leverage the scientific and technical information (STI) resources of the Department of Energy. OSTI provides access to national and global STI for use by DOE, the scientific research community, academia, US industry, and the public.

Though the mission remains the same, the manner in which it is met is very different.

Our deployment of Information Age technologies has radically changed OSTI's service to our customers. OSTI has no choice; we must remain modern.

 

Information: Fueling the Science Mission 

Today, almost all basic research is funded by the Federal government.  But what good is basic research unless the resulting information is accessible and used?  This is the driving factor in our push to make STI more accessible. A vision has emerged of the great potential that advanced digital technologies offer.  By tapping into the Information Age, we can place STI right at the desktop, ready for use by DOE scientists and program managers to fuel the Department’s science mission. 

Secretary Richardson stated, "For science to rapidly advance at the frontiers, it must be open. And shared knowledge is the enabler of scientific progress."  Scientific research and the knowledge and technologies that follow have been credited with about half of the productivity growth of the United States’ economy in the past fifty years. What growth it has been! Millions of high-skill, high-wage jobs; the longest life expectancy in human history; agricultural output to confound Malthus; new means of working and communicating on a global basis; and exciting new frontiers to explore. The Department of Energy (and its predecessor agencies) has been a proud sponsor of science-driven growth through the combined efforts of the National Laboratories, 68 Nobel Laureates and thousands of other outstanding university and industry-based researchers nationwide.

The 20th century was the Century of Physics, producing nuclear power, space travel, computers, and countless other advances.  That century has ended, and the life sciences now offer immense opportunities. Too little appreciated is the fact that much of the progress in the life sciences depends upon prior advances in the physical sciences.

The Department of Energy invests $7 billion annually in R&D.  Building on its strength in the physical sciences, the Department of Energy will invest a portion of its budget in medical research in FY 2000, to include:

•  $50 million in topics relating to pharmaceuticals and isotopes;
•  $90 million relating to genome research;
•  $30 million relating to structural biology.

The principal output from R&D is scientific and technical information.  STI serves the science mission of the Department as well as serving researcher needs. It is in the vital interest of all research agencies that STI be disseminated as broadly and as quickly as possible.  So, let me describe what we are doing to make it accessible.

OSTI-Developed Information Age Products 

“PubSCIENCE” - as most of you are aware - made news when it was unveiled October 1, 1999.  Before I provide a status report, first I’d like to share a bit about the history that led up to this development.

PubSCIENCE is the culmination of an agency’s lifetime tradition of scientific and technical information dissemination that now is bringing information to the desktop. It was developed to facilitate searching and accessing peer-reviewed journal literature in the physical sciences and other energy-related disciplines to meet the researcher’s growing need for scientific information at the desktop.  In collaboration with the Government Printing Office, PubSCIENCE is also available for public use through “GPO ACCESS.”  It can be accessed at http://www.osti.gov/pubscience.

Until recently, the method of disseminating DOE research results was through bibliographic databases.  First was the Nuclear Science Abstracts (NSA); NSA is an historic record of nuclear research beginning with the Manhattan Project in the early 1940s and following throughout the life of the Atomic Energy Commission through June 1976. When the energy mission was broadened in the mid-1970s to include non-nuclear energy, such as fossil and solar energy, NSA was supplanted by the Energy Science and Technology Database (EDB).  It is a comprehensive source of worldwide energy-related bibliographic information, both nuclear and non-nuclear, from 1974 to the present.  It represents the broadened scope of energy research from the creation of the Energy Research and Development Administration and then the Department of Energy (DOE) in 1977. Together these databases offer more than 5 million records in energy science and technology.

Both databases contain “information about information,” referring the patron to the paper or microfiche sources of the full text documents described. The full text might be obtained from OSTI if you were within DOE; or from NTIS, a GPO Depository Library, or the journal or book publisher.

Then along came the Information Age, with technologies that have radically changed OSTI’s information services.  Several vast electronic collections have been compiled to meet the needs of DOE’s research and development (R&D) community.   In efforts to better serve the patron, who increasingly wants information right at the desktop, OSTI has transitioned its operations from a paper-based environment to an electronic environment.  We are working hard to share information faster, more completely, more conveniently, and at lower cost.

DOE Information Bridge 

The progression to electronic dissemination of this information has been under way at OSTI for some time.  The most significant advance occurred when access was expanded from bibliographic data to full text grey literature.  OSTI’s introduction of the DOE Information Bridge provided access to the full text of DOE-sponsored grey literature (technical reports, conference papers, etc.).  Each word of each report is searchable in this collection. As of February 2000, the Information Bridge had grown to over 62,000 items and to over 4 million searchable pages.

With the use of this Information Age collection well established for grey literature, DOE turned its sights toward the other ways by which scientists disseminate their findings. The most prevalent form is journal literature.

PubSCIENCE  

Following the path forged by the National Library of Medicine with its life sciences product, PubMed, OSTI determined that the new Web technology could be used to integrate publisher-submitted citations and abstracts into a searchable database, utilizing hyperlinks to take patrons to the publishers’ doorstep where full-text information could be obtained.  In assessing the need for such a collection in the physical sciences, OSTI worked closely with the American Physical Society.  PubSCIENCE filled a void.

Developed in partnership with GPO and 21 publishers, PubSCIENCE was unveiled at a ribbon-cutting event with the support of Energy Secretary Bill Richardson and the Superintendent of Documents Fran Buckley, who will be speaking here a bit later.

(Viewgraph 7) An exciting feature of PubSCIENCE is the new way in which its citations arrive!  Collaborating publishers contribute their citations based on agreements negotiated with OSTI.

PubSCIENCE allows the patron to search across abstracts and citations of multiple publishers at no cost.  The patron need not know ahead of time which journal has the information being sought.  Once the patron has found an interesting abstract, a hyperlink provides access to the publisher's server to obtain the full-text article.  The article will come up immediately if the patron or the patron's organization has a subscription to the journal.  If the patron lacks such a subscription, access to the full text can be obtained by pay per view, by special arrangement with the publisher, through library access, or through commercial providers.
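The search-then-link model just described can be illustrated with a minimal sketch. It is not the PubSCIENCE implementation: the record layout, the publisher names, the titles, and the example.org URLs are all invented for illustration. It shows only the idea that one query runs across citations contributed by several publishers, and that each hit carries a hyperlink to the full text on the publisher's own server.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    publisher: str
    journal: str
    title: str
    abstract: str
    full_text_url: str  # link to the article on the publisher's own server


# Hypothetical records: publishers, journals, titles, and URLs are invented.
CITATIONS = [
    Citation("Publisher A", "Journal of Examples",
             "Flux pinning in high-temperature superconductors",
             "We measure critical currents in thin-film samples.",
             "https://publisher-a.example.org/articles/1"),
    Citation("Publisher B", "Example Letters",
             "Magnet design for compact accelerators",
             "Superconducting magnets reduce the footprint of the ring.",
             "https://publisher-b.example.org/articles/2"),
    Citation("Publisher B", "Example Letters",
             "Catalysts for coal liquefaction",
             "Iron-based catalysts are compared.",
             "https://publisher-b.example.org/articles/3"),
]


def search(query, citations=CITATIONS):
    """Return every citation whose title or abstract contains the query."""
    q = query.lower()
    return [c for c in citations
            if q in c.title.lower() or q in c.abstract.lower()]


# One query reaches citations from both publishers; the patron does not
# need to know in advance which journal holds the article.
hits = search("superconduct")
for c in hits:
    print(f"{c.title} ({c.journal}) -> {c.full_text_url}")
```

Whether the link then opens the article directly, or leads to a pay-per-view or subscription page, is decided at the publisher's server and is outside this sketch.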

OSTI’s primary patrons are scientists at the DOE system of National Laboratories.  PubSCIENCE is particularly attractive to such large institutions, as they are increasingly using site licenses to bring full-text journals to their scientific staffs.  For example, Los Alamos National Laboratory has site licenses to well over 2,000 journals. At any institution that has a site license hosted at a publisher’s server, the hyperlinks to full-text in PubSCIENCE are automatically live.

Currently PubSCIENCE covers 1,032 journals from 26 participating publishers, with 1.7 million journal citations.  In the future, we plan to continue expanding the number of journal titles in areas of interest to DOE researchers and to expand coverage from publishers where possible.  We are working with the current publishers to obtain feedback on improving the Web site.

Global information sharing has become a reality via the Web, making public availability actually easier and cheaper to implement than restricting access. The response from patrons has been quite favorable. For the future, OSTI plans to continue to partner with journal publishers to add more titles to PubSCIENCE, consistent with the scope of the DOE R&D program.  

Some futurists have caused a stir by predicting the demise of traditional scholarly publishing.  Lamenting the costs incurred by libraries to purchase subscriptions and delays in publishing, they cite preprint or e-print servers as the wave of the future which will replace traditional publishing.

Indeed, the medium of traditional publishing is changing.  Most of the traditional journals are now available electronically.  But we should not equate an evolution of media, from paper to electronic, with a fundamental shift away from journals.  I see no real pressure coming from scientists to forsake traditional publishers.  Indeed, a recent survey of scientists showed that, when selecting the journal to which to submit their papers, the most important factor in their decision, by far, is the prestige of the journal.  Not cost of the journal, not speed of publication, but prestige.

I see no real pressure being applied to researchers to change this perspective.  In particular, it is not clear how a preprint server can attain the kind of prestige that is associated with a premier journal, unless there is a gatekeeper to keep out less than stellar preprints.  Peer review is the gatekeeper function performed by traditional publishers.  It is their real added value.  While preprint servers solicit and receive peer comments, peer comment is not at all the same as peer review.

Last year at this time, the President’s Information Technology Advisory Committee (PITAC) presented a report envisioning the ways in which information technology could transform how we conduct research.  The committee foresaw a time when all scientific and technical journals will be available online and completely searchable. PubSCIENCE is a step toward realizing that committee’s vision.

PrePRINT Network 

DOE’s most recent Web-based product, however, is not PubSCIENCE, but rather the PrePRINT Network (PPN), launched on January 31 of this year.  The PrePRINT Network is a seamless gateway to preprint servers dealing with scientific and technical disciplines such as physics, materials, chemistry and other disciplines of concern to DOE.  My office does not operate any of the preprint servers.  Rather, the PrePRINT Network is a gateway to the universe of preprint servers, which number over 400 with 235,000 preprints.  Patrons have several options, including the ability to query multiple preprint servers and to browse by subject.  When the patron places a query, PPN accesses several selected databases, causes searches to be done by their search engines, and then compiles the results for the patron.  Essentially, the network is acting as a PARALLEL PROCESSOR, uniquely created for searching across multiple sources that do not have standardized data formats and are geographically dispersed.   The user no longer has to know ahead of time which preprint server holds the information he seeks.
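The query flow described above (one query sent to several servers at once, each server running its own search, and the results compiled for the patron) can be sketched in a few lines. This is an illustrative sketch only, not the PrePRINT Network's actual software: the two "servers" are in-memory stand-ins, and their names, record formats, and contents are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote preprint servers. Each stores its
# records in a different format and runs its own search.
def search_server_a(query):
    records = [
        {"title": "Neutron scattering in thin films", "id": "a-1"},
        {"title": "Lattice QCD at finite temperature", "id": "a-2"},
    ]
    return [r for r in records if query.lower() in r["title"].lower()]


def search_server_b(query):
    records = [
        ("b-7", "Thin films for solar cells"),
        ("b-8", "Catalysis on metal surfaces"),
    ]
    return [r for r in records if query.lower() in r[1].lower()]


# Adapters that map each server's native result into one common shape.
def normalize_a(hit):
    return {"source": "server_a", "id": hit["id"], "title": hit["title"]}


def normalize_b(hit):
    return {"source": "server_b", "id": hit[0], "title": hit[1]}


SERVERS = [
    (search_server_a, normalize_a),
    (search_server_b, normalize_b),
]


def federated_search(query, servers=SERVERS):
    """Send one query to every server concurrently and merge the results."""
    def run(entry):
        search, normalize = entry
        return [normalize(hit) for hit in search(query)]

    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        per_server = list(pool.map(run, servers))

    merged = [hit for hits in per_server for hit in hits]
    return sorted(merged, key=lambda h: (h["source"], h["id"]))


results = federated_search("thin films")
for hit in results:
    print(hit["source"], hit["id"], hit["title"])
```

The per-server adapter is the design point: because the sources do not share a data format, each one needs only a small translation step, and nothing has to be copied into a central collection.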

The parallel processor searching capability is the same information technology used in another OSTI product, EnergyPortal search, a special search feature within our Virtual Library Collections of Energy Science and Technology.  The implications for building inexpensive distributed digital libraries are truly profound.

Science Communication Trilogy

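With the addition of preprints to OSTI’s suite of Web products and services, the trilogy of ways by which researchers make their results known is now accessible on the Web: grey literature through the DOE Information Bridge, journal literature through PubSCIENCE, and preprints through the PrePRINT Network.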

Each of these is a vast virtual collection.  A few years ago, scientists communicated their findings primarily by two methods, grey literature and journal literature; they now have preprints as an increasingly popular third way to communicate.  My personal view is that this mix of three ways by which scientists communicate their findings will persist far into the future.

Each way has its own set of strengths and weaknesses.  That is why we at DOE have determined not to mix products.  Journals are kept separate from grey literature, and both are kept separate from preprints.  Users have a distributed search system to pulse all systems with one query, if they choose, but we do not want users to lose sight of the type of literature they are viewing.

Interestingly, in 1991, a visionary recommendation was made in a report commissioned by the American Physical Society.  Called the “Loken Report,” it called for the development of a National Physics Database to integrate “all of the world’s scientific literature information in an electronic information system.” Given the developments I’ve just described, that vision no longer seems such a “stretch.”  Loken Report author Dr. Harry Thacker recently said, “I've often thought that the only thing we got wrong in the report was the time scale. We thought we were talking about 2020 and it turned out to be more like 2000.” 

Given the developments of several Federal agencies and the numerous national library initiatives, that vision will soon be reality.

National Libraries

Three Cabinet-level agencies have National Libraries.  They include the National Institutes of Health (National Library of Medicine), the Department of Education (National Library of Education), and the Department of Agriculture (National Agricultural Library).  Each of these is making great strides with digital collections.

By any measure, the National Library of Medicine (NLM) is the leader.  They produced PubMed several years ago, and it has become the single most used collection of information in medicine.  Recently, the National Library of Medicine has launched PubMed Central, which – unlike PubMed – hosts the full text of journal articles on NLM servers. Liz Pope will tell us more about PubMed Central in a minute. DOE copied PubMed when it created PubSCIENCE, but DOE has no plans to emulate PubMed Central.

The National Agricultural Library has recently made its AGRICOLA database freely available on the Web.  It differs from PubSCIENCE in two ways: (1) the obvious difference in subject matter, and (2) AGRICOLA does not offer hyperlinks to full text.

The National Library of Education has recently made its ERIC database freely available on the Web. ERIC has features very similar to AGRICOLA. 

Two additional agencies, the Environmental Protection Agency (EPA National Library Network Program) and the Department of Transportation (DOT National Transportation Library), have Web sites offering access to extensive collections of online information.

Additionally, the National Science Foundation (NSF) has a program solicitation for the National Science, Mathematics, Engineering, and Technology Education Digital Library (NSDL). This would found a national digital library that will constitute an online network of learning resources. It has $13 million of new money, and a solicitation for grant proposals is on the street.  In a related effort, the Institute for Museum and Library Services has been charged by Executive Order to create a Digital Library of Education to provide digital resources for lifelong learning. 

The mere existence of a National Digital Library fosters not only the dissemination of information but its preservation as well. It would be the surest way to promote permanent public access to government information. Additionally, the term National Digital Library announces to the world that the agency has information resources of which it is proud.

The nation needs a National Library focusing on energy, science, and technology -- a place where researchers, educators, students, and citizens can come for answers.

Designing the Future: Mutual Interests of Federal Science Agencies

There is great potential to use information technology to improve people’s lives. The Internet and other information and communications technologies are changing the way we work, learn, and communicate, as well as how we do business.  These technologies are shaping our economy and our society in the same way that the steam engine and electricity defined the Industrial Age.

The National Science Policy Report in 1998 stated:

The federal investment in science has yielded stunning payoffs. It has spawned not only new products, but also entire industries. To build upon the strength of the research enterprise we must make federal research funding stable and substantial, maintain diversity in the federal research portfolio, and promote creative, groundbreaking research.

In recent years, information technology has driven the U.S. economy and has been the major growth market.  Businesses are scrambling to use the Internet.

The U.S. Government is also striving to make greater use of the Internet.  In December 1999, the President issued a memorandum for the heads of Executive Departments and Agencies on the “Use of Information Technology to Improve Our Society.”  Expectations are that more information and services will be made available electronically. 

The Federal science agencies have mutual interests:

The information society is upon us.  We must not just react to it but we must urgently seize the opportunities that arise.