
A Future Information Infrastructure for the Physical Sciences

Walter L. Warnick, Ph.D., Director
Office of Scientific and Technical Information
U.S. Department of Energy

GovTech 2000
June 21, 2000
Washington, DC

 

The Scientific and Technical Information Program

Thank you for allowing me this opportunity to share the DOE vision for the digital revolution and what it means for information access and delivery. Now is the most exciting time in history to be part of the information enterprise. We have opportunities never before enjoyed. We are challenged with new capabilities almost daily.

However, before I get into the heart of my presentation, let me start with some background on the organization I represent.

For 53 years, the Office of Scientific and Technical Information (OSTI) within the Department of Energy has been managing DOE's technical information program. The origin of this program was the Manhattan Project during World War II. From the beginning the fundamental purpose of this program was to ensure that research results were reported and made available to the agency and the broader scientific community.

The mission of OSTI today is still the same: to collect, preserve, disseminate, and leverage the scientific and technical information (STI) resources of the Department of Energy. OSTI provides comprehensive, quick, and precise access to national and global STI for use by DOE, the scientific research community, academia, US industry, and the public.

Though the mission remains the same, the manner in which it is met is very different.

Our deployment of Information Age technologies has radically changed OSTI's service to our customers. OSTI has no choice; we must remain modern.

For science to rapidly advance at the frontiers, it must be open. And shared knowledge is the enabler of scientific progress. Scientific research and the knowledge and technologies that follow have been credited with about half of the productivity growth of the United States' economy in the past fifty years. The Department of Energy has been a proud sponsor of science-driven growth through the combined efforts of the National Laboratories, 70 Nobel Laureates and thousands of other outstanding university and industry-based researchers nationwide. Well over 100,000 people are directly involved.

Information Fuels the Science Mission

Today, almost all basic research in the U.S. is funded by the Federal government. The Department of Energy invests $7 billion annually in R&D.

The principal deliverable from R&D is scientific and technical information. STI serves the science mission of the Department as well as serving researcher needs. Scientists have long recognized the need for comprehensive collections of scientific and technical information (STI) in the physical sciences. Numerous studies have documented that access to information fuels scientific advances. It is in the vital interest of all research agencies that STI be disseminated as broadly and as quickly as possible.

This is the driving factor in our push to make STI more accessible. A vision has emerged of the great potential that advanced digital technologies offer. By tapping into the Information Age, we can place STI right at the desktop, ready for use by DOE scientists and program managers to fuel the Department's science mission. I'd like to share with you today our recent achievements.

OSTI-Developed Information Age Products

Until the mid-90s, the method of disseminating DOE research results was largely through bibliographic databases. First was the Nuclear Science Abstracts (NSA). NSA is an historic record of nuclear research from the early 1940s through June 1976. The scope was then broadened by the Department of Energy, and the Energy Science and Technology Database (EDB) covers from 1974 to the present. It is a comprehensive source of worldwide energy-related bibliographic information, both nuclear and non-nuclear. Both databases contain "information about information," which we now call "metadata." Together these databases offer more than 5 million records in energy science and technology.

Then along came the Information Age. Several vast virtual collections have been compiled to meet the needs of DOE's research and development (R&D) community. Today I will highlight a few of these.

Researchers communicate their findings in three main ways. Three virtual collections each focus on one of those.

DOE Information Bridge

Technical reports and other forms of gray literature are one means by which scientists report their findings. The first significant advance for electronic dissemination at DOE occurred in late 1997 with gray literature. That was when we expanded access from bibliographic data to full text for gray literature and then made it available on the Internet free of charge. I am speaking of the DOE Information Bridge, which was introduced to the public in April 1998 in partnership with GPO. Because the full text of DOE-sponsored gray literature, including technical reports and conference papers, is scanned, each word of each report is searchable. There are now over 55,000 digital items and over 4.3 million searchable pages. Ease of access to full-text reports has enabled more use of the information, with 14,000 full-text downloads per month. In 1999, this free online access was estimated to allow users to avoid costs of over $3 million.

In 1998 DOE was among the first to undertake such a significant change in direction. Other agencies are now doing this, too. More on that a bit later.

PubSCIENCE

"PubSCIENCE" was developed to facilitate searching and accessing peer-reviewed journal literature in the physical sciences and other energy-related disciplines.

Following the path forged by the National Library of Medicine with its life sciences product, PubMed, OSTI determined that the new Web technology could be used to integrate citations and abstracts into a searchable database, utilizing hyperlinks to take patrons to the publishers' doorstep where full-text information could be obtained. In assessing the need for such a collection in the physical sciences, OSTI worked closely with the American Physical Society. No comparable commercial product was available. PubSCIENCE filled a void.

An exciting feature of PubSCIENCE is that its citations are compiled in a new way! Collaborating publishers contribute their citations based on agreements negotiated with my Office.

PubSCIENCE allows the patron to search across abstracts and citations of multiple publishers at no cost to the patron. The patron need not know ahead of time which journal has the information he or she seeks. Once the patron has found an interesting abstract, a hyperlink provides access to the publisher's server to obtain the full-text article. The article will come up immediately if the patron or the patron's organization has a subscription to the journal. If the patron lacks such a subscription, access to the full text can be obtained by pay-per-view, by special arrangement with the publisher, through library access, or through commercial providers.

OSTI's primary patrons are scientists at the DOE system of National Laboratories. PubSCIENCE is particularly attractive to such large institutions, as they are increasingly using site licenses to bring full-text journals to their scientific staffs. For example, Los Alamos National Laboratory has site licenses to well over 2,000 journals. At any institution that has a site license hosted at a publisher's server, the hyperlinks to full-text in PubSCIENCE are automatically live.

Right now, PubSCIENCE has 1,048 different journals with bibliographic records for 1.8 million articles.

It is available to the public by a collaboration with the Government Printing Office, and we are indebted to the U.S. Superintendent of Documents Fran Buckley for this collaboration.

For the future, OSTI plans to continue to partner with journal publishers to add more titles to PubSCIENCE, consistent with the scope of the DOE R&D program.

Last year at this time, the President's Information Technology Advisory Committee (PITAC) presented a report envisioning the ways in which information technology could transform how we conduct research. The committee foresaw a time when all scientific and technical journals will be available online and completely searchable. PubSCIENCE is a step toward realizing that committee's vision.

The PrePRINT Network

Another Web-based product is the PrePRINT Network, launched on January 31 of this year. Increasingly, scientists are posting preprints on the Web and seeking on-demand access to the latest discoveries. The PrePRINT Network is a seamless gateway to preprint servers dealing with scientific and technical disciplines such as physics, materials, chemistry and other disciplines of concern to DOE. Much depends on the field of science. My office does not operate any of the preprint servers. Rather, the PrePRINT Network is a gateway to 1,000 preprint servers run by other folks. These servers host over 330,000 preprints.

Patrons have several search options, including the ability to query multiple preprint servers and to browse by subject. One way is a distributed query: when the patron places a query, the PrePRINT Network passes it to several selected databases, has their own search engines run the search, and then compiles the results for the patron. Essentially, the network is acting as a PARALLEL PROCESSOR, uniquely created for searching across multiple sources that do not have standardized data formats and are geographically dispersed. The user no longer has to know ahead of time which preprint server holds the information he or she seeks.

Another way is searching the indexed web pages. We work with the owners of these servers to continually add sources. If you know about servers that we have overlooked, please let me know.
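For readers who want to see the shape of the distributed query described above, here is a minimal sketch in Python. It is an illustration only, not the PrePRINT Network's actual software: the two "servers", their record formats, and the function names are all hypothetical stand-ins. It shows one query being sent to several sources in parallel, each source searching in its own way, and the results being compiled into a single list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote preprint servers. Each keeps its records
# in a different format, since the sources are not standardized.
def search_server_a(query):
    records = [{"title": "Lattice QCD at finite temperature"},
               {"title": "Neutron scattering in thin films"}]
    return [r["title"] for r in records if query.lower() in r["title"].lower()]

def search_server_b(query):
    records = [("hep-001", "Finite temperature field theory"),
               ("mat-002", "Thin films of perovskite oxides")]
    return [name for _id, name in records if query.lower() in name.lower()]

SERVERS = {"server_a": search_server_a, "server_b": search_server_b}

def federated_search(query, servers=SERVERS):
    """Send one query to every server in parallel and compile the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        # Each server runs the search with its own search function.
        futures = {name: pool.submit(fn, query) for name, fn in servers.items()}
        merged = []
        for name in sorted(futures):
            for title in futures[name].result():
                merged.append((name, title))
    return merged

if __name__ == "__main__":
    for source, title in federated_search("thin films"):
        print(source, "|", title)
```

In this sketch the patron calls only federated_search and never needs to know which server holds a given preprint; adding a source means adding one entry to SERVERS.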

A new "Alert" service will be operational by August. The Alert will notify users via e-mail when new information is added that matches their stated areas of interest.

Science Communication Trilogy

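With the addition of preprints to OSTI's suite of Web products and services, the trilogy of ways by which researchers make their results known is now accessible on the Web: gray literature through the DOE Information Bridge, journal literature through PubSCIENCE, and preprints through the PrePRINT Network.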

Each of these is a vast virtual collection. A few years ago, scientists communicated their findings primarily by two methods, gray literature and journal literature; they now have preprints as an increasingly popular third way to communicate.

My personal view is that this mix of three ways by which scientists communicate their findings will persist far into the future.

Each way has its own set of strengths and weaknesses. That is why we at DOE have determined not to mix products. Journals are kept separate from gray literature, and both are kept separate from preprints. Users have a distributed search system to pulse all systems with one query, if they choose, but we do not want users to lose sight of the type of literature they are viewing.

I like to tell folks that our aims are simple. We aim to be FIRST in gray literature, FIRST in journal literature, FIRST in preprints, and FIRST in the hearts of our researchers.

If users desire, we do offer a distributed search. The parallel processor searching capability I described for the PrePRINT Network is the same information tool used in the EnergyPortal search. EnergyPortal is a special feature of the EnergyFiles Virtual Library Collections of Energy Science and Technology. The implications for building inexpensive distributed digital libraries are truly profound.

OSTI recognized early on that distributed searching was the "Holy Grail" (a quote taken from a Federal Computer News article on EnergyPortal) of the Information Age. The 25 most popular of EnergyFiles' 500-plus databases, including DOE databases, are integrated into the EnergyPortal search, thus providing a real one-stop shopping interface to diverse collections residing around the world. Try it sometime ... You'll be amazed at the quick response time and functionality.

Another area I'd like to highlight today is improving access to information across the Federal Government. We have several partnerships and collaborations that enhance access for the scientific community.

Earlier I talked about the gray literature in the DOE Information Bridge. Other agencies are also making their report literature available on the web. In particular, NASA, EPA, and DTIC have technical reports accessible on web sites.

We recently worked collaboratively with these agencies and have just announced a new tool called the GrayLit Network.

The GrayLit Network is a portal for technical report information located in databases residing at different U.S. federal agencies. These are searched in parallel through a single query. The GrayLit Network search, spawned by the user's query, relies on the search capability offered by each particular site. Therefore, in the case of the DOE Information Bridge, the full text of each technical report is searched and the results are returned accordingly. However, in some of the other full-text databases, only the bibliographic information is searched at the site. In these cases, the full text may still be downloaded by the user if desired. By offering a mode of communication for this hard-to-find class of literature, the GrayLit Network enables convenient access by the American public to government information without requiring that the public first figure out which Agency owns the information.
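The dependence on each site's own search capability can be made concrete with a small Python sketch. Everything in it is assumed for illustration: the class names, the two example sites, and the sample report are hypothetical and do not describe the GrayLit Network's real implementation or any agency's database. The sites are queried one after another here for simplicity, although the real searches run in parallel. The point is only that the same query over the same report can hit at a full-text site and miss at a bibliographic-only site.

```python
from dataclasses import dataclass

@dataclass
class Report:
    title: str
    body: str

@dataclass
class AgencySite:
    name: str
    full_text_search: bool  # True: searches report bodies; False: titles only
    reports: list

    def search(self, query):
        """Run the query using whatever search scope this site supports."""
        q = query.lower()
        hits = []
        for r in self.reports:
            text = (r.title + " " + r.body) if self.full_text_search else r.title
            if q in text.lower():
                hits.append(r.title)
        return hits

def graylit_search(query, sites):
    """Pass one query to every site and group the hits by site name."""
    return {site.name: site.search(query) for site in sites}

# Hypothetical example: the same report held at two differently capable sites.
_report = Report("Corrosion of reactor vessel steels",
                 "Measurements of hydrogen embrittlement in weld samples.")
EXAMPLE_SITES = [
    AgencySite("full_text_site", True, [_report]),
    AgencySite("biblio_site", False, [_report]),
]

if __name__ == "__main__":
    print(graylit_search("embrittlement", EXAMPLE_SITES))
    print(graylit_search("corrosion", EXAMPLE_SITES))
```

A word that appears only in the body of a report is found at the full-text site but not at the bibliographic-only site, while a title word is found at both.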

Another interagency information tool prototype that is now being introduced is the Federal R&D Project Summaries. It provides one-stop access to DOE's R&D Project Summaries Database, NIH's CRISP Database, and NSF's Awards Database. Using this information tool, the public can be more knowledgeable about, and have a better understanding of, the ongoing research efforts being funded by the Federal government's multibillion-dollar R&D investment.

The Federal R&D Project Summaries search capability is another exciting and useful technological example of what can result from a collaboration between our agencies.

Where Is This All Leading?

The progress we have achieved so far has made us think about institutionalizing all this. Are we doing all we can to serve the scientific community? To ensure that the ever-growing stores of scientific knowledge are available at the desktop? To enable scientists to retrieve and use information that is authoritative, meaningful, and valuable? To build information tools conducive to contributing to scientific exploration? These are but a few of the challenges and opportunities we face daily. To respond to these challenges, we are exploring a concept for a Future Information Infrastructure for the Physical Sciences.

The topic is being discussed by our various stakeholders and partners. Many attributes of the currently established National libraries of other agencies are already in place at DOE in regard to the collection and management of scientific and technical information. We would like to establish the importance of this information to the nation and guarantee information preservation for future generations.

A key concern of the information community is permanent public access to information.

Instituting an infrastructure is the best way to promote permanent public access to government information, a place where researchers, educators, students and citizens can come for answers. Our vision is to have DOE National Laboratories, Program Offices, GPO, other federal agencies, universities, publishers, and libraries working together to accomplish mutual goals in the advancement of science. The digital frontier offers more opportunities than any single organization can pursue. By working together, we can take advantage of these opportunities and meet the challenges successfully.

Meeting the Need - Realizing the Vision

The result will meet the research community's needs and realize a long-held vision: to effectively and efficiently provide access to comprehensive information. And, thus, to share knowledge that enables scientific progress. Given our recent history and our dedication and drive, I believe this can be done.