Frequently Asked Questions (FAQs)

Why was the DOE Data Explorer developed? How has its original "vision" changed over the years?

How was DDE developed and by whom?

What are the criteria that determine which data collections are referenced? What about individual datasets?

Where do the individual datasets come from? How do they get into DDE?

How can a user tell the difference between a collection and an individual dataset? Isn't it true that both may have multiple items in their content?

What's the difference between a dataset and a datastream?

Tell me about the Digital Object Identifiers (DOIs) on these individual datasets or datastreams.

How is DOE financial support determined?

Why do many of the collections not have a Creator/Author listed?

How many data collections are described in the database? Won't the individual datasets outnumber the collections?

Do the collections have to be active or can archival collections be included?

How often is the DDE database updated?


Why was the DOE Data Explorer developed? How has its original "vision" changed over the years?

The DOE Data Explorer (DDE) launched in 2008 as a way to guide users to collections of publicly available, DOE-sponsored data and other non-text information. The collections may reside at data centers and user facilities, on pages maintained by groups within laboratories, on university websites, and on commercial sites such as YouTube, Vimeo, and SciVee. They span many forms and formats and reach across all of DOE's science disciplines. DOE Data Explorer was, and still is, intended to be particularly useful to students, the public, and researchers who are new to a field or looking for data outside their normal area of expertise.

In 2011, OSTI began to announce individual datasets and register them for Digital Object Identifiers (DOIs) through its partnership with DataCite. Soon DDE users were able to view both collection records and records identifying individual datasets with one search. The original database design, however, meant that the two types of results could not be merged and manipulated inside the DDE product. In spring 2013, DDE was completely redesigned. The website took on a new look, and the database was rewritten with different software, new functionality, and a true merger of its two types of records: those guiding users to collections and those guiding users to individual datasets with persistent identifiers (DOIs). Both types of records now share the same search, display, and manipulation functions.

The changes between 2008 and 2013 have brought DDE closer to its original goal: to help users explore the Department's vast universe of data, multimedia, and other non-text scientific and technical information (STI).

How was DDE developed and by whom?

The database was developed by the U.S. Department of Energy Office of Scientific and Technical Information (OSTI), within the Office of Science. OSTI staff researched hundreds of web pages to identify the data, multimedia, and other non-text information referenced in DDE's collection citations. Descriptions were written at OSTI, using text from the websites where the collections are posted for access.

The individual dataset records now coming to OSTI and flowing into both SciTech Connect and DOE Data Explorer are submitted by the creators and/or holders of the data. The metadata in these records comes directly from the input source.

What are the criteria that determine which data collections are referenced? What about individual datasets?

OSTI follows these general guidelines and criteria when selecting collections for inclusion:

  • The collection must consist primarily of non-text information, such as numeric files, figures or data plots, images, multimedia, etc. Most collections include some text, of course, but text is not their main content. The data/non-text types are defined in Help.
  • Data should be the result of research and be maintained for reference, analysis, and reuse, or in support of specific projects. Calibration data, operating statistics, and routine log data for DOE's many research instruments are excluded. Specialized tools and codes may be part of a data collection, but collections consisting only of toolkits and software are normally excluded. The exception is computer models and animations/simulations. The line between a tool (the model) and its results (the simulation) can be blurry, so some collections tagged as animations/simulations consist primarily of computer models.
  • Multimedia collections may be a mixture of research-focused information and information related to DOE as an organization.
  • Collections may be small but should consist of more than just two or three items. Multiple items must logically fall under the collection's "title."

In the case of individual datasets, every dataset submitted by a DOE organization or a DOE grantee will be reflected in the DDE database and will be registered with a Digital Object Identifier (DOI).

Where do the individual datasets come from? How do they get into DDE?

Unlike the collection records in DDE, which are prepared and entered into the DDE database by OSTI staff, individual datasets are directly submitted by the data host or the creator/author/PI. Datasets can be submitted to OSTI through the Announcement Notice 241.6 on the E-Link website or through a web service using an Application Programming Interface (API). See OSTI's Data ID Service for full details.

How can a user tell the difference between a collection and an individual dataset? Isn't it true that both may have multiple items in their content?

Yes, both may contain multiple items, but only the "set" of information referred to in this FAQ as an "individual dataset" will have a Digital Object Identifier (DOI). The boundaries of an individual dataset are defined by its data submitter to ensure that the DOI assigned to it reflects an appropriate level of granularity. Before submitting, the submitter has also agreed to maintain access to the dataset indefinitely. Boundaries are less well defined for collections that are discovered and identified by OSTI staff.

What's the difference between a dataset and a datastream?

The creator/author/PI defines what constitutes a dataset. A dataset may be one file or many files, and the files may include information in various media and formats. Normally, however, a dataset represents one experiment or one instance of something.

In some projects, continuously running instruments or sensors monitor operations or collect readings automatically, once or several times a day, over a period of months or years. In those cases, the files from a defined monitoring period may be identified together as a single datastream rather than as hundreds of separate datasets. The Atmospheric Radiation Measurement (ARM) Data Archive is an example of a data center that does this: one year of readings from one specific instrument at one specific location may be treated as a datastream and assigned a single Digital Object Identifier.

Tell me about the Digital Object Identifiers (DOIs) on these individual datasets or datastreams.

OSTI is a member of DataCite and a registering agency through it, with the authority to assign Digital Object Identifiers to datasets submitted by DOE and its contractors or grantees. Assigning and registering a DOI for every submitted dataset is a free service that OSTI provides to enhance DOE's management of this important resource. See the information about OSTI's Data ID Service for full details.

How is DOE financial support determined?

For collections, OSTI uses information on the host website. Project details often state the funding sponsors clearly. If not, details associated with the data, such as research organizations and contract numbers, are examined. For datasets, the submitter is required to include the sponsor/funding organization in the metadata.

Why do many of the collections not have a Creator/Author listed?

An individual's name is given as PI only when it is clear that he or she is responsible for the collection as a whole. Each dataset within a collection may credit a different person as PI, but DOE Data Explorer citations focus on the collection as a whole. Often, however, a scientific collaboration with an "official" name exists, and in those cases the collaboration is listed as a contributing entity.

How many data collections are described in the database? Won't the individual datasets outnumber the collections?

The list is growing. Approximately 200 collections were initially identified; as of early 2013, there were more than 550. We have a standing invitation to all DOE organizations and to all customers: please notify us if we've missed a collection you know about, or if we have not correctly or adequately described a collection that you maintain. As for datasets outnumbering collections: yes, that is how it should be. We hope that eventually every data collection will have each of its datasets submitted individually.

Do the collections have to be active or can archival collections be included?

Both active and archival collections are included. Because DOE and its predecessor organizations have been generating data since the 1940s, some collections may date from those early years. However, at least some of a collection's information should be posted on the web for access. Other collections, of course, are the results of very new research. Where possible, the citation's description includes the date range of the collection's contents.

Note that individual datasets will also age and may eventually be considered archival material. Their DOIs, however, ensure that they will remain available, or at least "trackable" through a "tombstone page."

How often is the DDE database updated?

New content is added, and corrections are made, as soon as possible after they are identified. The list of new items on the What's New page is updated each month. A different collection is also highlighted each month in the Featured Data Collection area of the website. Suggestions are welcome.