Frequently Asked Questions
The DOE Data Explorer (DDE) launched in 2008 as a way to guide users to collections of publicly available, DOE-sponsored data and other non-text information. The collections may reside at data centers, user facilities, on pages maintained by groups within laboratories, or on university websites. They contain many forms and formats and reach across all of DOE's science disciplines. DDE was developed by the U.S. Department of Energy's Office of Scientific and Technical Information (OSTI). Hundreds of web pages were researched in order to identify the data and other non-text information referenced in the collection citations. Descriptions were created at OSTI, using text from the websites where the collections are posted for access.
In 2011, OSTI began to announce individual datasets and register them for Digital Object Identifiers (DOIs) through its partnership with DataCite. These individual dataset citations now coming to OSTI and flowing into both SciTech Connect and DOE Data Explorer are submitted by the creators and/or holders of the data. The metadata in these citations comes directly from the input source. While collection citations appear only in DDE, the citations for individual datasets are searchable in both DDE and SciTech Connect.
The original content of the DDE database included more citations for multimedia collections than it currently does. These citations led users to video or image collections that were more general in nature and met the criteria of non-text information, but could not actually be considered "data." In 2014, the decision was made to remove this particular slice of content and allow DDE to be more focused on "data," as opposed to non-text in general. For access to DOE videos, see ScienceCinema.
"Explore By" allows you to browse the DDE database and discover its content. Each "Explore By" option provides a different approach for your browsing. You may want to see all the titles in the database or discover how the various citations have been grouped into subject categories. The "Sponsoring Organizations" option allows you to quickly learn which datasets or collections are attributed to which DOE Program Offices and other funding organizations. "Other Organizations" combines names of the originating research organization (for datasets) and names of the host websites (for collections) into one list for browsing.
Note that two options are available for browsing by title: one list for individually submitted datasets/datastreams and one for data collections. Dividing the titles this way helps to keep each list to a more manageable size.
Each interactive list, when opened, shows each unique value only once. Select that value (a title, a subject category, the name of an organization) and another list opens. The second list displays the title of each citation in the database that has been indexed with the value you selected.
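The two-step browsing described above can be pictured as a simple lookup from each unique value to the titles indexed with it. The following is only a minimal sketch, not DDE's actual implementation: the citation records, the field names, and the `build_browse_index` helper are all hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical citation records; titles and field names are invented for illustration.
citations = [
    {"title": "Example Dataset A", "subject": "Climate", "sponsor": "Office of Science"},
    {"title": "Example Collection B", "subject": "Physics", "sponsor": "Office of Science"},
    {"title": "Example Dataset C", "subject": "Climate", "sponsor": "Office of Nuclear Energy"},
]


def build_browse_index(records, field):
    """Map each unique value of `field` to the sorted titles indexed with it."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record["title"])
    # Sort the values so the first list is browsable in alphabetical order.
    return {value: sorted(titles) for value, titles in sorted(index.items())}


subject_index = build_browse_index(citations, "subject")
first_list = list(subject_index)        # each unique subject appears only once
second_list = subject_index["Climate"]  # titles indexed with the selected value
```

Here `first_list` plays the role of the opening "Explore By" list and `second_list` the list that opens after a value is selected. The same helper could be called with `"sponsor"` to mimic browsing by sponsoring organization.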
Each DDE citation is identified as one of two resource types: datasets or data collections. Can I limit a search to just one of those groups? How do the eight different "data types" fit into this picture and what do they include?
Yes, at the bottom of the Advanced Search screen is a "Limit to" picklist that lets you choose whether your search terms will run against datasets or data collections. If you do not use the "Limit to" option, your search will be conducted against the entire database.
Both resource types are further categorized into eight data types. These data types describe the main content of the dataset or data collection. There may be some overlap, ambiguity, or combinations, but the primary type of content can generally be described as one of the eight below. The TYPE field on the Advanced Search page opens a picklist with all eight data types. Choose one to limit your search to that particular type only.
Animations/Simulations: Animations are often very short and silent, and are generated from data points. Simulations are normally output from a computer model. The software model itself may be part of the collection or stand as its own product. Input data files may or may not be included.
Figures/Plots: Links to published papers in which some of this material appeared may be present, but the figures and plots have been listed separately in recognition of their own importance. May also be a "data plotting" tool. Data points can be entered or queried; specialized interface provides the data plot.
Genome/Genetics Data: Gene Sequences, taxonomies, images or figures, software.
Interactive Data Maps: May be GIS data, a GIS database, the interactive interface based on and changing with the underlying data, or a combination of all three. Interactive resources that behave in a similar way but are based on non-geographic data also fall in this type, such as the Chart of Nuclides from NNDC.
Multimedia: Videos documenting (showing) experiments or results.
Numeric Data: Primary content expressed in numbers; all other content is secondary and supporting. May be in tables, spreadsheets, mathematical equations. Often is binary monitoring data pulled from sensors.
Specialized Mix: Collection designed to be a specialized mixture of data and information types. It has structure and organization, and the way the information is put together is what gives it meaning. The information often does not exist elsewhere except in pieces. This type may also be used when data fits none of the other categories comfortably.
Still Images or Photos: Images/photos of cells, molecules, structures of nanomaterials, etc., often taken with electron microscopes. Images/photos from particle collisions, astronomy, observation flights, etc.
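The effect of combining the "Limit to" picklist with the TYPE field can be sketched as two optional filters applied to the same set of citations. This is an illustrative sketch only, not the DDE search code: the record layout, the sample titles, and the `limit_search` function are hypothetical, while the eight data type names are taken from the list above.

```python
RESOURCE_TYPES = {"dataset", "data collection"}

# The eight data types named in this FAQ.
DATA_TYPES = {
    "Animations/Simulations",
    "Figures/Plots",
    "Genome/Genetics Data",
    "Interactive Data Maps",
    "Multimedia",
    "Numeric Data",
    "Specialized Mix",
    "Still Images or Photos",
}


def limit_search(records, resource_type=None, data_type=None):
    """Return records matching the chosen limits; None means 'do not limit'."""
    if resource_type is not None and resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource_type}")
    if data_type is not None and data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type}")
    return [
        r for r in records
        if (resource_type is None or r["resource_type"] == resource_type)
        and (data_type is None or r["data_type"] == data_type)
    ]


# Hypothetical records, invented for illustration.
records = [
    {"title": "Example A", "resource_type": "dataset", "data_type": "Numeric Data"},
    {"title": "Example B", "resource_type": "data collection", "data_type": "Numeric Data"},
    {"title": "Example C", "resource_type": "dataset", "data_type": "Figures/Plots"},
]

datasets_only = limit_search(records, resource_type="dataset")
numeric_datasets = limit_search(records, resource_type="dataset", data_type="Numeric Data")
everything = limit_search(records)  # no limits: the entire database is searched
```

Leaving both arguments unset mirrors a search run without the "Limit to" option, which is conducted against the entire database.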
Both active and archived data are included. Some individual datasets may be temporarily restricted, but to receive a DOI, they must be intended for eventual public availability. Also, if a dataset must be taken down for a legitimate reason, the DOI will retrieve a "tombstone page" so that you can at least find out what happened.
In the case of individual datasets, every dataset submitted by a DOE organization or a DOE grantee will be reflected in the DDE database and will be registered with a Digital Object Identifier (DOI).
These general criteria are followed for the data collections identified in DDE:
- The collection must consist primarily of non-text information, such as numeric files, figures or data plots, images, etc. Most collections include text, of course, but not as the main, most important content.
- Data should be the result of research and be maintained for reference purposes, analysis and reuse, or in support of specific projects. Calibration data, operating statistics, and normal log data for DOE's many research instruments are excluded. Specialized tools and codes may be part of the data collection, but collections that are only toolkits and software are normally excluded. The exception is computer models and animations/simulations. The line between the tool (model) and the tool's results (simulation) can be blurry, so some collections tagged as animations/simulations consist primarily of computer models.
- Collections may be small but should consist of more than just two or three items. Multiple items must logically fall under the collection's "title."
Individual dataset citations are directly submitted by the organization hosting the data or the creator/author/PI. Bibliographic information about datasets can be submitted to OSTI through the Announcement Notice 241.6 on the E-Link website or through a web service using an Application Programming Interface (API). See OSTI's Data ID Service for full details. Note that OSTI does not host the data; it is posted on the submitter's website.
Both datasets and collections may, indeed, contain multiple items, but only the "set" of information referred to in these FAQs as a dataset or datastream will have a Digital Object Identifier (DOI). The boundaries of a dataset/datastream have been defined by its data submitter to ensure that the DOI assigned to it reflects an appropriate level of granularity. Boundaries are not as well defined for collections that are discovered and identified by OSTI staff.
As for the difference between a dataset and a datastream, a dataset normally represents one experiment or one instance of something. In some projects, however, continuously running instruments or sensors monitor operations or collect readings on an automated basis each day, or several times a day, over a period of months or years. In those cases, the files from a defined monitoring period may be identified together as a datastream rather than as hundreds of different datasets. The Atmospheric Radiation Measurement (ARM) Data Archive is an example of a data center that does this. One year of readings from one specific instrument at one specific location may be identified as a datastream and assigned one DOI.
OSTI is a member of and registering agency for DataCite and has the authority to assign Digital Object Identifiers to datasets that are submitted by DOE and its contractors or grantees. The assigning and registration of a DOI for every dataset submitted is a free service provided by OSTI to enhance DOE's management of this important resource. See the information about OSTI's Data ID Service for full details.
For additional assistance, Contact Us.