Frequently Asked Questions
The DOE Data Explorer (DDE) launched in 2008 as a way to guide users to collections of publicly available, DOE-sponsored data and other non-text information. The collections may reside at data centers, user facilities, on pages maintained by groups within laboratories, on university websites, and on commercial sites such as YouTube, Vimeo, and SciVee. They contain many forms and formats and reach across all of DOE's science disciplines. DOE Data Explorer was, and still is, intended to be particularly useful to students, the public, and to researchers who are new to a field or looking for data outside of their normal field of expertise.
In 2011, OSTI began to announce individual datasets and register them for Digital Object Identifiers (DOIs) through its partnership with DataCite. Soon DDE users were able to view both collection citations and citations identifying individual datasets with one search. The original database design, however, meant that the two types of results could not be merged and manipulated inside the DDE product. But in spring of 2013, DDE was completely redesigned. The website took on a new look, and the database was rewritten with different software, new functionalities, and a true merger of its two types of records - those guiding users to collections and those guiding users to individual datasets with persistent identifiers (DOIs). In early 2014 the "look and feel" was modified again to more closely resemble that of other products in OSTI's "family."
The DOE Data Explorer (DDE) is an information tool to help you locate DOE's collections of data and non-text information and, at the same time, retrieve individual datasets within some of those collections. It includes collection citations prepared by the Office of Scientific and Technical Information, as well as citations for individual datasets submitted from DOE Data Centers and other organizations.
All of the collections and all of the individual datasets result from research and development funded in whole or in part by the Department of Energy. Many of the collections reflect combined funding - DOE's combined with that from other agencies or the private sector.
Collections of Data and Non-Text Information
DOE Data Explorer's collections are intended to be particularly useful to students, the public, and to researchers who are new to a field or looking for experimental or observational data outside their normal field of expertise. Each collection is funded either in whole or in part by the Department of Energy. The collections reside at national laboratories, data centers, user facilities, colleges and universities, or the websites of professional organizations, consortiums, corporate institutions, or international organizations. All of the collections are available for free access, although some require password registration.
A key component of each collection citation is a link that takes you directly to the data at the host website. This allows you to see and utilize the highly specialized interfaces that have been developed for many of these collections. The interfaces provide customized ways to search data, compare sets of data, visualize data, and package it for download and reuse.
New collections are added to the database as they are identified and if they meet the criteria for inclusion. If you manage, use, or know of a collection that you believe we should include, we encourage you to contact us.
Individual Datasets and Digital Object Identifiers (DOIs)
In FY2011, metadata describing individual datasets began to flow into OSTI. OSTI is a member of and a registering agency for DataCite and has the authority to assign Digital Object Identifiers (DOIs) to datasets submitted by DOE and its contractors or grantees. The assigning and registration of a DOI for every dataset submitted is a free service provided by OSTI to enhance DOE's management of this important resource. See OSTI's Data ID Service on the DDE home page for more information about DOIs for data.
The database was developed by the U.S. Department of Energy Office of Scientific and Technical Information (OSTI), within the Office of Science. Hundreds of web pages were researched in order to identify the data, multimedia, and other non-text information referenced in DDE's collection citations. Descriptions were created at OSTI, using text from the websites where the collections are posted for access.
The individual dataset citations now coming to OSTI and flowing into both SciTech Connect and DOE Data Explorer are submitted by the creators and/or holders of the data. The metadata in these citations comes directly from the input source.
These general guidelines and criteria are followed for selecting collections in scope:
- The collection must consist primarily of non-text information, such as numeric files, figures or data plots, images, multimedia, etc. Most collections include text, of course, but not as the main, most important content.
- Data should be the result of research and be maintained for reference purposes, analysis and reuse, or in support of specific projects. Calibration data, operating statistics, and normal log data for DOE's many research instruments are excluded. Specialized tools and codes may be part of the data collection, but collections that are only toolkits and software are normally excluded. The exception is computer models and animations/simulations. The line between the tool (model) and the tool's results (simulation) can be blurry, so there will be some collections tagged as animations/simulations that consist primarily of computer models.
- Multimedia collections may be a mixture of research-focused information and information related to DOE as an organization.
- Collections may be small but should consist of more than just two or three items. Multiple items must logically fall under the collection's "title."
In the case of individual datasets, every dataset submitted by a DOE organization or a DOE grantee will be reflected in the DDE database and will be registered with a Digital Object Identifier (DOI).
Unlike the collection citations in DDE, which are prepared and entered into the DDE database by OSTI staff, individual datasets are directly submitted by the data host or the creator/author/PI. Datasets can be submitted to OSTI through the Announcement Notice 241.6 on the E-Link website or through a web service using an Application Programming Interface (API). See OSTI's Data ID Service for full details.
Yes, both may contain multiple items, but only the "set" of information referred to in this FAQ as an "individual dataset" will have a Digital Object Identifier (DOI). The boundaries of an "individual dataset" have been defined by its data submitter to ensure that the DOI assigned to it reflects an appropriate level of granularity. The submitter has also agreed, prior to submittal, to maintain access to the dataset indefinitely. Boundaries are not as well defined for collections that are discovered and identified by OSTI staff.
The creator/author/PI has to define what constitutes a dataset. A dataset may be one file or may contain many files, and the files may include information in various media and formats. Normally, however, the dataset represents one experiment or one instance of something.
In some projects, continuously running instruments or sensors monitor operations or collect readings (on an automated basis) each day or several times a day over a period of months, a year, or multiple years. In those cases, the files from a defined monitoring period may be identified "together" as a datastream rather than as hundreds of different datasets. The Atmospheric Radiation Measurement (ARM) Data Archive is an example of a data center that does this. One year of readings from one specific instrument at one specific location may be identified as a datastream and may be identified by one Digital Object Identifier.
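As a rough illustration of the datastream idea, the sketch below groups files so that each combination of instrument, location, and year forms one datastream. The record fields, instrument names, and filenames are hypothetical and are not taken from ARM or from OSTI's submission format; the sketch only shows the grouping logic, not how a DOI is actually assigned.

```python
from collections import defaultdict
from typing import NamedTuple


class DataFile(NamedTuple):
    """One file from a continuously running instrument (hypothetical record)."""
    instrument: str
    location: str
    year: int
    filename: str


def group_into_datastreams(files):
    """Group files so each (instrument, location, year) forms one datastream."""
    streams = defaultdict(list)
    for f in files:
        streams[(f.instrument, f.location, f.year)].append(f.filename)
    return dict(streams)


# Made-up example files; real archives would hold hundreds per year.
files = [
    DataFile("radiometer", "site-A", 2012, "rad_20120101.dat"),
    DataFile("radiometer", "site-A", 2012, "rad_20120102.dat"),
    DataFile("radiometer", "site-B", 2012, "rad_20120101.dat"),
    DataFile("radiometer", "site-A", 2013, "rad_20130101.dat"),
]

datastreams = group_into_datastreams(files)
for key, names in sorted(datastreams.items()):
    print(key, len(names), "file(s)")
```

Under this assumed grouping, the four files fall into three datastreams, and each datastream, rather than each file, would be the unit that receives a single identifier.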
OSTI is a member of and registering agency for DataCite and has the authority to assign Digital Object Identifiers to datasets that are submitted by DOE and its contractors or grantees. The assigning and registration of a DOI for every dataset submitted is a free service provided by OSTI to enhance DOE's management of this important resource. See the information about OSTI's Data ID Service for full details.
For collections, information on the host website is used. Often project details clearly state the funding sponsors. If not, details such as research organizations and contract numbers surrounding the data are examined. For datasets, the submitter is required to include the sponsor/funding organization in the metadata.
An individual's name is given as PI only when it is clear that he/she is responsible for the collection as a whole. Each dataset within the collection may have a different person credited as PI, but the DOE Data Explorer collection citations focus on the collection as a whole.
The list is growing. Approximately 200 collections were initially identified; as of early 2013, there were more than 550. We have a standing invitation to all DOE organizations and to all customers: Please notify us if we've missed a collection you know about or if we have not correctly or adequately described a collection that you maintain. As for datasets outnumbering collections... yes, that's how it really should be. We hope that eventually every data collection will have each one of its datasets submitted individually.
Both active and archival collections are included. Because DOE and its predecessor organizations have been generating data since the 1940s, some collections could possibly date from earlier days. However, the collection should have at least some of its information posted on the web for access. Other collections, of course, are the results of very new research. Where possible, the description given in the citation includes the date range of the collection's contents.
Note that individual datasets will eventually become older and may be considered archive material also. Their DOI, however, ensures that they will remain available or at least "trackable" through a "tombstone page."
New additions or any corrections to the content are made as soon as possible after being identified.
DDE groups its collections and individual datasets into eight data/non-text information types. It is normally obvious that the main content of the collection or the dataset is one of these “types.” There may be some overlap, ambiguity, or combinations, but the primary content type is categorized as one of the eight below. The TYPE field on the Advanced Search page opens a picklist with all eight data types. Choose one to limit your search to that particular type only.
Animations/Simulations: Animations are often very short, silent, generated by data points. Simulations normally output from computer model. Software model itself may be part of collection or stand as its own product. Input data files may or may not be included.
Figures/Plots: Links to published papers in which some of this material appeared may be present, but the figures and plots have been listed separately in recognition of their own importance. May also be a "data plotting" tool. Data points can be entered or queried; specialized interface provides the data plot.
Genome/Genetics Data: Gene Sequences, taxonomies, images or figures, software.
Interactive Data Maps: May be GIS data, a GIS database, or the interactive interface based on and changing with the underlying data, or combination of all. Interactive resources that behave in similar way but are based on non-geographic data also fall in this type, such as Chart of Nuclides from NNDC.
Multimedia: Videos documenting (showing) experiments or results.
Numeric Data: Primary content expressed in numbers; all other content is secondary and supporting. May be in tables, spreadsheets, mathematical equations. Often is binary monitoring data pulled from sensors.
Specialized Mix: Collection designed to be a specialized mixture of data and information types. It has structure, organization, and the way the information is put together is what gives it meaning. The information often does not exist elsewhere except in pieces. May work when data fits none of these categories comfortably.
Still Images or Photos: Images/photos of cells, molecules, structures of nanomaterials, etc., often taken with electron microscopes. Images/photos from particle collisions, astronomy, observation flights, etc.
For additional assistance, Contact Us.