OSTI's Data ID Service
Note 1: This section pertains to datasets, dataset records, and Digital Object Identifiers (DOIs). It is not related to the collection citations in the DOE Data Explorer (DDE).
Note 2: See useful links at the end of this section.
About the Data ID Service
The Office of Scientific and Technical Information (OSTI) became a member of, and a registering agency for, DataCite in 2011 and now assigns permanent identifiers, known as Digital Object Identifiers (DOIs), to publicly available scientific research datasets. These datasets (datastreams, data files, etc.) support the technical reports and published literature resulting from DOE's research. They are also recognized as valuable information entities in their own right that, now and in the future, need to be available for citation, discovery, retrieval, and reuse. The assignment and registration of a DOI for every dataset submitted is a free service for DOE researchers that is provided by OSTI to enhance DOE's management of this important resource.
The Resulting Benefits
- Announcing and registering datasets with DOIs enables researchers, especially future researchers, to more easily discover the data, access it, and reuse it for verification of the original experiment or to produce new results with the latest methods.
- Because of the responsibilities a submitter must meet in order to have DOIs assigned for datasets, users seeing those DOIs know the information has a level of integrity and commitment backing it that becomes part of its provenance.
- DOIs facilitate accurate linkage between a document or published article and the specific datasets underlying it.
- Datasets that have been announced and registered become searchable in OSTI's databases, including Energy Citations Database, Information Bridge, and the DOE Data Explorer. Users of these databases are linked to the dataset at the data center or facility where it resides; this increases the opportunity for discovery of additional data, specialized interfaces, toolkits for data analysis, etc.
- Because OSTI is the operating agent for Science.gov and WorldWideScience.org, datasets become searchable there as well. In addition, through the agreements OSTI has in place with commercial search engines such as Google, your data becomes visible to their users.
- DOIs make data easy to cite in a standardized way (DOIs have become recognizable around the globe as pointers to important information), encouraging authors to include this step in their writing/publishing activities.
- DOIs can always be "resolved" at DataCite. Anyone seeing the DOI in a print publication, for example, can access the DataCite homepage, type the number into the resolver tool, find out what information that DOI identifies and where it is located, and then go there by clicking the link. And, of course, online publications will list DOIs as live links when authors reference their own datasets or those of other scientists.
- Submitting sites can show active participation in DOE's Scientific and Technical Information Program (STIP) and awareness of DOE Order 241.1B through the data submission statistics that OSTI maintains.
How OSTI's Data ID Service Works
- A DOE researcher, organization, or grantee determines that important datasets exist which need to be announced in DOE's scientific and technical databases and assigned DOIs. DOE Order 241.1B instructs that bibliographic information for these datasets be submitted to OSTI. First-time submitters may contact OSTI at 865-576-6784 for help in deciding which submittal method to use, what metadata is required, etc.
- Submittal is handled through manual input into the Announcement Notice (AN) 241.6 or via an automated 241.6 web service/API. The AN 241.6 is available without login or through an E-Link account for DOE organizations. Manual entry is appropriate if the anticipated volume of datasets to be registered is low. Higher volumes are more easily handled through the automated API, but some upfront programming is required of the submitting site.
- The submitter decides at what level DOIs will be assigned to the data. Some datasets are similar to collections, in that they have multiple data files to which the landing page leads; others may be small and consist only of an Excel spreadsheet. Defining the boundaries of the datasets that will be announced and registered is an important step requiring subject expertise and knowledge of how particular audiences normally look for the data in question. For that reason, the definition of what constitutes a dataset that will receive a DOI is the responsibility of the people who know the data best, i.e. the people at the submitting organization.
- When announcing and registering a dataset, the submitter agrees to the following requirements:
  - Ensures that the dataset is located in a data center or online repository where it will be managed in a way that provides persistent access and maintenance of all URLs associated with the DOI.
  - Provides, at a minimum, the mandatory metadata and ensures appropriate authority to make available the metadata and the dataset being identified. The dataset must be open and accessible to the public.
  - Ensures that the URL assigned to the DOI links to a landing page (typically an HTML page) that provides users with the necessary context for using the data.
  - Coordinates with OSTI to create and maintain a persistent "tombstone page" when data registered with a DOI must become unavailable.
- If the AN 241.6 will be completed and submitted manually, follow the steps below. To use the automated 241.6 web service/API instead, skip to the web service/API steps that follow the manual steps.
  - Log in to E-Link with an existing account or request a new account at https://www.osti.gov/elink/register.jsp (AN 241.6 on E-Link) or simply begin entering metadata at https://www.osti.gov/elink/241-6.do?ostiid=0&action=load (non-login version).
  - Ensure all required fields are completed with the appropriate information. Required fields are: Dataset Type, Dataset Title, Creator/PI name, Dataset Product Number, DOE Contract Number, Originating Research Organization, Publication/Issue Date, and the URL for the HTML landing page. Some minimal contact information is also required for administrative purposes. More detailed information is provided at https://www.osti.gov/elink/F2416instruct.jsp?printerfriendly=true
  - Provide information for as many of the optional metadata fields as possible. A description of the dataset is optional but highly encouraged because it greatly improves searchability.
  - Use the buttons at the bottom of the AN 241.6 to submit the metadata to OSTI. Note that the dataset itself is not physically transmitted to OSTI, only the metadata that identifies and describes it. OSTI does not upload or host the actual dataset.
OSTI's processing of the AN 241.6 metadata now begins. Two unique numbers are assigned: (1) the OSTI ID that identifies the record in any of OSTI's databases and (2) the unique DOI that will identify the dataset and its location to the world. Both numbers are immediately available to the submitter through a printable confirmation page and an automatically generated email. The DOI consists of a prefix, which begins with 10. followed by a registrant number, then a / (slash mark) and the OSTI ID. Example: 10.5439/1023895 or http://dx.doi.org/10.5439/1023895.
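For sites that store or check these identifiers programmatically, the following is a minimal Python sketch of how the example DOI above (10.5439/1023895) splits into a prefix (10.5439) and an OSTI ID (1023895). The names parse_osti_doi, OstiDoi, and resolver_url are illustrative and not part of any OSTI tool, and the pattern assumes the numeric suffix shown in the example; DOIs issued elsewhere may look different.

```python
import re
from typing import NamedTuple

# Assumed shape, based on the example in this section: "10.", a numeric
# registrant code, a slash, then a numeric OSTI ID.
_OSTI_DOI = re.compile(r"^10\.(\d{4,})/(\d+)$")

# Resolver address forms that may precede the DOI when it appears as a link.
_RESOLVERS = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


class OstiDoi(NamedTuple):
    prefix: str   # for example "10.5439"
    osti_id: int  # for example 1023895


def parse_osti_doi(text: str) -> OstiDoi:
    """Split a bare DOI or a resolver link into its prefix and OSTI ID."""
    value = text.strip()
    for resolver in _RESOLVERS:
        if value.lower().startswith(resolver):
            value = value[len(resolver):]
            break
    match = _OSTI_DOI.match(value)
    if match is None:
        raise ValueError(f"not an OSTI-style dataset DOI: {text!r}")
    return OstiDoi(prefix=f"10.{match.group(1)}", osti_id=int(match.group(2)))


def resolver_url(doi: OstiDoi) -> str:
    """Rebuild the link in the same form as the example in this section."""
    return f"http://dx.doi.org/{doi.prefix}/{doi.osti_id}"
```

Calling parse_osti_doi on either the bare DOI or the full link yields the same prefix and OSTI ID, so a site can store the two parts separately and rebuild the link when needed.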
- If the submitter wishes to utilize OSTI's 241.6 Announcement Web Service/API, the steps for doing so are:
  - Obtain account information from OSTI by calling 865-576-6784.
  - Have an IT developer at your site write the software routine that will connect to OSTI's 241.6 web service/API. The developer will also write the code that pulls the metadata from your database or other location where it resides, formats it as a correct XML file, and then transmits the file to OSTI's server as a POST operation. For more detailed information about the programming steps, including sample code and sample XML records, see the 241.6 STI Announcement Web Service Manual.
  - Required and optional metadata fields are the same for all entry methods. See the required-field and optional-field steps under manual submission in this document, or find detailed information about the metadata in the manual referenced above.
  - Coordinate with OSTI to request access to OSTI's test systems. Once the routines are working smoothly, OSTI sets up a production account for the submitter, and the metadata flows directly to E-Link, OSTI's processing system. The schedule is determined by the submitter. Metadata can be submitted once daily, several times a day, once a week, etc.
OSTI's processing of the metadata begins as soon as a transmission is received. Two unique numbers are assigned: (1) the OSTI ID that identifies the record in any of OSTI's databases and (2) the unique DOI that will identify the dataset and its location to the world. Both numbers are immediately available via the XML response that OSTI's server returns to the submitting server and through an automatically generated email. The DOI consists of a prefix, which begins with 10. followed by a registrant number, then a / (slash mark) and the OSTI ID. Example: 10.5439/1023895 or http://dx.doi.org/10.5439/1023895.
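The developer's routine described in the web service/API steps might be organized along the lines of the Python sketch below: check that the required fields are present, format the metadata as XML, and prepare a POST request. This is an illustration under stated assumptions, not OSTI's sample code. The element names, the "records"/"record" wrapper, and the endpoint address are all hypothetical placeholders; the real field names, record structure, URLs, and authentication details come from the 241.6 STI Announcement Web Service Manual and from OSTI.

```python
import urllib.request
import xml.etree.ElementTree as ET

# HYPOTHETICAL element names, one per required field listed in this section
# (type, title, creator/PI, product number, contract number, originating
# research organization, publication date, landing page URL).
REQUIRED_FIELDS = (
    "dataset_type", "title", "creators", "product_nos", "contract_nos",
    "originating_research_org", "publication_date", "site_url",
)

# HYPOTHETICAL placeholder; OSTI supplies the real test and production URLs.
ENDPOINT = "https://example.invalid/241-6-api"


def build_record_xml(metadata: dict) -> bytes:
    """Validate the required fields and serialize one record as XML bytes."""
    missing = [name for name in REQUIRED_FIELDS if not metadata.get(name)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    root = ET.Element("records")            # hypothetical wrapper element
    record = ET.SubElement(root, "record")  # hypothetical record element
    for name, value in metadata.items():
        ET.SubElement(record, name).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_post_request(xml_body: bytes,
                       url: str = ENDPOINT) -> urllib.request.Request:
    """Prepare, but do not send, the POST. Authentication is omitted here."""
    return urllib.request.Request(
        url,
        data=xml_body,
        method="POST",
        headers={"Content-Type": "application/xml"},
    )


# Placeholder values for illustration only; not a real dataset record.
sample = {
    "dataset_type": "placeholder type",
    "title": "Placeholder dataset title",
    "creators": "Placeholder, Pat",
    "product_nos": "placeholder-product-number",
    "contract_nos": "placeholder-contract-number",
    "originating_research_org": "Placeholder Laboratory",
    "publication_date": "placeholder date",
    "site_url": "https://example.invalid/landing-page.html",
}
request = build_post_request(build_record_xml(sample))
# Passing the request to urllib.request.urlopen would transmit it once the
# real endpoint and account details from OSTI are in place.
```

Keeping validation separate from transmission lets a site test its metadata export against OSTI's test systems before any production account is used.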
As part of its processing, OSTI automatically connects to DataCite and uploads metadata and the new DOI to DataCite. DataCite now knows about your data and where it is. When both OSTI's and DataCite's processing steps are completed, the information about the dataset, along with the DOI that permanently links to the dataset's location, is disseminated to databases and to commercial search engines such as Google.
Some Useful Links
Announcement Notice 241.6 for Publicly Available Scientific Research Datasets (for manual submission)
STI Announcement Web Service for 241.6 Data (instructions for automated submission)
DataCite (includes a recommended bibliographic format for citing data in published literature)
Databib (a global bibliography of research data repositories)