Title: Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data

Journal Article · PLoS Biology (Online)
[1]; [2]; [3]; [2]; [1]; [2]; [2]; [4]; [5]; [6]; [7]; [8]; [9]; [10]; [11]; [2]; [12]; [2]; [2]; [13]; [2]; [10]; [2]; [2]; [2]; [14]; [15]; [16]; [7]; [7]; [17]; [18]; [6]; [6]; [19]; [20]; [6]; [2]; [1]; [21]; [6]; [20]; [1]; [2]
  1. Oregon Health & Science Univ., Portland, OR (United States). Dept. of Medical Informatics and Epidemiology
  2. European Molecular Biology Lab., Hinxton (United Kingdom). European Bioinformatics Inst.
  3. Wellcome Genome Campus, Hinxton (United Kingdom). ELIXIR Hub
  4. Univ. of California, Berkeley, CA (United States). Berkeley Natural History Museums
  5. Maastricht Univ. (Netherlands). Inst. of Data Science
  6. Univ. of Manchester (United Kingdom). School of Computer Science
  7. Univ. of Oxford (United Kingdom). Oxford e-Research Centre
  8. German Research Center for Environmental Health (Helmholtz Centre Munich) (Germany). Inst. of Experimental Genetics
  9. Univ. of California, San Diego, CA (United States). Center for Research in Biological Systems
  10. Babraham Inst., Cambridge (United Kingdom)
  11. European Molecular Biology Lab., Heidelberg (Germany)
  12. Technical Univ. of Denmark, Lyngby (Denmark). Center for Biological Sequence Analysis. Dept. of Systems Biology
  13. California Digital Library, Oakland, CA (United States)
  14. Daresbury Lab., Warrington (United Kingdom). Science and Technology Facilities Council
  15. Univ. of Groningen (Netherlands). Genomics Coordination Center. Dept. of Genetics. Univ. Medical Center Groningen. Groningen Bioinformatics Center
  16. Heidelberg Inst. for Theoretical Studies (Germany). Scientific Databases and Visualization
  17. Bern Univ. of Applied Sciences (Switzerland). Inst. for Medical Informatics. Engineering and Information Technology
  18. Univ. of Manchester (United Kingdom). Manchester Inst. of Biology; Stellenbosch Univ. (South Africa). Dept. of Biochemistry
  19. Univ. of Manchester (United Kingdom). Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals
  20. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Environmental Genomics and Systems Biology
  21. Leiden Univ. (Netherlands). Leiden Inst. of Advanced Computer Science

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES); National Institutes of Health (NIH) (United States); European Commission (EC); Biotechnology and Biological Sciences Research Council (BBSRC) (United Kingdom)
Grant/Contract Number:
AC02-05CH11231; R24OD011883; U41HG007822; U24AI117966; U54AI117925; 675728; 312455; 654248; 601043; BB/L005069/1; BB/M013189/1; BB/K019783/1; BBS/E/B/000C0419; BB/M006891/1; BB/M017702/1; BB/L005050/1
OSTI ID:
1408440
Journal Information:
PLoS Biology (Online), Vol. 15, Issue 6; ISSN 1545-7885
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 54 works
Citation information provided by Web of Science

Cited By (38)

Use of globally unique identifiers (GUIDs) to link herbarium specimen records to physical specimens journal February 2018
Grounding for an Enterprise Computing Nomenclature Ontology
  • Partridge, Chris; Mitchell, Andrew; de Cesare, Sergio
  • Conceptual Modeling: 38th International Conference, ER 2019, Salvador, Brazil, November 4–7, 2019, Proceedings, p. 457-465 https://doi.org/10.1007/978-3-030-33223-5_38
book October 2019
The International Mouse Phenotyping Consortium (IMPC): a functional catalogue of the mammalian genome that informs conservation journal May 2018
A Standard for the Scholarly Citation of Archaeological Data as an Incentive to Data Sharing journal April 2018
Quantifying the impact of public omics data journal August 2019
A scoping review of ontologies related to human behaviour change journal January 2019
FAIRsharing as a community approach to standards, repositories and policies journal April 2019
Uniform resolution of compact identifiers for biomedical data journal May 2018
Progress in single-access information systems for wheat and rice crop improvement journal April 2018
AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture journal January 2018
The IUPHAR/BPS Guide to PHARMACOLOGY in 2018: updates and expansion to encompass the new guide to IMMUNOPHARMACOLOGY journal November 2017
Unique, Persistent, Resolvable: Identifiers as the Foundation of FAIR journal January 2020
FAIR Principles: Interpretations and Implementation Considerations journal January 2020
Best practice data life cycle approaches for the life sciences journal January 2017
A FAIR guide for data providers to maximise sharing of human genomic data journal March 2018
Eleven quick tips to build a usable REST API for life sciences journal December 2018
Ten quick tips for biocuration journal May 2019
Research applications of primary biodiversity databases in the digital age journal September 2019
Identifying PIDs playing FAIR journal November 2019
Data Sharing: Convert Challenges into Opportunities journal December 2017
Making Canonical Workflow Building Blocks interoperable across workflow languages text January 2021
A Standard for the Scholarly Citation of Archaeological Data as an Incentive to Data Sharing text January 2017
The IUPHAR/BPS Guide to PHARMACOLOGY in 2018: updates and expansion to encompass the new guide to IMMUNOPHARMACOLOGY. journal article January 2018
Unique, Persistent, Resolvable: Identifiers as the foundation of FAIR text January 2019
Packaging research artefacts with RO-Crate text January 2021
A data citation roadmap for scholarly data repositories journal April 2019
Biocuration: Distilling data into knowledge. journal article January 2018
Making Canonical Workflow Building Blocks Interoperable across Workflow Languages journal January 2022
Advanced infrastructure for PIDs in Photon and Neutron RIs text January 2022
Packaging research artefacts with RO-Crate journal article January 2022
A new guide to immunopharmacology journal October 2018
Text mining meets community curation: a newly designed curation platform to improve author experience and participation at WormBase journal January 2020
Prevention of data duplication for high throughput sequencing repositories journal January 2018
M3.1 - Joint value proposition by relevant PID providers text January 2023