U.S. Department of Energy
Office of Scientific and Technical Information

A Guide to Using GitHub for Developing and Versioning Data Standards and Reporting Formats

Journal Article · Earth and Space Science
DOI: https://doi.org/10.1029/2021EA001797 · OSTI ID: 1812383
 [1];  [1];  [2];  [3];  [1];  [4];  [5];  [1];  [5];  [6];  [7];  [5];  [4];  [1];  [2];  [1];  [6];  [1];  [5];  [1];  [8];  [5];  [4]
  1. Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  2. Pacific Northwest National Laboratory, Joint Global Change Research Institute at the University of Maryland–College Park, College Park, MD, USA
  3. Geochemistry and Biogeochemistry Group, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
  4. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  5. Division of Environmental Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  6. Environmental and Climate Sciences Department, Brookhaven National Laboratory, Upton, NY, USA
  7. Pacific Northwest National Laboratory, Richland, WA, USA
  8. Argonne National Laboratory, Lemont, IL, USA

Abstract

Data standardization combined with descriptive metadata facilitates data reuse, which is the ultimate goal of the Findable, Accessible, Interoperable, and Reusable (FAIR) principles. Community data or metadata standards are increasingly created through an approach that emphasizes collaboration between various stakeholders. Such an approach requires platforms for collaboration on the development process that center on sharing information and receiving feedback. Our objective in this study was to conduct a systematic review to identify data standards and reporting formats that use version control for developing data standards and to summarize common practices, particularly in earth and environmental sciences. Out of 108 data standards and reporting formats identified in our review, 32 used GitHub as the version control platform, and no other platforms were used. We found no universally accepted methodology for developing and publishing data standards. Many GitHub repositories did not use key features that could help developers gather user feedback or create and revise standards that build on previous work. We provide guidance for community‐driven standard development and associated documentation on GitHub based on a systematic review of existing practices.

Sponsoring Organization:
USDOE
OSTI ID:
1812383
Alternate ID(s):
OSTI ID: 1808064
OSTI ID: 1811370
OSTI ID: 1812384
OSTI ID: 1823091
OSTI ID: 1825701
Journal Information:
Journal Name: Earth and Space Science; Journal Volume: 8; Journal Issue: 8; ISSN 2333-5084
Publisher:
American Geophysical Union (AGU)
Country of Publication:
United States
Language:
English

