PDBx/mmCIF Ecosystem: Foundational Semantic Tools for Structural Biology
Journal Article · Journal of Molecular Biology
- Rutgers Univ., Piscataway, NJ (United States); Institute for Quantitative Biomedicine, Piscataway, NJ (United States); Cancer Institute of New Jersey, New Brunswick, NJ (United States); OSTI
- Rutgers Univ., Piscataway, NJ (United States); Institute for Quantitative Biomedicine, Piscataway, NJ (United States)
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
- European Bioinformatics Institute (EMBL-EBI), Cambridge (United Kingdom). Protein Data Bank in Europe (PDBe)
- Global Phasing Ltd, Cambridge (United Kingdom)
- University of Konstanz (Germany)
- The Netherlands Cancer Institute, Amsterdam (Netherlands); Oncode Institute, Utrecht (Netherlands)
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Science and Technology Facilities Council (STFC), Oxford (United Kingdom). Rutherford Appleton Lab. (RAL)
- Osaka Univ. (Japan). Institute for Protein Research, Protein Data Bank Japan
- Rutgers Univ., Piscataway, NJ (United States); Univ. of Southern California, Los Angeles, CA (United States)
- Rutgers Univ., Piscataway, NJ (United States); Institute for Quantitative Biomedicine, Piscataway, NJ (United States); Cancer Institute of New Jersey, New Brunswick, NJ (United States); Univ. of California, La Jolla (United States). San Diego Supercomputer Center
PDBx/mmCIF, the Protein Data Bank Exchange (PDBx) macromolecular Crystallographic Information Framework (mmCIF), has become the data standard for structural biology. With its early roots in the domain of small-molecule crystallography, PDBx/mmCIF provides an extensible data representation that is used for deposition, archiving, remediation, and public dissemination of experimentally determined three-dimensional (3D) structures of biological macromolecules by the Worldwide Protein Data Bank (wwPDB, wwpdb.org). Extensions of PDBx/mmCIF are similarly used for computed structure models by ModelArchive (modelarchive.org), integrative/hybrid structures by PDB-Dev (pdb-dev.wwpdb.org), small angle scattering data by the Small Angle Scattering Biological Data Bank (SASBDB, sasbdb.org), and models generated with the AlphaFold 2.0 deep learning software suite (alphafold.ebi.ac.uk). Community-driven development of PDBx/mmCIF spans three decades, involving contributions from researchers, software and methods developers in structural sciences, data repository providers, scientific publishers, and professional societies. Having a semantically rich and extensible data framework for representing a wide range of structural biology experimental and computational results, combined with expertly curated 3D biostructure data sets in public repositories, accelerates the pace of scientific discovery. Herein, we describe the architecture of the PDBx/mmCIF data standard, tools used to maintain representations of the data standard, governance, and processes by which data content standards are extended, plus community tools/software libraries available for processing and checking the integrity of PDBx/mmCIF data. Use cases exemplify how the members of the Worldwide Protein Data Bank have used PDBx/mmCIF as the foundation for their pipeline for delivering Findable, Accessible, Interoperable, and Reusable (FAIR) data to many millions of users worldwide.
- Research Organization:
- Rutgers Univ., Piscataway, NJ (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- Grant/Contract Number:
- SC0019749
- OSTI ID:
- 1977324
- Journal Information:
- Journal of Molecular Biology, Vol. 434, Issue 11; ISSN 0022-2836
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English