U.S. Department of Energy
Office of Scientific and Technical Information

PDBx/mmCIF Ecosystem: Foundational Semantic Tools for Structural Biology

Journal Article · Journal of Molecular Biology
[1]; [2]; [2]; [2]; [2]; [2]; [2]; [3]; [4]; [5]; [6]; [7]; [5]; [8]; [8]; [4]; [5]; [9]; [10]; [11]; [12]; [2]
  1. Rutgers Univ., Piscataway, NJ (United States); Institute for Quantitative Biomedicine, Piscataway, NJ (United States); Cancer Institute of New Jersey, New Brunswick, NJ (United States); OSTI
  2. Rutgers Univ., Piscataway, NJ (United States); Institute for Quantitative Biomedicine, Piscataway, NJ (United States)
  3. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
  4. European Bioinformatics Institute (EMBL-EBI), Cambridge (United Kingdom). Protein Data Bank in Europe (PDBe)
  5. Global Phasing Ltd, Cambridge (United Kingdom)
  6. University of Konstanz (Germany)
  7. The Netherlands Cancer Institute, Amsterdam (Netherlands); Oncode Institute, Utrecht (Netherlands)
  8. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
  9. Science and Technology Facilities Council (STFC), Oxford (United Kingdom). Rutherford Appleton Lab. (RAL)
  10. Osaka Univ. (Japan). Institute for Protein Research, Protein Data Bank Japan
  11. Rutgers Univ., Piscataway, NJ (United States); Univ. of Southern California, Los Angeles, CA (United States)
  12. Rutgers Univ., Piscataway, NJ (United States); Institute for Quantitative Biomedicine, Piscataway, NJ (United States); Cancer Institute of New Jersey, New Brunswick, NJ (United States); Univ. of California, La Jolla (United States). San Diego Supercomputer Center
PDBx/mmCIF, Protein Data Bank Exchange (PDBx) macromolecular Crystallographic Information Framework (mmCIF), has become the data standard for structural biology. With its early roots in the domain of small-molecule crystallography, PDBx/mmCIF provides an extensible data representation that is used for deposition, archiving, remediation, and public dissemination of experimentally determined three-dimensional (3D) structures of biological macromolecules by the Worldwide Protein Data Bank (wwPDB, wwpdb.org). Extensions of PDBx/mmCIF are similarly used for computed structure models by ModelArchive (modelarchive.org), integrative/hybrid structures by PDB-Dev (pdb-dev.wwpdb.org), small-angle scattering data by the Small Angle Scattering Biological Data Bank (SASBDB, sasbdb.org), and models generated with the AlphaFold 2.0 deep learning software suite (alphafold.ebi.ac.uk). Community-driven development of PDBx/mmCIF spans three decades, involving contributions from researchers, software and methods developers in the structural sciences, data repository providers, scientific publishers, and professional societies. Having a semantically rich and extensible data framework for representing a wide range of structural biology experimental and computational results, combined with expertly curated 3D biostructure data sets in public repositories, accelerates the pace of scientific discovery. Herein, we describe the architecture of the PDBx/mmCIF data standard, the tools used to maintain representations of the data standard, its governance, and the processes by which data content standards are extended, plus community tools and software libraries available for processing and checking the integrity of PDBx/mmCIF data. Use cases exemplify how the members of the Worldwide Protein Data Bank have used PDBx/mmCIF as the foundation for their pipeline for delivering Findable, Accessible, Interoperable, and Reusable (FAIR) data to many millions of users worldwide.
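To make the idea of an extensible, table-oriented data representation concrete, the following is a minimal sketch that reads a small, made-up mmCIF fragment into Python dictionaries keyed by category. It is an illustration under stated assumptions, not the wwPDB tooling or any of the community libraries the article describes: it handles one `data_` block, single-line `_category.item value` pairs, and `loop_` tables, but not semicolon-delimited multi-line text fields or quotes embedded inside quoted values, both of which real files use. The function names `tokenize`, `parse_mmcif`, and `_is_reserved` and the example entry are hypothetical; the item names (`_entry.id`, `_struct.title`, `_atom_site.*`) follow the PDBx/mmCIF dictionary.

```python
import re
from collections import defaultdict

# One token: a single-quoted value, a double-quoted value, or a bare word.
# Assumption: no semicolon-delimited text fields and no quotes embedded
# inside quoted values (both are legal in real mmCIF files).
_TOKEN = re.compile(r"'([^']*)'|\"([^\"]*)\"|(\S+)")


def tokenize(text):
    """Yield (value, quoted) pairs, skipping blank lines and comment lines."""
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        for m in _TOKEN.finditer(line):
            single, double, bare = m.groups()
            if bare is not None:
                yield bare, False
            else:
                yield (single if single is not None else double), True


def _is_reserved(token):
    """True for an unquoted item name, loop_ keyword, or data_ block header."""
    value, quoted = token
    return not quoted and (value.startswith(("_", "data_")) or value == "loop_")


def parse_mmcif(text):
    """Return {category: [row dict, ...]} for a single data block."""
    tokens = list(tokenize(text))
    tables = defaultdict(list)
    i = 0
    while i < len(tokens):
        value, quoted = tokens[i]
        if not quoted and value.startswith("data_"):
            i += 1  # block header; this sketch assumes one block per file
        elif not quoted and value == "loop_":
            i += 1
            names = []  # column headers such as _atom_site.id
            while i < len(tokens) and not tokens[i][1] and tokens[i][0].startswith("_"):
                names.append(tokens[i][0])
                i += 1
            category = names[0][1:].split(".", 1)[0]
            items = [name.split(".", 1)[1] for name in names]
            values = []  # row values, read until the next reserved token
            while i < len(tokens) and not _is_reserved(tokens[i]):
                values.append(tokens[i][0])
                i += 1
            if len(values) % len(items):
                raise ValueError(f"loop for {category!r} has an incomplete row")
            for start in range(0, len(values), len(items)):
                tables[category].append(dict(zip(items, values[start:start + len(items)])))
        elif not quoted and value.startswith("_"):
            # Single key-value pair: the category is stored as a one-row table.
            category, item = value[1:].split(".", 1)
            if not tables[category]:
                tables[category].append({})
            tables[category][0][item] = tokens[i + 1][0]
            i += 2
        else:
            raise ValueError(f"unexpected token {value!r}")
    return dict(tables)


example = """\
data_EXAMPLE
_entry.id EXAMPLE
_struct.title 'A hypothetical example structure'
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.Cartn_x
ATOM 1 N N  10.123
ATOM 2 C CA 11.456
"""

tables = parse_mmcif(example)
print(tables["struct"][0]["title"])  # -> A hypothetical example structure
print(tables["atom_site"][1]["label_atom_id"])  # -> CA
```

The sketch keeps every value as a string. In real files `?` marks an unknown value and `.` an inapplicable one, and a production reader would also convert items such as `Cartn_x` according to their dictionary-declared types; for real work, a maintained PDBx/mmCIF parser is the safer choice.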
Research Organization:
Rutgers Univ., Piscataway, NJ (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
SC0019749
OSTI ID:
1977324
Journal Information:
Journal of Molecular Biology, Vol. 434, Issue 11; ISSN 0022-2836
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
