U.S. Department of Energy
Office of Scientific and Technical Information

MIBiG 3.0: a community-driven effort to annotate experimentally validated biosynthetic gene clusters

Journal Article · Nucleic Acids Research
DOI: https://doi.org/10.1093/nar/gkac1049 · OSTI ID: 1898954
Abstract

With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
National Institutes of Health (NIH); National Science Foundation (NSF); USDOE; USDOE Office of Science (SC)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1898954
Alternate ID(s):
OSTI ID: 1987511
Journal Information:
Nucleic Acids Research, Vol. 51, Issue D1; ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
