OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: AdenPredictor: accurate prediction of the adenylation domain specificity of nonribosomal peptide biosynthetic gene clusters in microbial genomes

Journal Article · Bioinformatics

Abstract

Summary: Microbial natural products represent a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class that includes antibiotics, immunosuppressants, anticancer agents, toxins, siderophores, pigments, and cytostatics. The discovery of novel NRPs remains a laborious process because many NRPs consist of nonstandard amino acids that are assembled by nonribosomal peptide synthetases (NRPSs). Adenylation domains (A-domains) in NRPSs are responsible for the selection and activation of the monomers appearing in NRPs. During the past decade, several support vector machine-based algorithms have been developed for predicting the specificity of the monomers present in NRPs. These algorithms utilize physicochemical features of the amino acids present in the A-domains of NRPSs. In this article, we benchmarked the performance of various machine learning algorithms and features for predicting specificities of NRPSs, and we showed that the extra trees model paired with one-hot encoding features outperforms the existing approaches. Moreover, we show that unsupervised clustering of 453 560 A-domains reveals many clusters that correspond to potentially novel amino acids. While it is challenging to predict the chemical structure of these amino acids, we developed novel techniques to predict their various properties, including polarity, hydrophobicity, charge, and the presence of aromatic rings, carboxyl, and hydroxyl groups.
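The one-hot encoding named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 21-symbol alphabet (20 standard amino acids plus a gap character), the function name `one_hot_encode`, and the 10-residue example string are all assumptions made here for illustration. Each residue position becomes a block of 21 zeros with a single 1 marking the residue observed at that position.

```python
from typing import List

# Assumed alphabet: the 20 standard amino acids plus '-' for alignment gaps.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot_encode(signature: str) -> List[int]:
    """Return a flat 0/1 vector with one block of len(ALPHABET) per residue.

    `signature` is a string of residues extracted from an A-domain
    (hypothetical input format; the source does not specify one).
    """
    block = len(ALPHABET)
    vector = [0] * (len(signature) * block)
    for pos, residue in enumerate(signature.upper()):
        if residue not in INDEX:
            raise ValueError(f"unexpected residue {residue!r} at position {pos}")
        # Set the single 1 inside this position's block.
        vector[pos * block + INDEX[residue]] = 1
    return vector


if __name__ == "__main__":
    # Illustrative 10-residue string only; not data from the article.
    example = "DAWTIAAICK"
    encoded = one_hot_encode(example)
    print(len(encoded), sum(encoded))
```

Vectors of this kind could then be passed to a tree ensemble, for example scikit-learn's `ExtraTreesClassifier`. Whether the article used that particular library is not stated in this record.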

Sponsoring Organization:
USDOE
Grant/Contract Number:
DE-SC0021340
OSTI ID:
1987739
Journal Information:
Journal Name: Bioinformatics; Journal Volume: 39; Journal Issue: Supplement_1; ISSN 1367-4803
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
