Predicting novel substrates for enzymes with minimal experimental effort with active learning
Abstract
Enzymatic substrate promiscuity is more ubiquitous than previously thought, with significant consequences for understanding metabolism and its application to biocatalysis. This realization has given rise to the need for efficient characterization of enzyme promiscuity. Enzyme promiscuity is currently characterized with a limited number of human-selected compounds that may not be representative of the enzyme's versatility. While testing large numbers of compounds may be impractical, computational approaches can exploit existing data to determine the most informative substrates to test next, thereby more thoroughly exploring an enzyme's versatility. Here, to demonstrate this, we used existing studies and tested compounds for four different enzymes, developed support vector machine (SVM) models using these datasets, and selected additional compounds for experiments using an active learning approach. SVMs trained on a chemically diverse set of compounds were discovered to achieve maximum accuracies of approximately 80% using approximately 33% fewer compounds than datasets based on all compounds tested in existing studies. Active learning-selected compounds for testing resolved apparent conflicts in the existing training data, while adding diversity to the dataset. Finally, the application of these algorithms to wide arrays of metabolic enzymes would result in a library of SVMs that can predict high-probability promiscuous enzymatic reactions and could prove a valuable resource for the design of novel metabolic pathways.
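The selection step described in the abstract (using an SVM to choose the most informative compounds to test next) can be illustrated with a minimal sketch. The abstract does not state the kernel, the compound features, or the selection rule, so everything below is an assumption made for illustration: compounds are encoded as sets of "on" fingerprint bits, the kernel is Tanimoto similarity, the SVM is represented only by hypothetical support compounds and weights, and candidates are ranked by margin-based uncertainty sampling (smallest absolute decision value first). It is not the authors' implementation.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: a compound is the set of indices of its "on" fingerprint bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def decision_value(
    x: Fingerprint,
    support: List[Tuple[Fingerprint, float]],
    bias: float = 0.0,
) -> float:
    """Kernel SVM decision function f(x) = sum_i w_i * K(s_i, x) + b.

    Each entry of `support` is (support compound, signed weight alpha_i * y_i).
    Positive f(x) predicts "substrate", negative predicts "non-substrate".
    """
    return sum(w * tanimoto(s, x) for s, w in support) + bias


def select_next(
    candidates: Dict[str, Fingerprint],
    support: List[Tuple[Fingerprint, float]],
    bias: float = 0.0,
    k: int = 1,
) -> List[str]:
    """Return the k candidates whose decision values lie closest to zero,
    i.e. the compounds the current model is least certain about."""
    ranked = sorted(
        candidates,
        key=lambda name: (abs(decision_value(candidates[name], support, bias)), name),
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy model: one known substrate (+1) and one known non-substrate (-1).
    support = [(frozenset({1, 2, 3}), 1.0), (frozenset({7, 8, 9}), -1.0)]
    candidates = {
        "A": frozenset({1, 2, 3, 4}),  # resembles the substrate
        "B": frozenset({7, 8}),        # resembles the non-substrate
        "C": frozenset({1, 2, 7, 8}),  # equally similar to both
    }
    print(select_next(candidates, support, k=2))  # prints ['C', 'B']
```

In this toy case compound C sits exactly on the decision boundary, so it is proposed first; testing it experimentally would tell the model the most about where the boundary lies. In practice the support compounds and weights would come from a trained SVM rather than being set by hand.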
- Authors:
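- Pertusi, Dante A.; Moura, Matthew E.; Jeffryes, James G.; Prabhu, Siddhant; Walters Biggs, Bradley; Tyo, Keith E. J.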
- Northwestern Univ., Evanston, IL (United States). Dept. of Chemical and Biological Engineering
- Northwestern Univ., Evanston, IL (United States). Dept. of Chemical and Biological Engineering; Argonne National Lab. (ANL), Argonne, IL (United States). Mathematics and Computer Science Division
- Publication Date:
- 2017-10-10
- Research Org.:
- Argonne National Laboratory (ANL), Argonne, IL (United States)
- Sponsoring Org.:
- National Science Foundation (NSF); National Institutes of Health (NIH); Bill and Melinda Gates Foundation; USDOE
- OSTI Identifier:
- 1427497
- Grant/Contract Number:
- AC02-06CH11357; T32-GM008449-23
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Metabolic Engineering
- Additional Journal Information:
- Journal Volume: 44; Journal Issue: C; Journal ID: ISSN 1096-7176
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; 37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY; Active learning; Enzyme promiscuity; Machine learning
Citation Formats
Pertusi, Dante A., Moura, Matthew E., Jeffryes, James G., Prabhu, Siddhant, Walters Biggs, Bradley, and Tyo, Keith E. J. Predicting novel substrates for enzymes with minimal experimental effort with active learning. United States: N. p., 2017.
Web. doi:10.1016/j.ymben.2017.09.016.
Pertusi, Dante A., Moura, Matthew E., Jeffryes, James G., Prabhu, Siddhant, Walters Biggs, Bradley, & Tyo, Keith E. J. Predicting novel substrates for enzymes with minimal experimental effort with active learning. United States. https://doi.org/10.1016/j.ymben.2017.09.016
Pertusi, Dante A., Moura, Matthew E., Jeffryes, James G., Prabhu, Siddhant, Walters Biggs, Bradley, and Tyo, Keith E. J. 2017.
"Predicting novel substrates for enzymes with minimal experimental effort with active learning". United States. https://doi.org/10.1016/j.ymben.2017.09.016. https://www.osti.gov/servlets/purl/1427497.
@article{osti_1427497,
title = {Predicting novel substrates for enzymes with minimal experimental effort with active learning},
author = {Pertusi, Dante A. and Moura, Matthew E. and Jeffryes, James G. and Prabhu, Siddhant and Walters Biggs, Bradley and Tyo, Keith E. J.},
abstractNote = {Enzymatic substrate promiscuity is more ubiquitous than previously thought, with significant consequences for understanding metabolism and its application to biocatalysis. This realization has given rise to the need for efficient characterization of enzyme promiscuity. Enzyme promiscuity is currently characterized with a limited number of human-selected compounds that may not be representative of the enzyme's versatility. While testing large numbers of compounds may be impractical, computational approaches can exploit existing data to determine the most informative substrates to test next, thereby more thoroughly exploring an enzyme's versatility. Here, to demonstrate this, we used existing studies and tested compounds for four different enzymes, developed support vector machine (SVM) models using these datasets, and selected additional compounds for experiments using an active learning approach. SVMs trained on a chemically diverse set of compounds were discovered to achieve maximum accuracies of approximately 80% using approximately 33% fewer compounds than datasets based on all compounds tested in existing studies. Active learning-selected compounds for testing resolved apparent conflicts in the existing training data, while adding diversity to the dataset. Finally, the application of these algorithms to wide arrays of metabolic enzymes would result in a library of SVMs that can predict high-probability promiscuous enzymatic reactions and could prove a valuable resource for the design of novel metabolic pathways.},
doi = {10.1016/j.ymben.2017.09.016},
journal = {Metabolic Engineering},
number = {C},
volume = 44,
place = {United States},
year = {2017},
month = {10}
}