DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Predicting novel substrates for enzymes with minimal experimental effort with active learning

Abstract

Enzymatic substrate promiscuity is more ubiquitous than previously thought, with significant consequences for understanding metabolism and its application to biocatalysis. This realization has given rise to the need for efficient characterization of enzyme promiscuity. Enzyme promiscuity is currently characterized with a limited number of human-selected compounds that may not be representative of the enzyme's versatility. While testing large numbers of compounds may be impractical, computational approaches can exploit existing data to determine the most informative substrates to test next, thereby more thoroughly exploring an enzyme's versatility. Here, to demonstrate this, we used existing studies and tested compounds for four different enzymes, developed support vector machine (SVM) models using these datasets, and selected additional compounds for experiments using an active learning approach. SVMs trained on a chemically diverse set of compounds were discovered to achieve maximum accuracies of approximately 80% using approximately 33% fewer compounds than datasets based on all compounds tested in existing studies. Active learning-selected compounds for testing resolved apparent conflicts in the existing training data, while adding diversity to the dataset. Finally, the application of these algorithms to wide arrays of metabolic enzymes would result in a library of SVMs that can predict high-probability promiscuous enzymatic reactions and could prove a valuable resource for the design of novel metabolic pathways.
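
As a rough illustration of the idea in the abstract (train an SVM on compounds already tested, then choose the most informative compound to test next), the following Python sketch uses margin-based uncertainty sampling: it picks the untested compound closest to the decision boundary. This is a toy example under stated assumptions, not the authors' implementation. The two-descriptor feature vectors, the labels, the hyperparameters, and the function names `train_linear_svm` and `most_informative` are all hypothetical, and a plain linear SVM fitted by subgradient descent stands in for whatever kernel, features, and selection criterion the paper actually used.

```python
# Toy sketch only: descriptors, labels, and hyperparameters are invented
# for illustration and are not taken from the paper.

def decision_value(w, b, x):
    """Signed score of compound x; the sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=1000):
    """Fit a linear SVM by full-batch subgradient descent on the
    L2-regularized hinge loss (labels must be +1 or -1)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision_value(w, b, xi) < 1.0:  # inside the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def most_informative(w, b, candidates):
    """Index of the untested compound nearest the decision boundary."""
    return min(range(len(candidates)),
               key=lambda i: abs(decision_value(w, b, candidates[i])))


# Hypothetical two-descriptor vectors for compounds already assayed.
tested = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
labels = [-1, -1, -1, 1, 1, 1]   # -1 = not a substrate, +1 = substrate
untested = [[0.5, 0.5], [2, 2], [3.5, 3.5]]

w, b = train_linear_svm(tested, labels)
chosen = most_informative(w, b, untested)
print("test next:", untested[chosen])
```

In a real loop, the chosen compound would be assayed, its label appended to the training set, and the model retrained before the next selection. The compound nearest the boundary is the one the current model is least sure about, which is why testing it is expected to be most informative.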

Authors:
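Pertusi, Dante A. [1]; Moura, Matthew E. [1]; Jeffryes, James G. [2]; Prabhu, Siddhant [1]; Walters Biggs, Bradley [1]; Tyo, Keith E. J. [1]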
  1. Northwestern Univ., Evanston, IL (United States). Dept. of Chemical and Biological Engineering
  2. Northwestern Univ., Evanston, IL (United States). Dept. of Chemical and Biological Engineering; Argonne National Lab. (ANL), Argonne, IL (United States). Mathematics and Computer Science Division
Publication Date:
2017-10-10
Research Org.:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Org.:
National Science Foundation (NSF); National Institutes of Health (NIH); Bill and Melinda Gates Foundation; USDOE
OSTI Identifier:
1427497
Grant/Contract Number:  
AC02-06CH11357; T32-GM008449-23
Resource Type:
Accepted Manuscript
Journal Name:
Metabolic Engineering
Additional Journal Information:
Journal Volume: 44; Journal Issue: C; Journal ID: ISSN 1096-7176
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY; Active learning; Enzyme promiscuity; Machine learning

Citation Formats

Pertusi, Dante A., Moura, Matthew E., Jeffryes, James G., Prabhu, Siddhant, Walters Biggs, Bradley, and Tyo, Keith E. J. Predicting novel substrates for enzymes with minimal experimental effort with active learning. United States: N. p., 2017. Web. doi:10.1016/j.ymben.2017.09.016.
Pertusi, Dante A., Moura, Matthew E., Jeffryes, James G., Prabhu, Siddhant, Walters Biggs, Bradley, & Tyo, Keith E. J. Predicting novel substrates for enzymes with minimal experimental effort with active learning. United States. https://doi.org/10.1016/j.ymben.2017.09.016
Pertusi, Dante A., Moura, Matthew E., Jeffryes, James G., Prabhu, Siddhant, Walters Biggs, Bradley, and Tyo, Keith E. J. 2017. "Predicting novel substrates for enzymes with minimal experimental effort with active learning". United States. https://doi.org/10.1016/j.ymben.2017.09.016. https://www.osti.gov/servlets/purl/1427497.
@article{osti_1427497,
title = {Predicting novel substrates for enzymes with minimal experimental effort with active learning},
author = {Pertusi, Dante A. and Moura, Matthew E. and Jeffryes, James G. and Prabhu, Siddhant and Walters Biggs, Bradley and Tyo, Keith E. J.},
abstractNote = {Enzymatic substrate promiscuity is more ubiquitous than previously thought, with significant consequences for understanding metabolism and its application to biocatalysis. This realization has given rise to the need for efficient characterization of enzyme promiscuity. Enzyme promiscuity is currently characterized with a limited number of human-selected compounds that may not be representative of the enzyme's versatility. While testing large numbers of compounds may be impractical, computational approaches can exploit existing data to determine the most informative substrates to test next, thereby more thoroughly exploring an enzyme's versatility. Here, to demonstrate this, we used existing studies and tested compounds for four different enzymes, developed support vector machine (SVM) models using these datasets, and selected additional compounds for experiments using an active learning approach. SVMs trained on a chemically diverse set of compounds were discovered to achieve maximum accuracies of approximately 80% using approximately 33% fewer compounds than datasets based on all compounds tested in existing studies. Active learning-selected compounds for testing resolved apparent conflicts in the existing training data, while adding diversity to the dataset. Finally, the application of these algorithms to wide arrays of metabolic enzymes would result in a library of SVMs that can predict high-probability promiscuous enzymatic reactions and could prove a valuable resource for the design of novel metabolic pathways.},
doi = {10.1016/j.ymben.2017.09.016},
journal = {Metabolic Engineering},
number = {C},
volume = {44},
place = {United States},
year = {2017},
month = {10}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 16 works (citation information provided by Web of Science)
