OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Predicting novel substrates for enzymes with minimal experimental effort with active learning

Abstract

Enzymatic substrate promiscuity is more ubiquitous than previously thought, with significant consequences for understanding metabolism and its application to biocatalysis. This realization has given rise to the need for efficient characterization of enzyme promiscuity. Enzyme promiscuity is currently characterized with a limited number of human-selected compounds that may not be representative of the enzyme's versatility. While testing large numbers of compounds may be impractical, computational approaches can exploit existing data to determine the most informative substrates to test next, thereby more thoroughly exploring an enzyme's versatility. Here, to demonstrate this, we used existing studies and tested compounds for four different enzymes, developed support vector machine (SVM) models using these datasets, and selected additional compounds for experiments using an active learning approach. SVMs trained on a chemically diverse set of compounds were found to achieve maximum accuracies of ~80% using ~33% fewer compounds than datasets based on all compounds tested in existing studies. Active learning-selected compounds for testing resolved apparent conflicts in the existing training data, while adding diversity to the dataset. Finally, the application of these algorithms to wide arrays of metabolic enzymes would result in a library of SVMs that can predict high-probability promiscuous enzymatic reactions and could prove a valuable resource for the design of novel metabolic pathways.
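
The abstract does not say which active learning query strategy was used. As a hedged illustration only, the sketch below shows one common choice for SVMs, margin-based uncertainty sampling: among untested compounds, pick the one whose decision value lies closest to the separating hyperplane. The weights, bias, compound names, and binary feature vectors are all invented for this example and are not taken from the paper.

```python
# Hypothetical sketch of margin-based uncertainty sampling for a linear SVM.
# All numbers and compound names below are made up for illustration.

def decision_value(weights, bias, features):
    """Signed score of a linear SVM: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def most_informative(weights, bias, candidates):
    """Return the name of the untested compound closest to the decision boundary."""
    return min(
        candidates,
        key=lambda name: abs(decision_value(weights, bias, candidates[name])),
    )


# Assumed model parameters (in practice these come from a trained SVM).
weights = [1.0, -2.0, 0.5]
bias = -0.25

# Assumed binary fingerprints for four untested compounds.
candidates = {
    "compound_A": [1, 0, 1],
    "compound_B": [0, 1, 0],
    "compound_C": [1, 0, 0],
    "compound_D": [0, 0, 1],
}

# The compound with the smallest |w . x + b| is the one the model is
# least certain about, so it is queried next.
next_to_test = most_informative(weights, bias, candidates)
print(next_to_test)
```

After the selected compound is tested experimentally, its label would be added to the training set and the model retrained before the next query.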

Authors:
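Pertusi, Dante A. [1]; Moura, Matthew E. [1]; Jeffryes, James G. [2]; Prabhu, Siddhant [1]; Walters Biggs, Bradley [1]; Tyo, Keith E. J. [1]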
  1. Northwestern Univ., Evanston, IL (United States). Dept. of Chemical and Biological Engineering
  2. Northwestern Univ., Evanston, IL (United States). Dept. of Chemical and Biological Engineering; Argonne National Lab. (ANL), Argonne, IL (United States). Mathematics and Computer Science Division
Publication Date:
2017-10-10
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
National Science Foundation (NSF); National Institutes of Health (NIH); Bill and Melinda Gates Foundation; USDOE
OSTI Identifier:
1427497
Grant/Contract Number:  
AC02-06CH11357; T32-GM008449-23
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Metabolic Engineering
Additional Journal Information:
Journal Volume: 44; Journal Issue: C; Journal ID: ISSN 1096-7176
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY; Active learning; Enzyme promiscuity; Machine learning

Citation Formats

Pertusi, Dante A., Moura, Matthew E., Jeffryes, James G., Prabhu, Siddhant, Walters Biggs, Bradley, and Tyo, Keith E. J. Predicting novel substrates for enzymes with minimal experimental effort with active learning. United States: N. p., 2017. Web. doi:10.1016/j.ymben.2017.09.016.
Pertusi, Dante A., Moura, Matthew E., Jeffryes, James G., Prabhu, Siddhant, Walters Biggs, Bradley, & Tyo, Keith E. J. Predicting novel substrates for enzymes with minimal experimental effort with active learning. United States. doi:10.1016/j.ymben.2017.09.016.
Pertusi, Dante A., Moura, Matthew E., Jeffryes, James G., Prabhu, Siddhant, Walters Biggs, Bradley, and Tyo, Keith E. J. 2017. "Predicting novel substrates for enzymes with minimal experimental effort with active learning". United States. doi:10.1016/j.ymben.2017.09.016. https://www.osti.gov/servlets/purl/1427497.
@article{osti_1427497,
title = {Predicting novel substrates for enzymes with minimal experimental effort with active learning},
author = {Pertusi, Dante A. and Moura, Matthew E. and Jeffryes, James G. and Prabhu, Siddhant and Walters Biggs, Bradley and Tyo, Keith E. J.},
abstractNote = {Enzymatic substrate promiscuity is more ubiquitous than previously thought, with significant consequences for understanding metabolism and its application to biocatalysis. This realization has given rise to the need for efficient characterization of enzyme promiscuity. Enzyme promiscuity is currently characterized with a limited number of human-selected compounds that may not be representative of the enzyme's versatility. While testing large numbers of compounds may be impractical, computational approaches can exploit existing data to determine the most informative substrates to test next, thereby more thoroughly exploring an enzyme's versatility. Here, to demonstrate this, we used existing studies and tested compounds for four different enzymes, developed support vector machine (SVM) models using these datasets, and selected additional compounds for experiments using an active learning approach. SVMs trained on a chemically diverse set of compounds were found to achieve maximum accuracies of ~80% using ~33% fewer compounds than datasets based on all compounds tested in existing studies. Active learning-selected compounds for testing resolved apparent conflicts in the existing training data, while adding diversity to the dataset. Finally, the application of these algorithms to wide arrays of metabolic enzymes would result in a library of SVMs that can predict high-probability promiscuous enzymatic reactions and could prove a valuable resource for the design of novel metabolic pathways.},
doi = {10.1016/j.ymben.2017.09.016},
journal = {Metabolic Engineering},
number = C,
volume = 44,
place = {United States},
year = {2017},
month = {10}
}
