DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Machine learning framework for assessment of microbial factory performance

Abstract

Metabolic models can estimate intrinsic product yields for microbial factories, but such frameworks struggle to predict cell performance (including product titer or rate) under suboptimal metabolism and complex bioprocess conditions. On the other hand, machine learning, which is complementary to metabolic modeling, necessitates large amounts of data. Building such a database for metabolic engineering designs requires significant manpower and is prone to human errors and bias. We propose an approach to integrate data-driven methods with a genome-scale metabolic model for assessment of microbial bio-production (yield, titer, and rate). Using engineered E. coli as an example, we manually extracted and curated a data set comprising about 1200 experimentally realized cell factories from ~100 papers. We furthermore augmented the key design features (e.g., genetic modifications and bioprocess variables) extracted from literature with additional features derived from running simulations of the genome-scale metabolic model iML1515 with constraints that match the experimental data. Then, data augmentation and ensemble learning (e.g., support vector machines, gradient boosted trees, and neural networks in a stacked regressor model) are employed to alleviate the challenges of sparse, non-standardized, and incomplete data sets, while multiple correspondence analysis/principal component analysis are used to rank influential factors on bio-production. The hybrid framework demonstrates a reasonably high cross-validation accuracy for prediction of E. coli factory performance metrics under presumed bioprocess and pathway conditions (Pearson correlation coefficients between 0.8 and 0.93 on new data not seen by the model).
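
The evaluation metric named in the abstract, the Pearson correlation between measured values and predictions on data the model has not seen, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the one-feature least-squares line is a toy stand-in for the stacked ensemble, and the function names (`pearson_r`, `fit_line`, `cross_validated_predictions`) are hypothetical and not taken from the paper.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b (toy stand-in for the ensemble)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def cross_validated_predictions(xs, ys, k=5):
    """Predict every sample from a model trained on the other k-1 folds."""
    preds = [None] * len(xs)
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:
            preds[i] = a * xs[i] + b
    return preds


# Synthetic, hypothetical data: one feature with a noisy linear response.
rng = random.Random(0)
feature = [rng.uniform(0.0, 10.0) for _ in range(100)]
measured = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in feature]

# Each prediction comes from a model that never saw that sample.
predicted = cross_validated_predictions(feature, measured, k=5)
r = pearson_r(measured, predicted)
print(f"held-out Pearson r = {r:.3f}")
```

Because every prediction is made by a model trained without that sample, the resulting coefficient reflects performance on unseen data rather than goodness of fit on the training set.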

Authors:
Oyetunde, Tolutola [1]; Liu, Di [1]; Martin, Hector Garcia [2]; Tang, Yinjie J. [1]
  1. Washington Univ., St. Louis, MO (United States). Dept. of Energy, Environmental and Chemical Engineering
  2. Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); DOE Agile BioFoundry, Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Systems and Engineering Division; Basque Center for Applied Mathematics, Bilbao (Spain)
Publication Date:
2019-01-15
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Washington Univ., St. Louis, MO (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER); National Science Foundation (NSF)
OSTI Identifier:
1510764
Grant/Contract Number:  
AC02-05CH11231; SC0018324; MCB 1616619
Resource Type:
Accepted Manuscript
Journal Name:
PLoS ONE
Additional Journal Information:
Journal Volume: 14; Journal Issue: 1; Journal ID: ISSN 1932-6203
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; machine learning; metabolic engineering; genetic engineering; principal component analysis; genomic databases; cell metabolism; fermentation; simulation and modeling

Citation Formats

Oyetunde, Tolutola, Liu, Di, Martin, Hector Garcia, and Tang, Yinjie J. Machine learning framework for assessment of microbial factory performance. United States: N. p., 2019. Web. doi:10.1371/journal.pone.0210558.
Oyetunde, Tolutola, Liu, Di, Martin, Hector Garcia, & Tang, Yinjie J. Machine learning framework for assessment of microbial factory performance. United States. https://doi.org/10.1371/journal.pone.0210558
Oyetunde, Tolutola, Liu, Di, Martin, Hector Garcia, and Tang, Yinjie J. 2019. "Machine learning framework for assessment of microbial factory performance". United States. https://doi.org/10.1371/journal.pone.0210558. https://www.osti.gov/servlets/purl/1510764.
@article{osti_1510764,
title = {Machine learning framework for assessment of microbial factory performance},
author = {Oyetunde, Tolutola and Liu, Di and Martin, Hector Garcia and Tang, Yinjie J.},
abstractNote = {Metabolic models can estimate intrinsic product yields for microbial factories, but such frameworks struggle to predict cell performance (including product titer or rate) under suboptimal metabolism and complex bioprocess conditions. On the other hand, machine learning, which is complementary to metabolic modeling, necessitates large amounts of data. Building such a database for metabolic engineering designs requires significant manpower and is prone to human errors and bias. We propose an approach to integrate data-driven methods with a genome-scale metabolic model for assessment of microbial bio-production (yield, titer, and rate). Using engineered E. coli as an example, we manually extracted and curated a data set comprising about 1200 experimentally realized cell factories from ~100 papers. We furthermore augmented the key design features (e.g., genetic modifications and bioprocess variables) extracted from literature with additional features derived from running simulations of the genome-scale metabolic model iML1515 with constraints that match the experimental data. Then, data augmentation and ensemble learning (e.g., support vector machines, gradient boosted trees, and neural networks in a stacked regressor model) are employed to alleviate the challenges of sparse, non-standardized, and incomplete data sets, while multiple correspondence analysis/principal component analysis are used to rank influential factors on bio-production. The hybrid framework demonstrates a reasonably high cross-validation accuracy for prediction of E. coli factory performance metrics under presumed bioprocess and pathway conditions (Pearson correlation coefficients between 0.8 and 0.93 on new data not seen by the model).},
doi = {10.1371/journal.pone.0210558},
journal = {PLoS ONE},
number = 1,
volume = 14,
place = {United States},
year = {2019},
month = {1}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 23 works
Citation information provided by
Web of Science

Works referenced in this record:

Computational methods in metabolic engineering for strain design
journal, August 2015


An ancient Chinese wisdom for metabolic engineering: Yin-Yang
journal, March 2015


Redesigning Escherichia coli Metabolism for Anaerobic Production of Isobutanol
journal, June 2011

  • Trinh, Cong T.; Li, Johnny; Blanch, Harvey W.
  • Applied and Environmental Microbiology, Vol. 77, Issue 14
  • DOI: 10.1128/AEM.00382-11

The LASER database: Formalizing design rules for metabolic engineering
journal, December 2015

  • Winkler, James D.; Halweg-Edwards, Andrea L.; Gill, Ryan T.
  • Metabolic Engineering Communications, Vol. 2
  • DOI: 10.1016/j.meteno.2015.06.003

KBase: The United States Department of Energy Systems Biology Knowledgebase
journal, July 2018

  • Arkin, Adam P.; Cottingham, Robert W.; Henry, Christopher S.
  • Nature Biotechnology, Vol. 36, Issue 7
  • DOI: 10.1038/nbt.4163

Comment on “Predicting reaction performance in C–N cross-coupling using machine learning”
journal, November 2018


XGBoost: A Scalable Tree Boosting System
conference, January 2016

  • Chen, Tianqi; Guestrin, Carlos
  • Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16
  • DOI: 10.1145/2939672.2939785

Metabolic Burden: Cornerstones in Synthetic Biology and Metabolic Engineering Applications
journal, August 2016


Matplotlib: A 2D Graphics Environment
journal, January 2007


Systematic Evaluation of Methods for Integration of Transcriptomic Data into Constraint-Based Models of Metabolism
journal, April 2014


The Experiment Data Depot: A Web-Based Software Tool for Biological Experimental Data Storage, Sharing, and Visualization
journal, September 2017

  • Morrell, William C.; Birkel, Garrett W.; Forrer, Mark
  • ACS Synthetic Biology, Vol. 6, Issue 12
  • DOI: 10.1021/acssynbio.7b00204

OMERO: flexible, model-driven data management for experimental biology
journal, February 2012

  • Allan, Chris; Burel, Jean-Marie; Moore, Josh
  • Nature Methods, Vol. 9, Issue 3
  • DOI: 10.1038/nmeth.1896

Evaluating Factors That Influence Microbial Synthesis Yields by Linear Regression with Numerical and Ordinal Variables
journal, November 2010

  • Colletti, Peter F.; Goyal, Yogesh; Varman, Arul M.
  • Biotechnology and Bioengineering, Vol. 108, Issue 4
  • DOI: 10.1002/bit.22996

iML1515, a knowledgebase that computes Escherichia coli traits
journal, October 2017

  • Monk, Jonathan M.; Lloyd, Colton J.; Brunk, Elizabeth
  • Nature Biotechnology, Vol. 35, Issue 10
  • DOI: 10.1038/nbt.3956

Deep learning for computational biology
journal, July 2016

  • Angermueller, Christof; Pärnamaa, Tanel; Parts, Leopold
  • Molecular Systems Biology, Vol. 12, Issue 7
  • DOI: 10.15252/msb.20156651

Engineering Cellular Metabolism
journal, March 2016


Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming
journal, April 2016


COBRApy: COnstraints-Based Reconstruction and Analysis for Python
journal, January 2013

  • Ebrahim, Ali; Lerman, Joshua A.; Palsson, Bernhard O.
  • BMC Systems Biology, Vol. 7, Issue 1
  • DOI: 10.1186/1752-0509-7-74

Japanese encephalitis genotype I virus-like particles stably expressed in BHK-21 cells serves as potential antigen in JE IgM ELISA
journal, February 2022


Works referencing / citing this record:

Machine and deep learning meet genome-scale metabolic modeling
journal, July 2019


The era of big data: Genome-scale modelling meets machine learning
journal, January 2020

  • Antonakoudis, Athanasios; Barbosa, Rodrigo; Kotidis, Pavlos
  • Computational and Structural Biotechnology Journal, Vol. 18
  • DOI: 10.1016/j.csbj.2020.10.011