Machine learning framework for assessment of microbial factory performance
Abstract
Metabolic models can estimate intrinsic product yields for microbial factories, but such frameworks struggle to predict cell performance (including product titer or rate) under suboptimal metabolism and complex bioprocess conditions. On the other hand, machine learning, complementary to metabolic modeling, necessitates large amounts of data. Building such a database for metabolic engineering designs requires significant manpower and is prone to human errors and bias. We propose an approach to integrate data-driven methods with a genome-scale metabolic model for assessment of microbial bio-production (yield, titer and rate). Using engineered E. coli as an example, we manually extracted and curated a data set comprising about 1200 experimentally realized cell factories from ~100 papers. We furthermore augmented the key design features (e.g., genetic modifications and bioprocess variables) extracted from literature with additional features derived from simulations of the genome-scale metabolic model iML1515, run with constraints that match the experimental data. Then, data augmentation and ensemble learning (e.g., support vector machines, gradient boosted trees, and neural networks in a stacked regressor model) are employed to alleviate the challenges of sparse, non-standardized, and incomplete data sets, while multiple correspondence analysis/principal component analysis are used to rank influential factors on bio-production. The hybrid framework demonstrates a reasonably high cross-validation accuracy for prediction of E. coli factory performance metrics under presumed bioprocess and pathway conditions (Pearson correlation coefficients between 0.8 and 0.93 on new data not seen by the model).
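The abstract describes a stacked regressor built from support vector machines, gradient boosted trees, and neural networks, scored by Pearson correlation on data not seen by the model. The sketch below illustrates that general technique with scikit-learn and NumPy (both assumed to be installed). It is not the authors' code: the data are synthetic stand-ins for the curated E. coli design features, scikit-learn's `GradientBoostingRegressor` stands in for whichever boosting implementation was actually used, and the helper names (`make_synthetic_data`, `build_stacked_regressor`, `pearson_r`), the `RidgeCV` meta-learner, and all hyperparameters are hypothetical choices for illustration.

```python
# Minimal stacked-regressor sketch on SYNTHETIC data (not the curated E. coli data set).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_synthetic_data(n_samples=300, n_features=8, seed=0):
    """Hypothetical stand-in for numeric design features and one performance metric."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    # Nonlinear target with noise; only the first four features carry signal.
    y = (2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.5 * X[:, 2] * X[:, 3]
         + rng.normal(scale=0.3, size=n_samples))
    return X, y


def build_stacked_regressor(seed=0):
    """Stack an SVM, gradient boosted trees and a small neural network."""
    base_learners = [
        # SVR and MLP are scale-sensitive, so each gets its own scaler.
        ("svr", make_pipeline(StandardScaler(), SVR(C=10.0))),
        ("gbt", GradientBoostingRegressor(random_state=seed)),
        ("mlp", make_pipeline(StandardScaler(),
                              MLPRegressor(hidden_layer_sizes=(32,),
                                           max_iter=2000, random_state=seed))),
    ]
    # The RidgeCV meta-learner is fit on 5-fold out-of-fold predictions
    # of the base learners, which are then refit on the full training set.
    return StackingRegressor(estimators=base_learners,
                             final_estimator=RidgeCV(), cv=5)


def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


X, y = make_synthetic_data()
# Hold out 25% of rows as "new data not seen by the model".
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = build_stacked_regressor().fit(X_train, y_train)
r = pearson_r(y_test, model.predict(X_test))
print(f"Pearson r on held-out data: {r:.2f}")
```

Fitting the meta-learner on out-of-fold predictions (`cv=5`) is the usual reason to prefer stacking over simply averaging base learners: the combiner learns how much to trust each model from predictions made on rows that model did not train on, which matters most when the data set is small and sparse.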
- Authors:
- Oyetunde, Tolutola; Liu, Di; Martin, Hector Garcia; Tang, Yinjie J.
- Washington Univ., St. Louis, MO (United States). Dept. of Energy, Environmental and Chemical Engineering
- Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); DOE Agile BioFoundry, Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Systems and Engineering Division; Basque Center for Applied Mathematics, Bilbao (Spain)
- Publication Date:
- January 15, 2019
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Washington Univ., St. Louis, MO (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER); National Science Foundation (NSF)
- OSTI Identifier:
- 1510764
- Grant/Contract Number:
- AC02-05CH11231; SC0018324; MCB 1616619
- Resource Type:
- Accepted Manuscript
- Journal Name:
- PLoS ONE
- Additional Journal Information:
- Journal Volume: 14; Journal Issue: 1; Journal ID: ISSN 1932-6203
- Publisher:
- Public Library of Science
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; machine learning; metabolic engineering; genetic engineering; principal component analysis; genomic databases; cell metabolism; fermentation; simulation and modeling
Citation Formats
Oyetunde, Tolutola, Liu, Di, Martin, Hector Garcia, and Tang, Yinjie J. Machine learning framework for assessment of microbial factory performance. United States: N. p., 2019. Web. doi:10.1371/journal.pone.0210558.
Oyetunde, Tolutola, Liu, Di, Martin, Hector Garcia, & Tang, Yinjie J. Machine learning framework for assessment of microbial factory performance. United States. https://doi.org/10.1371/journal.pone.0210558
Oyetunde, Tolutola, Liu, Di, Martin, Hector Garcia, and Tang, Yinjie J. 2019. "Machine learning framework for assessment of microbial factory performance". United States. https://doi.org/10.1371/journal.pone.0210558. https://www.osti.gov/servlets/purl/1510764.
@article{osti_1510764,
title = {Machine learning framework for assessment of microbial factory performance},
author = {Oyetunde, Tolutola and Liu, Di and Martin, Hector Garcia and Tang, Yinjie J.},
abstractNote = {Metabolic models can estimate intrinsic product yields for microbial factories, but such frameworks struggle to predict cell performance (including product titer or rate) under suboptimal metabolism and complex bioprocess conditions. On the other hand, machine learning, complementary to metabolic modeling, necessitates large amounts of data. Building such a database for metabolic engineering designs requires significant manpower and is prone to human errors and bias. We propose an approach to integrate data-driven methods with a genome-scale metabolic model for assessment of microbial bio-production (yield, titer and rate). Using engineered E. coli as an example, we manually extracted and curated a data set comprising about 1200 experimentally realized cell factories from ~100 papers. We furthermore augmented the key design features (e.g., genetic modifications and bioprocess variables) extracted from literature with additional features derived from simulations of the genome-scale metabolic model iML1515, run with constraints that match the experimental data. Then, data augmentation and ensemble learning (e.g., support vector machines, gradient boosted trees, and neural networks in a stacked regressor model) are employed to alleviate the challenges of sparse, non-standardized, and incomplete data sets, while multiple correspondence analysis/principal component analysis are used to rank influential factors on bio-production. The hybrid framework demonstrates a reasonably high cross-validation accuracy for prediction of E. coli factory performance metrics under presumed bioprocess and pathway conditions (Pearson correlation coefficients between 0.8 and 0.93 on new data not seen by the model).},
doi = {10.1371/journal.pone.0210558},
journal = {PLoS ONE},
number = 1,
volume = 14,
place = {United States},
year = {2019},
month = jan
}
Works referenced in this record:
Computational methods in metabolic engineering for strain design
journal, August 2015
- Long, Matthew R.; Ong, Wai Kit; Reed, Jennifer L.
- Current Opinion in Biotechnology, Vol. 34
An ancient Chinese wisdom for metabolic engineering: Yin-Yang
journal, March 2015
- Wu, Stephen G.; He, Lian; Wang, Qingzhao
- Microbial Cell Factories, Vol. 14, Issue 1
Redesigning Escherichia coli Metabolism for Anaerobic Production of Isobutanol
journal, June 2011
- Trinh, Cong T.; Li, Johnny; Blanch, Harvey W.
- Applied and Environmental Microbiology, Vol. 77, Issue 14
The LASER database: Formalizing design rules for metabolic engineering
journal, December 2015
- Winkler, James D.; Halweg-Edwards, Andrea L.; Gill, Ryan T.
- Metabolic Engineering Communications, Vol. 2
KBase: The United States Department of Energy Systems Biology Knowledgebase
journal, July 2018
- Arkin, Adam P.; Cottingham, Robert W.; Henry, Christopher S.
- Nature Biotechnology, Vol. 36, Issue 7
Comment on “Predicting reaction performance in C–N cross-coupling using machine learning”
journal, November 2018
- Chuang, Kangway V.; Keiser, Michael J.
- Science, Vol. 362, Issue 6416
XGBoost: A Scalable Tree Boosting System
conference, January 2016
- Chen, Tianqi; Guestrin, Carlos
- Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16
Metabolic Burden: Cornerstones in Synthetic Biology and Metabolic Engineering Applications
journal, August 2016
- Wu, Gang; Yan, Qiang; Jones, J. Andrew
- Trends in Biotechnology, Vol. 34, Issue 8
Matplotlib: A 2D Graphics Environment
journal, January 2007
- Hunter, John D.
- Computing in Science & Engineering, Vol. 9, Issue 3
Systematic Evaluation of Methods for Integration of Transcriptomic Data into Constraint-Based Models of Metabolism
journal, April 2014
- Machado, Daniel; Herrgård, Markus
- PLoS Computational Biology, Vol. 10, Issue 4
The Experiment Data Depot: A Web-Based Software Tool for Biological Experimental Data Storage, Sharing, and Visualization
journal, September 2017
- Morrell, William C.; Birkel, Garrett W.; Forrer, Mark
- ACS Synthetic Biology, Vol. 6, Issue 12
OMERO: flexible, model-driven data management for experimental biology
journal, February 2012
- Allan, Chris; Burel, Jean-Marie; Moore, Josh
- Nature Methods, Vol. 9, Issue 3
Evaluating Factors That Influence Microbial Synthesis Yields by Linear Regression with Numerical and Ordinal Variables
journal, November 2010
- Colletti, Peter F.; Goyal, Yogesh; Varman, Arul M.
- Biotechnology and Bioengineering, Vol. 108, Issue 4
iML1515, a knowledgebase that computes Escherichia coli traits
journal, October 2017
- Monk, Jonathan M.; Lloyd, Colton J.; Brunk, Elizabeth
- Nature Biotechnology, Vol. 35, Issue 10
Deep learning for computational biology
journal, July 2016
- Angermueller, Christof; Pärnamaa, Tanel; Parts, Leopold
- Molecular Systems Biology, Vol. 12, Issue 7
Engineering Cellular Metabolism
journal, March 2016
- Nielsen, Jens; Keasling, Jay D.
- Cell, Vol. 164, Issue 6
CeCaFDB: a curated database for the documentation, visualization and comparative analysis of central carbon metabolic flux distributions explored by 13C-fluxomics
journal, November 2014
- Zhang, Zhengdong; Shen, Tie; Rui, Bin
- Nucleic Acids Research, Vol. 43, Issue D1
Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming
journal, April 2016
- Wu, Stephen Gang; Wang, Yuxuan; Jiang, Wu
- PLOS Computational Biology, Vol. 12, Issue 4
COBRApy: COnstraints-Based Reconstruction and Analysis for Python
journal, January 2013
- Ebrahim, Ali; Lerman, Joshua A.; Palsson, Bernhard O.
- BMC Systems Biology, Vol. 7, Issue 1
Works referencing / citing this record:
Machine and deep learning meet genome-scale metabolic modeling
journal, July 2019
- Zampieri, Guido; Vijayakumar, Supreeta; Yaneske, Elisabeth
- PLOS Computational Biology, Vol. 15, Issue 7
The era of big data: Genome-scale modelling meets machine learning
journal, January 2020
- Antonakoudis, Athanasios; Barbosa, Rodrigo; Kotidis, Pavlos
- Computational and Structural Biotechnology Journal, Vol. 18