DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Leveraging knowledge engineering and machine learning for microbial bio-manufacturing

Abstract

Genome-scale modeling (GSM) predicts the performance of microbial workhorses and helps identify beneficial gene targets. GSM integrated with intracellular flux dynamics, omics, and thermodynamics has shown remarkable progress in both elucidating complex cellular phenomena and computational strain design (CSD). Nonetheless, these models still show high uncertainty due to a poor understanding of innate pathway regulations, metabolic burdens, and other factors (such as stress tolerance and metabolite channeling). Moreover, engineered hosts may have genetic mutations or non-genetic variations under bioreactor conditions, and thus CSD rarely foresees fermentation rate and titer. Metabolic models play an important role in design-build-test-learn cycles for strain improvement, and machine learning (ML) may provide a viable complementary approach for driving strain design and deciphering cellular processes. To develop quality ML models, knowledge engineering leverages and standardizes the wealth of information in the literature (e.g., genomic/phenomic data, synthetic biology strategies, and bioprocess variables). Data-driven frameworks can offer new constraints for mechanistic models to describe cellular regulations, to design pathways, to search for gene targets, and to estimate fermentation titer/rate/yield under specified growth conditions (e.g., mixing, nutrients, and O2). This review highlights the scope of information collections, database constructions, and machine learning techniques (such as deep learning and transfer learning), which may facilitate "Learn and Design" for strain development.

Authors:
 Oyetunde, Tolutola [1]; Bao, Forrest Sheng [2]; Chen, Jiung-Wen [1]; Martin, Hector Garcia [3]; Tang, Yinjie J. [1]
  1. Washington Univ. in Saint Louis, Saint Louis, MO (United States)
  2. Iowa State Univ., Ames, IA (United States)
  3. Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); DOE, Agile BioFoundry, Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
2018-05-03
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Washington Univ., St. Louis, MO (United States)
Sponsoring Org.:
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Transportation Office. Bioenergy Technologies Office; USDOE Office of Science (SC), Biological and Environmental Research (BER); USDOE Office of Energy Efficiency and Renewable Energy (EERE)
OSTI Identifier:
1510752
Alternate Identifier(s):
OSTI ID: 1529157; OSTI ID: 1564528
Grant/Contract Number:  
AC02-05CH11231; SC0018324; MCB 1616619; DESC0018324
Resource Type:
Accepted Manuscript
Journal Name:
Biotechnology Advances
Additional Journal Information:
Journal Volume: 36; Journal Issue: 4; Journal ID: ISSN 0734-9750
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; Deep learning; Design-build-test-learn; Genome scale modeling; Metabolic burdens

Citation Formats

Oyetunde, Tolutola, Bao, Forrest Sheng, Chen, Jiung-Wen, Martin, Hector Garcia, and Tang, Yinjie J. Leveraging knowledge engineering and machine learning for microbial bio-manufacturing. United States: N. p., 2018. Web. doi:10.1016/j.biotechadv.2018.04.008.
Oyetunde, Tolutola, Bao, Forrest Sheng, Chen, Jiung-Wen, Martin, Hector Garcia, & Tang, Yinjie J. Leveraging knowledge engineering and machine learning for microbial bio-manufacturing. United States. https://doi.org/10.1016/j.biotechadv.2018.04.008
Oyetunde, Tolutola, Bao, Forrest Sheng, Chen, Jiung-Wen, Martin, Hector Garcia, and Tang, Yinjie J. 2018. "Leveraging knowledge engineering and machine learning for microbial bio-manufacturing". United States. https://doi.org/10.1016/j.biotechadv.2018.04.008. https://www.osti.gov/servlets/purl/1510752.
@article{osti_1510752,
title = {Leveraging knowledge engineering and machine learning for microbial bio-manufacturing},
author = {Oyetunde, Tolutola and Bao, Forrest Sheng and Chen, Jiung-Wen and Martin, Hector Garcia and Tang, Yinjie J.},
abstractNote = {Genome-scale modeling (GSM) predicts the performance of microbial workhorses and helps identify beneficial gene targets. GSM integrated with intracellular flux dynamics, omics, and thermodynamics has shown remarkable progress in both elucidating complex cellular phenomena and computational strain design (CSD). Nonetheless, these models still show high uncertainty due to a poor understanding of innate pathway regulations, metabolic burdens, and other factors (such as stress tolerance and metabolite channeling). Moreover, engineered hosts may have genetic mutations or non-genetic variations under bioreactor conditions, and thus CSD rarely foresees fermentation rate and titer. Metabolic models play an important role in design-build-test-learn cycles for strain improvement, and machine learning (ML) may provide a viable complementary approach for driving strain design and deciphering cellular processes. To develop quality ML models, knowledge engineering leverages and standardizes the wealth of information in the literature (e.g., genomic/phenomic data, synthetic biology strategies, and bioprocess variables). Data-driven frameworks can offer new constraints for mechanistic models to describe cellular regulations, to design pathways, to search for gene targets, and to estimate fermentation titer/rate/yield under specified growth conditions (e.g., mixing, nutrients, and O2). This review highlights the scope of information collections, database constructions, and machine learning techniques (such as deep learning and transfer learning), which may facilitate "Learn and Design" for strain development.},
doi = {10.1016/j.biotechadv.2018.04.008},
journal = {Biotechnology Advances},
number = 4,
volume = 36,
place = {United States},
year = {2018},
month = {May}
}

Citation Metrics:
Cited by: 40 works
Citation information provided by
Web of Science
