U.S. Department of Energy
Office of Scientific and Technical Information

EnZymClass: Substrate specificity prediction tool of plant acyl-ACP thioesterases based on ensemble learning

Journal Article · Current Research in Biotechnology
 [1];  [2];  [2];  [2];  [3]
  1. Pennsylvania State Univ., University Park, PA (United States); University of Illinois
  2. Univ. of Wisconsin, Madison, WI (United States)
  3. Pennsylvania State Univ., University Park, PA (United States)
Experimentally characterizing the functional properties of plant acyl-ACP thioesterases (TEs), a key enzyme class used to produce renewable oleochemicals in microbial hosts, can be expensive and time-consuming because it requires manually screening thousands of candidates in a database. Using amino acid sequence to computationally predict an enzyme's function might accelerate this process. However, standard machine learning (ML)-based approaches need a large amount of information on previously characterized enzymes and their sequences to accurately infer sequence-function relationships, and obtaining that information can be prohibitive, especially with a low-throughput testing cycle. Experimental noise and unbalanced datasets, together with the fact that high sequence similarity does not always imply identical functional properties, further hinder robust prediction. Herein we present an ML method, Ensemble method for enZyme Classification (EnZymClass), that is specifically designed to address these issues. We used EnZymClass to classify TEs into short, long, and mixed free fatty acid substrate specificity categories. While general guidelines for inferring substrate specificity have been proposed before, predicting chain-length preference from primary sequence has remained elusive for plant acyl-ACP TEs. By applying EnZymClass to a subset of TEs in the ThYme database, we identified two medium-chain TEs, ClFatB3 and CwFatB2, with previously uncharacterized activity in E. coli fatty acid production hosts.
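The abstract does not spell out EnZymClass's internals. As a rough illustration of the general idea of ensemble sequence classification (several base classifiers, each seeing a different numerical encoding of the amino acid sequence, combined by a vote), here is a minimal Python sketch. The k-mer composition features, the nearest-centroid base learner, the plain majority vote, and the toy sequences are all assumptions for illustration only; they are not EnZymClass's actual feature sets, classifiers, or training data, and the toy strings are not real thioesterase sequences.

```python
from __future__ import annotations

from collections import Counter
from functools import partial
from itertools import product
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(seq: str, k: int) -> list[float]:
    """Normalized frequency of every possible amino-acid k-mer in seq (assumed feature encoding)."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)  # avoid division by zero for very short sequences
    return [counts[kmer] / total for kmer in vocab]


class NearestCentroid:
    """Hypothetical base learner: predicts the class whose mean feature vector is closest."""

    def __init__(self, featurize):
        self.featurize = featurize
        self.centroids: dict[str, list[float]] = {}

    def fit(self, seqs, labels):
        grouped: dict[str, list[list[float]]] = {}
        for seq, label in zip(seqs, labels):
            grouped.setdefault(label, []).append(self.featurize(seq))
        for label, vecs in grouped.items():
            # column-wise mean of the feature vectors for this class
            self.centroids[label] = [sum(col) / len(vecs) for col in zip(*vecs)]
        return self

    def predict(self, seq: str) -> str:
        x = self.featurize(seq)
        # Euclidean distance to each class centroid; the nearest one wins
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


def ensemble_predict(models, seq: str) -> str:
    """Plain majority vote over the base learners' predicted labels."""
    votes = Counter(model.predict(seq) for model in models)
    return votes.most_common(1)[0][0]


# Toy strings with no biological meaning; they are NOT real thioesterase sequences.
TRAIN = [
    ("AAAAGAAAAG", "short"),
    ("AAAGAAAAGA", "short"),
    ("LLLLVLLLLV", "long"),
    ("LLLVLLLLVL", "long"),
]
seqs, labels = zip(*TRAIN)

# One base learner per feature encoding (k = 1, 2, 3), an assumed stand-in for diverse feature sets.
models = [NearestCentroid(partial(kmer_composition, k=k)).fit(seqs, labels) for k in (1, 2, 3)]

print(ensemble_predict(models, "AAAAAAGAAA"))  # → short
print(ensemble_predict(models, "LLLLLLVLLL"))  # → long
```

With three base learners and two classes a strict majority always exists. With three classes (for example short, long, and mixed) a three-way split is possible, and `Counter.most_common` then returns whichever label was voted first, so a real ensemble would want an explicit tie-breaking rule or weighted votes.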
Research Organization:
CABBI, Urbana, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
SC0018420
OSTI ID:
1855996
Journal Information:
Current Research in Biotechnology, Vol. 4; ISSN 2590-2628
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
