DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets

Abstract

Background: Mass spectrometry-based proteomics can identify and quantify thousands of proteins from individual microbial species, but a significant percentage of these proteins are unannotated and hence classified as proteins of unknown function (PUFs). Due to the difficulty in extracting meaningful metabolic information, PUFs are often overlooked or discarded during data analysis, even though they might be critically important in functional activities, in particular for metabolic engineering research.

Results: We optimized and employed a pipeline integrating various “guilt-by-association” (GBA) metrics, including differential expression and co-expression analyses of high-throughput mass spectrometry proteome data and phylogenetic coevolution analysis, and sequence homology-based approaches to determine putative functions for PUFs in Clostridium thermocellum. Our various analyses provided putative functional information for over 95% of the PUFs detected by mass spectrometry in a wild-type and/or an engineered strain of C. thermocellum. In particular, we validated a predicted acyltransferase PUF (WP_003519433.1) with functional activity towards 2-phenylethyl alcohol, consistent with our GBA and sequence homology-based predictions.

Conclusions: This work demonstrates the value of leveraging sequence homology-based annotations with empirical evidence based on the concept of GBA to broadly predict putative functions for PUFs, opening avenues to further interrogation via targeted experiments.
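As a rough illustration of the co-expression flavor of guilt-by-association mentioned in the abstract, the sketch below assigns a PUF the function shared by the annotated proteins whose abundance profiles correlate most strongly with it. This is a toy example under stated assumptions, not the authors' pipeline: the protein names, abundance values, the `gba_predict` helper, and the 0.8 correlation cutoff are all hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def gba_predict(puf_profile, annotated, min_r=0.8):
    """Return the most common function among annotated proteins that are
    co-expressed with the PUF (r >= min_r), or None if none qualify.

    `annotated` maps protein name -> (function label, abundance profile).
    The cutoff `min_r` is an arbitrary illustrative choice.
    """
    votes = Counter()
    for _name, (function, profile) in annotated.items():
        if pearson(puf_profile, profile) >= min_r:
            votes[function] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical abundances across four conditions (made-up numbers).
annotated = {
    "protA": ("acyltransferase", [1.0, 2.0, 3.0, 4.0]),
    "protB": ("acyltransferase", [2.0, 4.1, 6.2, 7.9]),
    "protC": ("transporter", [4.0, 3.0, 2.0, 1.0]),
}
puf = [0.5, 1.1, 1.4, 2.0]

prediction = gba_predict(puf, annotated)
print(prediction)
```

A putative label produced this way is only a hypothesis; as the abstract notes, such predictions are meant to be combined with sequence homology evidence and followed up by targeted experiments.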

Authors:
Poudel, Suresh; Cope, Alexander L.; O’Dell, Kaela B.; Guss, Adam M.; Seo, Hyeongmin; Trinh, Cong T.; Hettich, Robert L.
Publication Date:
2021-05-10
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1782393
Alternate Identifier(s):
OSTI ID: 1814393
Grant/Contract Number:  
Center for Bioenergy Innovation (CBI); AC05-00OR22725
Resource Type:
Published Article
Journal Name:
Biotechnology for Biofuels
Additional Journal Information:
Journal Name: Biotechnology for Biofuels Journal Volume: 14 Journal Issue: 1; Journal ID: ISSN 1754-6834
Publisher:
Springer Science + Business Media
Country of Publication:
Netherlands
Language:
English
Subject:
09 BIOMASS FUELS

Citation Formats

Poudel, Suresh, Cope, Alexander L., O’Dell, Kaela B., Guss, Adam M., Seo, Hyeongmin, Trinh, Cong T., and Hettich, Robert L. Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets. Netherlands: N. p., 2021. Web. doi:10.1186/s13068-021-01964-4.
Poudel, Suresh, Cope, Alexander L., O’Dell, Kaela B., Guss, Adam M., Seo, Hyeongmin, Trinh, Cong T., & Hettich, Robert L. (2021). Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets. Netherlands. https://doi.org/10.1186/s13068-021-01964-4
Poudel, Suresh, Cope, Alexander L., O’Dell, Kaela B., Guss, Adam M., Seo, Hyeongmin, Trinh, Cong T., and Hettich, Robert L. 2021. "Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets". Netherlands. https://doi.org/10.1186/s13068-021-01964-4.
@article{osti_1782393,
title = {Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets},
author = {Poudel, Suresh and Cope, Alexander L. and O’Dell, Kaela B. and Guss, Adam M. and Seo, Hyeongmin and Trinh, Cong T. and Hettich, Robert L.},
doi = {10.1186/s13068-021-01964-4},
journal = {Biotechnology for Biofuels},
number = 1,
volume = 14,
place = {Netherlands},
year = {2021},
month = {5}
}
