Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets
Abstract
Background: Mass spectrometry-based proteomics can identify and quantify thousands of proteins from individual microbial species, but a significant percentage of these proteins are unannotated and hence classified as proteins of unknown function (PUFs). Due to the difficulty in extracting meaningful metabolic information, PUFs are often overlooked or discarded during data analysis, even though they might be critically important in functional activities, in particular for metabolic engineering research.
Results: We optimized and employed a pipeline integrating various “guilt-by-association” (GBA) metrics, including differential expression and co-expression analyses of high-throughput mass spectrometry proteome data and phylogenetic coevolution analysis, and sequence homology-based approaches to determine putative functions for PUFs in Clostridium thermocellum. Our various analyses provided putative functional information for over 95% of the PUFs detected by mass spectrometry in a wild-type and/or an engineered strain of C. thermocellum. In particular, we validated a predicted acyltransferase PUF (WP_003519433.1) with functional activity towards 2-phenylethyl alcohol, consistent with our GBA and sequence homology-based predictions.
Conclusions: This work demonstrates the value of leveraging sequence homology-based annotations with empirical evidence based on the concept of GBA to broadly predict putative functions for PUFs, opening avenues to further interrogation via targeted experiments.
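The co-expression side of guilt-by-association can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the Pearson metric, the `min_r` threshold, the protein names, the annotation labels, and all abundance values below are assumptions made up for illustration. The idea shown is only that a PUF whose abundance profile tracks that of annotated proteins across conditions may tentatively inherit their annotation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A flat profile has no variance; treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def guilt_by_association(puf_profile, annotated, min_r=0.9):
    """Return (annotation, r) pairs for annotated proteins whose profile
    correlates with the PUF at or above min_r, strongest first."""
    hits = []
    for _name, (annotation, profile) in annotated.items():
        r = pearson(puf_profile, profile)
        if r >= min_r:
            hits.append((annotation, round(r, 3)))
    return sorted(hits, key=lambda h: -h[1])


# Hypothetical log2 abundances across four conditions (made-up numbers).
annotated = {
    "protA": ("hypothetical pathway X", [1.0, 2.0, 3.0, 4.0]),
    "protB": ("hypothetical pathway Y", [4.0, 3.0, 2.0, 1.0]),
    "protC": ("hypothetical pathway Z", [1.0, 3.0, 2.0, 4.0]),
}
puf = [2.0, 4.0, 6.0, 8.0]

hits = guilt_by_association(puf, annotated)
print(hits)
```

A hit from a sketch like this is only a candidate annotation. As the abstract notes, such associations are meant to be combined with sequence homology evidence and followed up by targeted experiments.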
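- Authors:
- Poudel, Suresh; Cope, Alexander L.; O’Dell, Kaela B.; Guss, Adam M.; Seo, Hyeongmin; Trinh, Cong T.; Hettich, Robert L.
- Publication Date:
- 2021-05-10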
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- OSTI Identifier:
- 1782393
- Alternate Identifier(s):
- OSTI ID: 1814393
- Grant/Contract Number:
- Center for Bioenergy Innovation (CBI); AC05-00OR22725
- Resource Type:
- Published Article
- Journal Name:
- Biotechnology for Biofuels
- Additional Journal Information:
- Journal Name: Biotechnology for Biofuels Journal Volume: 14 Journal Issue: 1; Journal ID: ISSN 1754-6834
- Publisher:
- Springer Science + Business Media
- Country of Publication:
- Netherlands
- Language:
- English
- Subject:
- 09 BIOMASS FUELS
Citation Formats
Poudel, Suresh, Cope, Alexander L., O’Dell, Kaela B., Guss, Adam M., Seo, Hyeongmin, Trinh, Cong T., and Hettich, Robert L. Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets. Netherlands: N. p., 2021. Web. doi:10.1186/s13068-021-01964-4.
Poudel, Suresh, Cope, Alexander L., O’Dell, Kaela B., Guss, Adam M., Seo, Hyeongmin, Trinh, Cong T., & Hettich, Robert L. (2021). Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets. Netherlands. https://doi.org/10.1186/s13068-021-01964-4
Poudel, Suresh, Cope, Alexander L., O’Dell, Kaela B., Guss, Adam M., Seo, Hyeongmin, Trinh, Cong T., and Hettich, Robert L. 2021. "Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets". Netherlands. https://doi.org/10.1186/s13068-021-01964-4.
@article{osti_1782393,
title = {Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets},
author = {Poudel, Suresh and Cope, Alexander L. and O’Dell, Kaela B. and Guss, Adam M. and Seo, Hyeongmin and Trinh, Cong T. and Hettich, Robert L.},
abstractNote = {Background: Mass spectrometry-based proteomics can identify and quantify thousands of proteins from individual microbial species, but a significant percentage of these proteins are unannotated and hence classified as proteins of unknown function (PUFs). Due to the difficulty in extracting meaningful metabolic information, PUFs are often overlooked or discarded during data analysis, even though they might be critically important in functional activities, in particular for metabolic engineering research. Results: We optimized and employed a pipeline integrating various “guilt-by-association” (GBA) metrics, including differential expression and co-expression analyses of high-throughput mass spectrometry proteome data and phylogenetic coevolution analysis, and sequence homology-based approaches to determine putative functions for PUFs in Clostridium thermocellum. Our various analyses provided putative functional information for over 95% of the PUFs detected by mass spectrometry in a wild-type and/or an engineered strain of C. thermocellum. In particular, we validated a predicted acyltransferase PUF (WP_003519433.1) with functional activity towards 2-phenylethyl alcohol, consistent with our GBA and sequence homology-based predictions. Conclusions: This work demonstrates the value of leveraging sequence homology-based annotations with empirical evidence based on the concept of GBA to broadly predict putative functions for PUFs, opening avenues to further interrogation via targeted experiments.},
doi = {10.1186/s13068-021-01964-4},
journal = {Biotechnology for Biofuels},
number = 1,
volume = 14,
place = {Netherlands},
year = {2021},
month = {5}
}
https://doi.org/10.1186/s13068-021-01964-4