Title: Multi-Omics integration can be used to rescue metabolic information for some of the dark region of the Pseudomonas putida proteome

Journal Article · BMC Genomics
 [1];  [2]
  1. Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
  2. Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

In every omics experiment, genes or their products are identified for which even state-of-the-art tools are unable to assign a function. In the biotechnology chassis organism Pseudomonas putida, these proteins of unknown function make up 14% of the proteome. This missing information can bias analyses, since these proteins can carry out functions that affect the engineering of organisms. Because function prediction tools are built to predict protein function across all organisms, they generally fail to use all of the types of data available for any specific organism, including protein and transcript expression information. Additionally, the release of AlphaFold predictions for all UniProt proteins provides a novel opportunity for leveraging structural information. We constructed a bespoke machine learning model to predict the function of recalcitrant proteins of unknown function in Pseudomonas putida based on these sources of data; the model annotated 213 proteins with 1079 terms. Among the predicted functions supplied by the model, we found evidence for a significant overrepresentation of nitrogen metabolism and macromolecule processing proteins. These findings were corroborated by manual analyses of selected proteins, which identified, among others, a functionally unannotated operon that likely encodes a branch of the shikimate pathway.
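
The abstract reports a significant overrepresentation of certain functional categories among the predicted annotations but does not say which test was used. As an illustration only, the sketch below shows one common way such a claim can be checked: a one-sided hypergeometric test. The function name, its parameters, and the toy counts are assumptions made for this example; they are not taken from the article and do not reproduce its analysis.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    population: total number of proteins considered (background set)
    annotated:  proteins in the background carrying the functional term
    sample:     proteins in the set of interest (e.g. newly annotated ones)
    hits:       proteins in the set of interest carrying the term

    All names here are hypothetical and chosen for illustration.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for every count from `hits` up to the
    # largest count that is possible given the set sizes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy counts, invented for this example (not data from the article).
p_value = hypergeom_enrichment_p(population=1000, annotated=50, sample=100, hits=12)
```

A small p_value would suggest the term appears in the set of interest more often than chance alone would explain. In practice, testing many terms at once would also call for a multiple-testing correction, which this sketch omits.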

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
2351080
Journal Information:
Journal Name: BMC Genomics; Journal Volume: 25; Journal Issue: 1; ISSN 1471-2164
Publisher:
Springer
Country of Publication:
United States
Language:
English
