Title: An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Journal Article · Genome Biology (Online)
  1. Indiana Univ., Bloomington, IN (United States)
  2. Buck Institute for Research on Aging, Novato, CA (United States)
  3. Yale Univ., New Haven, CT (United States)
  4. Miami University, Oxford, OH (United States)
  5. University of Rome (Italy)
  6. University of Colorado School of Medicine, Aurora, CO (United States)
  7. Colorado State Univ., Fort Collins, CO (United States)
  8. University of Melbourne, Parkville, VIC (Australia)
  9. New York Univ. (NYU), NY (United States)
  10. New York Univ. (NYU), NY (United States); Simons Center for Data Analysis, New York, NY (United States)
  11. Univ. of California, Berkeley, CA (United States)
  12. Univ. of Bologna (Italy)
  13. Univ. of Missouri, Columbia, MO (United States)
  14. Eidgenoessische Technische Hochschule (ETH), Zurich (Switzerland); Swiss Inst. of Bioinformatics, Lausanne (Switzerland)
  15. Univ. College London (United Kingdom); Univ. of Lausanne (Switzerland); Swiss Inst. of Bioinformatics, Lausanne (Switzerland)
  16. European Bioinformatics Institute, Cambridge (United Kingdom)
  17. Univ. of Turku (Finland)
  18. Univ. of Turku (Finland); Turku Centre for Computer Science (Finland)
  19. Univ. of Bristol (United Kingdom)
  20. Univ. of Helsinki (Finland)
  21. Academia Sinica, Taipei (Taiwan)
  22. Univ. College London (United Kingdom)
  23. North Carolina A & T State Univ., Greensboro, NC (United States)
  24. Purdue Univ., West Lafayette, IN (United States)
  25. Hebrew Univ. of Jerusalem (Israel)
  26. KU Leuven (Belgium); iMinds Department Medical Information Technologies, Leuven (Belgium)
  27. Cancer Research Centre of Lyon (France); Université de Lyon 1, Villeurbanne (France); Centre Léon Bérard, Lyon (France)
  28. Cerenode Inc., Boston, MA (United States)
  29. Molde University College (Norway)
  30. Royal Holloway Univ. of London, Egham (United Kingdom)
  31. Univ. of California, Los Angeles, CA (United States)
  32. National Univ. of Ireland, Galway (Ireland)
  33. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (United States)
  34. Univ. of British Columbia, Vancouver, BC (Canada)
  35. Technische Universität München, Garching (Germany)
  36. USDOE Joint Genome Institute (JGI), Berkeley, CA (United States)
  37. Centre for Genomic Regulation, Barcelona (Spain); Universitat Pompeu Fabra, Barcelona (Spain); Institució Catalana de Recerca i Estudis Avançats, Barcelona (Spain)
  38. Centre for Genomic Regulation, Barcelona (Spain); Universitat Pompeu Fabra, Barcelona (Spain)
  39. Universitat Pompeu Fabra, Barcelona (Spain); Division of Electronics, Rudjer Boskovic Institute, Zagreb (Croatia); EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation, Barcelona (Spain)
  40. Fudan Univ., Shanghai (China)
  41. Univ. of Padua (Italy)
  42. Edmund Mach Foundation, San Michele all’Adige (Italy)
  43. Hospital Universitario de La Paz, Madrid (Spain)
  44. Spanish National Cancer Research Institute, Madrid (Spain)
  45. Politecnico di Torino (Italy)
  46. National University of Computer & Emerging Sciences, Islamabad (Pakistan)
  47. Università degli Studi di Milano (Italy)
  48. Wageningen Univ. and Research Centre (Netherlands)
  49. Univ. of Belgrade (Serbia)
  50. Univ. of Sao Paulo, Ribeirao Preto (Brazil)
  51. Univ. of Würzburg (Germany)
  52. Temple Univ., Philadelphia, PA (United States)
  53. Univ. of Southern Mississippi, Hattiesburg, MS (United States)
  54. Imperial College, London (United Kingdom)
  55. Univ. of Kent (United Kingdom)
  56. Universitätsmedizin Berlin (Germany)
  57. KU Leuven (Belgium)
  58. University of Rome, La Sapienza, Rome (Italy)
  59. Univ. of California, San Francisco, CA (United States)
  60. Univ. of Pennsylvania, Philadelphia, PA (United States)
  61. Univ. of Washington, Seattle, WA (United States)
  62. Miami Univ., Oxford, OH (United States)

Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.

Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.

Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific and that different performance metrics can be used to probe the nature of accurate predictions, and it highlighted the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER); National Science Foundation (NSF); National Institutes of Health (NIH); National Natural Science Foundation of China (NSFC); National Basic Research Program of China; Natural Sciences and Engineering Research Council of Canada (NSERC); FP7 infrastructure project TransPLANT Award; Microsoft Research/FAPESP grant; FAPESP fellowship; Biotechnology and Biological Sciences Research Council; Spanish Ministry of Economics and Competitiveness; Newton International Fellowship Scheme of the Royal Society; Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative; CSC – IT Center for Science Ltd., Espoo, Finland; British Heart Foundation; Parkinson’s UK; Alexander von Humboldt Foundation; German Federal Ministry for Education and Research; Ernst Ludwig Ehrlich Studienwerk; Ministry of Education, Science and Technological Development of the Republic of Serbia; Australian Research Council
Grant/Contract Number:
AC02-05CH11231; DBI-1458477; DBI-1458443; DBI-1458390; DBI-1458359; IIS-1319551; DBI-1262189; DBI-1149224; R01GM093123; R01GM097528; R01GM076990; R01GM071749; R01LM009722; UL1TR000423; 3147124; 91231116; 2012CB316505; RGPIN 371348-11; 283496 (ADJvD); 2009/53161-6; 2010/50491-1; BB/L020505/1; BB/F020481/1; BB/K004131/1; BB/F00964X/1; BB/L018241/1; BIO2012-40205; GBMF4552; RG/13/5/30112; NSF DBI-0965616; DP150101550; DBI-0965768; T15 LM00945102; ICT-2013-612944; FP7; R01 GM60595; CPDA138081/13; GRIC13AAI9; 150654; BB/M015009/1; PRB2 IPT13/0001
OSTI ID:
1626937
Journal Information:
Genome Biology (Online), Vol. 17, Issue 1; ISSN 1474-760X
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 221 works
Citation information provided by
Web of Science

Cited By (111)

Machine learning techniques for protein function prediction journal October 2019
Predicting bacterial virulence factors – evaluation of machine learning and negative data strategies journal October 2019
Prioritising candidate genes causing QTL using hierarchical orthologous groups journal September 2018
COFACTOR: improved protein function prediction by combining structure, sequence and protein–protein interaction information journal May 2017
NetGO: improving large-scale protein function prediction with massive network information journal May 2019
Neural Network and Random Forest Models in Protein Function Prediction journal January 2020
Novel Comparison of Evaluation Metrics for Gene Ontology Classifiers Reveals Drastic Performance Differences journal September 2019
Gene3D: Extensive prediction of globular domains in proteins journal November 2017
CATH: an expanded resource to predict protein function through structure and sequence journal November 2016
Identification of Moonlighting Proteins in Genomes Using Text Mining Techniques journal October 2018
The evolutionary signal in metagenome phyletic profiles predicts many gene functions journal July 2018
Archetypal transcriptional blocks underpin yeast gene regulation in response to changes in growth conditions journal May 2018
Systematic benchmarking of omics computational tools text January 2019
Isoform function prediction based on bi-random walks on a heterogeneous network journal June 2019
The Ortholog Conjecture Revisited: the Value of Orthologs and Paralogs in Function Prediction journal December 2019
PANDA: Protein function prediction using domain architecture and affinity propagation journal February 2018
deepNF: deep network fusion for protein function prediction journal June 2018
Emerging concepts in pseudoenzyme classification, evolution, and signaling journal August 2019
Ensembling Descendant Term Classifiers to Improve Gene - Abnormal Phenotype Predictions book January 2019
HPO2GO: prediction of human phenotype ontology term associations for proteins using cross ontology annotation co-occurrences journal January 2018
Evaluating the impact of topological protein features on the negative examples selection journal November 2018
AWX: An Integrated Approach to Hierarchical-Multilabel Classification book January 2019
The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens journal November 2019
Towards region-specific propagation of protein functions journal October 2018
ADAGE signature analysis: differential expression analysis with data-defined gene sets journal November 2017
ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network journal October 2017
Gene Ontology Meta Annotator for Plants (GOMAP) journal May 2021
GOGO: An improved algorithm to measure the semantic similarity between gene ontology terms journal October 2018
Predicting Human Protein Function with Multi-task Deep Neural Networks journal January 2018
A Structurally-Validated Multiple Sequence Alignment of 497 Human Protein Kinase Domains journal November 2019
Effusion: prediction of protein function from sequence similarity networks journal August 2018
Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper journal April 2017
Accurate and Efficient Gene Function Prediction using a Multi-Bacterial Network journal April 2020
Predicting human protein function with multi-task deep neural networks journal June 2018
ADAGE signature analysis: differential expression analysis with data-defined gene sets journal June 2017
New Drosophila long-term memory genes revealed by assessing computational function prediction methods journal September 2018
Sparsity of Protein-Protein Interaction Networks Hinders Function Prediction in Non-Model Species posted_content July 2020
Accurate and efficient gene function prediction using a Multi-Bacterial network text January 2020
CATH: expanding the horizons of structure-based functional annotations for genome sequences journal November 2018
Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper text January 2017
Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences journal August 2018
BUSCA: an integrative web server to predict subcellular localization of proteins journal April 2018
A Bayesian approach for estimating protein–protein interactions by integrating structural and non-structural biological data journal January 2017
INGA 2.0: improving protein function prediction for the dark proteome journal May 2019
DeepGOPlus: improved protein function prediction from sequence journal July 2019
HFSP: high speed homology-driven function annotation of proteins journal June 2018
Maize GO Annotation-Methods, Evaluation, and Review (maize-GAMER) journal April 2018
Objective risk stratification of prostate cancer using machine learning and radiomics applied to multiparametric magnetic resonance images (Varghese, Bino; Chen, Frank; Hwang, Darryl; BCB '20: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics; https://doi.org/10.1145/3388440.3414208) conference September 2020
Investigating the unknown functions in the minimal bacterial genome reveals many transporter proteins posted_content July 2018
Large-scale protein function prediction using heterogeneous ensembles journal January 2018
Towards region-specific propagation of protein functions journal March 2018
Computational identification of protein-protein interactions in model plant proteomes journal June 2019
New Drosophila Long-Term Memory Genes Revealed by Assessing Computational Function Prediction Methods journal November 2018
Beyond Homology Transfer: Deep Learning for Automated Annotation of Proteins journal July 2018
Using deep maxout neural networks to improve the accuracy of function prediction from protein interaction networks. text January 2020
GROOLS: reactive graph reasoning for genome annotation through biological processes journal April 2018
GROOLS: reactive graph reasoning for genome annotation through biological processes journal October 2017
NNTox: Gene Ontology-Based Protein Toxicity Prediction Using Neural Network journal November 2019
The OMA orthology database in 2018: Retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces text January 2018
Objective risk stratification of prostate cancer using machine learning and radiomics applied to multiparametric magnetic resonance images journal February 2019
Metric learning on expression data for gene function prediction journal September 2019
Identifying gene function and module connections by the integration of multispecies expression compendia journal November 2019
An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach journal December 2019
Gene expression analysis of Cyanophora paradoxa reveals conserved abiotic stress responses between basal algae and flowering plants journal November 2019
Beyond Homology Transfer: Deep Learning for Automated Annotation of Proteins journal July 2017
A CATH domain functional family based approach to identify putative cancer driver genes and driver mutations journal January 2019
Systematic benchmarking of omics computational tools journal March 2019
Improving protein function prediction using protein sequence and GO-term similarities journal August 2018
Combining learning and constraints for genome-wide protein annotation journal June 2019
pBRIT: gene prioritization by correlating functional and phenotypic annotations through integrative data fusion journal February 2018
Using deep maxout neural networks to improve the accuracy of function prediction from protein interaction networks journal December 2018
An introduction to ROC analysis journal June 2006
Gene expression analysis of Cyanophora paradoxa reveals conserved abiotic stress responses between basal algae and flowering plants journal June 2019
Identifying gene function and module connections by the integration of multi-species expression compendia journal May 2019
DeepGOPlus: Improved protein function prediction from sequence journal April 2019
Artificial Intelligence and Integrated Genotype–Phenotype Identification journal December 2018
Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods journal October 2017
Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate journal January 2018
The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens journal May 2019
Using deep maxout neural networks to improve the accuracy of function prediction from protein interaction networks journal July 2019
Novel comparison of evaluation metrics for gene ontology classifiers reveals drastic performance differences journal November 2019
Automated feature engineering improves prediction of protein–protein interactions journal July 2019
New computational approaches to understanding molecular protein function journal April 2018
NoGOA: predicting noisy GO annotations using evidences and sparse representation journal July 2017
A CATH domain functional family based approach to identify putative cancer driver genes and driver mutations journal August 2018
Accurate and efficient gene function prediction using a multi-bacterial network journal October 2020
Survey of Machine Learning Techniques in Drug Discovery journal May 2019
deepNF: Deep network fusion for protein function prediction journal November 2017
DeepGOPlus: improved protein function prediction from sequence journal April 2021
The OMA orthology database in 2018: retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces journal November 2017
Gut microbiome and magnetic resonance spectroscopy study of subjects at ultra-high risk for psychosis may support the membrane hypothesis journal June 2018
Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods text January 2017
A Structurally-Validated Multiple Sequence Alignment of 497 Human Protein Kinase Domains journal December 2019
The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction journal July 2020
Function Prediction for G Protein-Coupled Receptors through Text Mining and Induction Matrix Completion journal February 2019
Constructing Genetic Networks using Biomedical Literature and Rare Event Classification journal November 2017
Enumerating consistent sub-graphs of directed acyclic graphs: an insight into biomedical ontologies journal June 2018
ASAP: a machine learning framework for local protein properties journal January 2016
PhytoTypeDB: a database of plant protein inter-cultivar variability and function journal January 2018
The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation journal April 2017
PANNZER2: a rapid functional annotation web server journal May 2018
Structural and Functional View of Polypharmacology journal February 2017
CATH functional families predict protein functional sites journal March 2020
NaviGO: interactive tool for visualization and functional similarity and coherence analysis with gene ontology journal March 2017
A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks journal October 2018
Resequencing and annotation of the Nostoc punctiforme ATTC 29133 genome: facilitating biofuel and high-value chemical production journal February 2017
Bayesian parameter estimation for automatic annotation of gene functions using observational data and phylogenetic trees journal February 2021
A three-way approach for protein function classification journal February 2017
Multi-task Deep Neural Networks in Automated Protein Function Prediction preprint January 2017
Molecular evolution and gene function preprint January 2019