An expanded evaluation of protein function prediction methods shows an improvement in accuracy
- Indiana Univ., Bloomington, IN (United States)
- Buck Institute for Research on Aging, Novato, CA (United States)
- Yale Univ., New Haven, CT (United States)
- Miami University, Oxford, OH (United States)
- University of Rome (Italy)
- University of Colorado School of Medicine, Aurora, CO (United States)
- Colorado State Univ., Fort Collins, CO (United States)
- University of Melbourne, Parkville, VIC (Australia)
- New York Univ. (NYU), NY (United States)
- New York Univ. (NYU), NY (United States); Simons Center for Data Analysis, New York, NY (United States)
- Univ. of California, Berkeley, CA (United States)
- Univ. of Bologna (Italy)
- Univ. of Missouri, Columbia, MO (United States)
- Eidgenoessische Technische Hochschule (ETH), Zurich (Switzerland); Swiss Inst. of Bioinformatics, Lausanne (Switzerland)
- Univ. College London (United Kingdom); Univ. of Lausanne (Switzerland); Swiss Inst. of Bioinformatics, Lausanne (Switzerland)
- European Bioinformatics Institute, Cambridge (United Kingdom)
- Univ. of Turku (Finland)
- Univ. of Turku (Finland); Turku Centre for Computer Science (Finland)
- Univ. of Bristol (United Kingdom)
- Univ. of Helsinki (Finland)
- Academia Sinica, Taipei (Taiwan)
- Univ. College London (United Kingdom)
- North Carolina A & T State Univ., Greensboro, NC (United States)
- Purdue Univ., West Lafayette, IN (United States)
- Hebrew Univ. of Jerusalem (Israel)
- KU Leuven (Belgium); iMinds Department Medical Information Technologies, Leuven (Belgium)
- Cancer Research Centre of Lyon (France); Université de Lyon 1, Villeurbanne (France); Centre Léon Bérard, Lyon (France)
- Cerenode Inc., Boston, MA (United States)
- Molde University College (Norway)
- Royal Holloway Univ. of London, Egham (United Kingdom)
- Univ. of California, Los Angeles, CA (United States)
- National Univ. of Ireland, Galway (Ireland)
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (United States)
- Univ. of British Columbia, Vancouver, BC (Canada)
- Technische Universität München, Garching (Germany)
- USDOE Joint Genome Institute (JGI), Berkeley, CA (United States)
- Centre for Genomic Regulation, Barcelona (Spain); Universitat Pompeu Fabra, Barcelona (Spain); Institució Catalana de Recerca i Estudis Avançats, Barcelona (Spain)
- Centre for Genomic Regulation, Barcelona (Spain); Universitat Pompeu Fabra, Barcelona (Spain)
- Universitat Pompeu Fabra, Barcelona (Spain); Division of Electronics, Rudjer Boskovic Institute, Zagreb (Croatia); EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation, Barcelona (Spain)
- Fudan Univ., Shanghai (China)
- Univ. of Padua (Italy)
- Edmund Mach Foundation, San Michele all’Adige (Italy)
- Hospital Universitario de La Paz, Madrid (Spain)
- Spanish National Cancer Research Institute, Madrid (Spain)
- Politecnico di Torino (Italy)
- National University of Computer & Emerging Sciences, Islamabad (Pakistan)
- Università degli Studi di Milano (Italy)
- Wageningen Univ. and Research Centre (Netherlands)
- Univ. of Belgrade (Serbia)
- Univ. of Sao Paulo, Ribeirao Preto (Brazil)
- Univ. of Würzburg (Germany)
- Temple Univ., Philadelphia, PA (United States)
- Univ. of Southern Mississippi, Hattiesburg, MS (United States)
- Imperial College, London (United Kingdom)
- Univ. of Kent (United Kingdom)
- Universitätsmedizin Berlin (Germany)
- KU Leuven (Belgium)
- University of Rome, La Sapienza, Rome (Italy)
- Univ. of California, San Francisco, CA (United States)
- Univ. of Pennsylvania, Philadelphia, PA (United States)
- Univ. of Washington, Seattle, WA (United States)
- Miami Univ., Oxford, OH (United States)
Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.
Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.
Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific and that different performance metrics can be used to probe the nature of accurate predictions; it further revealed the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Biological and Environmental Research (BER); National Science Foundation (NSF); National Institutes of Health (NIH); National Natural Science Foundation of China (NSFC); National Basic Research Program of China; Natural Sciences and Engineering Research Council of Canada (NSERC); FP7 infrastructure project TransPLANT Award; Microsoft Research/FAPESP grant; FAPESP fellowship; Biotechnology and Biological Sciences Research Council; Spanish Ministry of Economics and Competitiveness; Newton International Fellowship Scheme of the Royal Society; Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative; CSC – IT Center for Science Ltd., Espoo, Finland; British Heart Foundation; Parkinson’s UK; Alexander von Humboldt Foundation; German Federal Ministry for Education and Research; Ernst Ludwig Ehrlich Studienwerk; Ministry of Education, Science and Technological Development of the Republic of Serbia; Australian Research Council
- Grant/Contract Number:
- AC02-05CH11231; DBI-1458477; DBI-1458443; DBI-1458390; DBI-1458359; IIS-1319551; DBI-1262189; DBI-1149224; R01GM093123; R01GM097528; R01GM076990; R01GM071749; R01LM009722; UL1TR000423; 3147124; 91231116; 2012CB316505; RGPIN 371348-11; 283496 (ADJvD); 2009/53161-6; 2010/50491-1; BB/L020505/1; BB/F020481/1; BB/K004131/1; BB/F00964X/1; BB/L018241/1; BIO2012-40205; GBMF4552; RG/13/5/30112; NSF DBI-0965616; DP150101550; DBI-0965768; T15 LM00945102; ICT-2013-612944; FP7; R01 GM60595; CPDA138081/13; GRIC13AAI9; 150654; BB/M015009/1; PRB2 IPT13/0001
- OSTI ID:
- 1626937
- Journal Information:
- Genome Biology (Online), Vol. 17, Issue 1; ISSN 1474-760X
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English