U.S. Department of Energy
Office of Scientific and Technical Information

nGASP - the nematode genome annotation assessment project

Journal Article · December 1, 2008 · Vol. 9, p. 549

While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders.
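The abstract reports gene-level sensitivity and specificity without defining them. The sketch below illustrates one common convention in gene-prediction benchmarks, and it is an assumption here rather than the nGASP scoring code: a predicted gene counts as correct only if its coding exon structure matches a reference gene exactly, sensitivity is the fraction of reference genes recovered, and specificity is the fraction of predicted genes that are correct. The `GeneModel` representation, the function name `gene_level_accuracy`, and the coordinates are all hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical gene model: (sequence id, strand, tuple of (start, end) coding exons).
GeneModel = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def gene_level_accuracy(
    reference: Iterable[GeneModel], predicted: Iterable[GeneModel]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the gene level.

    Assumed definitions (not taken from the record):
      sensitivity = exactly matched genes / reference genes
      specificity = exactly matched genes / predicted genes
    """
    ref = set(reference)
    pred = set(predicted)
    exact = len(ref & pred)  # genes whose whole exon structure matches
    sensitivity = exact / len(ref) if ref else 0.0
    specificity = exact / len(pred) if pred else 0.0
    return sensitivity, specificity


# Made-up coordinates, for illustration only.
reference = [
    ("chrI", "+", ((100, 200), (300, 400))),
    ("chrI", "-", ((900, 1000),)),
    ("chrII", "+", ((50, 150), (250, 350), (450, 500))),
    ("chrII", "+", ((2000, 2100),)),
]
predicted = [
    ("chrI", "+", ((100, 200), (300, 400))),              # exact match
    ("chrI", "-", ((900, 1000),)),                        # exact match
    ("chrII", "+", ((50, 150), (250, 360), (450, 500))),  # one exon boundary off
    ("chrII", "+", ((2000, 2100),)),                      # exact match
    ("chrII", "-", ((5000, 5100),)),                      # no reference gene here
]

sn, sp = gene_level_accuracy(reference, predicted)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Under these definitions a single misplaced exon boundary makes the whole gene count as wrong, which is one reason genes with many exons or short exons are harder to score well on at the gene level.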

Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
950645
Report Number(s):
LLNL-JRNL-409564
Journal Information:
Vol. 9, p. 549, December 1, 2008
Country of Publication:
United States
Language:
English
