Metagenomic gene annotation by a homology-independent approach
Fully understanding the genetic potential of a microbial community requires functional annotation of all the genes it encodes. The recently developed deep metagenome sequencing approach has enabled rapid identification of millions of genes from a complex microbial community without cultivation. Current homology-based gene annotation fails to detect distantly related or structural homologs. Furthermore, homology searches with millions of genes are very computationally intensive. To overcome these limitations, we developed rhModeller, a homology-independent software pipeline to efficiently annotate genes from metagenomic sequencing projects. Using cellulases and carbonic anhydrases as two independent test cases, we demonstrated that rhModeller is much faster than HMMER with comparable accuracy, at 94.5% and 99.9%, respectively. More importantly, rhModeller has the ability to detect novel proteins that do not share significant homology with any known protein families. As approximately 50% of the 2 million genes derived from the cow rumen metagenome failed to be annotated based on sequence homology, we tested whether rhModeller could be used to annotate these genes. Preliminary results suggest that rhModeller is robust in the presence of missense and frameshift mutations, two common errors in metagenomic genes. Applying the pipeline to the cow rumen genes identified 4,990 novel cellulase candidates and 8,196 novel carbonic anhydrase candidates. In summary, we expect rhModeller to dramatically increase the speed and quality of metagenomic gene annotation.
- Research Organization:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- Genomics Division
- DOE Contract Number:
- DE-AC02-05CH11231
- OSTI ID:
- 1050667
- Report Number(s):
- LBNL-4833E-Poster; TRN: US201218%%879
- Resource Relation:
- Conference: 6th Annual DOE JGI User Meeting, Walnut Creek, CA, 3/22 - 3/24/2011
- Country of Publication:
- United States
- Language:
- English
Related Subjects
60 APPLIED LIFE SCIENCES
97 MATHEMATICAL METHODS AND COMPUTING
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
ACCURACY
CARBONIC ANHYDRASE
CELLULASE
COWS
CULTIVATION
FUNCTIONALS
GENES
GENETICS
MUTATIONS
PIPELINES
PROTEINS
RUMINANTS
STOMACH
VELOCITY
functional annotation
deep metagenome sequencing
rhModeller
homology
metagenomic gene annotation
missense
frameshift mutations