OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: ROCker: accurate detection and quantification of target genes in short-read metagenomic data sets by modeling sliding-window bitscores

Abstract

Functional annotation of metagenomic and metatranscriptomic data sets relies on similarity searches based on e-value thresholds resulting in an unknown number of false positive and negative matches. To overcome these limitations, we introduce ROCker, aimed at identifying position-specific, most-discriminant thresholds in sliding windows along the sequence of a target protein, accounting for non-discriminative domains shared by unrelated proteins. ROCker employs the receiver operating characteristic (ROC) curve to minimize false discovery rate (FDR) and calculate the best thresholds based on how simulated shotgun metagenomic reads of known composition map onto well-curated reference protein sequences and thus, differs from HMM profiles and related methods. We showcase ROCker using ammonia monooxygenase (amoA) and nitrous oxide reductase (nosZ) genes, mediating oxidation of ammonia and the reduction of the potent greenhouse gas, N2O, to inert N2, respectively. ROCker typically showed 60-fold lower FDR when compared to the common practice of using fixed e-values. Previously uncounted ‘atypical’ nosZ genes were found to be two times more abundant, on average, than their typical counterparts in most soil metagenomes and the abundance of bacterial amoA was quantified against the highly-related particulate methane monooxygenase (pmoA). Therefore, ROCker can reliably detect and quantify target genes in short-read metagenomes.
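
The idea of position-specific thresholds can be illustrated with a small Python sketch. It is not ROCker's implementation: the abstract does not state the exact criterion used on the ROC curve, so this sketch assumes Youden's J (sensitivity + specificity - 1) as the measure of "most discriminant", and the names `Hit`, `best_threshold`, `window_thresholds` and `false_discovery_rate`, as well as the window and step sizes, are hypothetical. The input is a set of simulated reads whose origin (target gene or not) is known, each with an alignment position on the reference protein and a bitscore.

```python
from typing import Dict, List, NamedTuple, Optional


class Hit(NamedTuple):
    position: int    # alignment midpoint on the reference protein (residue index)
    bitscore: float  # bitscore of the read-to-reference alignment
    is_target: bool  # known from the simulation: read derives from the target gene


def best_threshold(hits: List[Hit]) -> Optional[float]:
    """Bitscore cutoff maximizing Youden's J (an assumed criterion).

    Returns None when the window lacks target or non-target reads,
    because no ROC curve can be drawn in that case.
    """
    positives = sum(h.is_target for h in hits)
    negatives = len(hits) - positives
    if positives == 0 or negatives == 0:
        return None
    best_j, best_cut = -1.0, None
    # Every observed bitscore is a candidate cutoff (one point on the ROC curve).
    for cut in sorted({h.bitscore for h in hits}):
        tp = sum(h.is_target and h.bitscore >= cut for h in hits)
        fp = sum((not h.is_target) and h.bitscore >= cut for h in hits)
        j = tp / positives - fp / negatives  # sensitivity - false positive rate
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut


def window_thresholds(hits: List[Hit], protein_length: int,
                      window: int = 20, step: int = 10) -> Dict[int, Optional[float]]:
    """Map each window start position to its own bitscore threshold."""
    thresholds = {}
    for start in range(0, max(protein_length - window, 0) + 1, step):
        in_window = [h for h in hits if start <= h.position < start + window]
        thresholds[start] = best_threshold(in_window)
    return thresholds


def false_discovery_rate(hits: List[Hit], cutoff: float) -> float:
    """FDR = false positives / all reads called at a single fixed cutoff."""
    called = [h for h in hits if h.bitscore >= cutoff]
    if not called:
        return 0.0
    return sum(not h.is_target for h in called) / len(called)
```

In a window that covers a domain shared with unrelated proteins, non-target reads reach high bitscores, so the selected threshold rises there, whereas a single fixed cutoff applied along the whole protein would admit those reads as false positives.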

Authors:
Orellana, Luis H. [1]; Rodriguez-R, Luis M. [2]; Konstantinidis, Konstantinos T. [3]
  1. Georgia Inst. of Technology, Atlanta, GA (United States). School of Civil and Environmental Engineering
  2. Georgia Inst. of Technology, Atlanta, GA (United States). Center for Bioinformatics and Computational Genomics. School of Biological Sciences
  3. Georgia Inst. of Technology, Atlanta, GA (United States). School of Civil and Environmental Engineering. Center for Bioinformatics and Computational Genomics. School of Biological Sciences
Publication Date:
Research Org.:
Georgia Inst. of Technology, Atlanta, GA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23); National Science Foundation (NSF)
OSTI Identifier:
1356171
Alternate Identifier(s):
OSTI ID: 1362281
Grant/Contract Number:  
SC0006662; 1241046; 1356288
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Nucleic Acids Research
Additional Journal Information:
Journal Volume: 45; Journal Issue: 3; Journal ID: ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 60 APPLIED LIFE SCIENCES; 97 MATHEMATICS AND COMPUTING; genes; soil; false-positive results; candidate disease gene; datasets

Citation Formats

Orellana, Luis H., Rodriguez-R, Luis M., and Konstantinidis, Konstantinos T. ROCker: accurate detection and quantification of target genes in short-read metagenomic data sets by modeling sliding-window bitscores. United States: N. p., 2016. Web. doi:10.1093/nar/gkw900.
Orellana, Luis H., Rodriguez-R, Luis M., & Konstantinidis, Konstantinos T. ROCker: accurate detection and quantification of target genes in short-read metagenomic data sets by modeling sliding-window bitscores. United States. doi:10.1093/nar/gkw900.
Orellana, Luis H., Rodriguez-R, Luis M., and Konstantinidis, Konstantinos T. Fri . "ROCker: accurate detection and quantification of target genes in short-read metagenomic data sets by modeling sliding-window bitscores". United States. doi:10.1093/nar/gkw900. https://www.osti.gov/servlets/purl/1356171.
@article{osti_1356171,
title = {ROCker: accurate detection and quantification of target genes in short-read metagenomic data sets by modeling sliding-window bitscores},
author = {Orellana, Luis H. and Rodriguez-R, Luis M. and Konstantinidis, Konstantinos T.},
doi = {10.1093/nar/gkw900},
journal = {Nucleic Acids Research},
issn = {0305-1048},
number = 3,
volume = 45,
place = {United States},
year = {2016},
month = {10}
}

Citation Metrics:
Cited by: 1 work
Citation information provided by
Web of Science
