Simrank: Rapid and sensitive general-purpose k-mer search tool

DeSantis, T Z; Keller, K; Karaoz, U; Alekseyenko, A V; Singh, N N.S.; Brodie, E L; Pei, Z; Andersen, G L; Larsen, N

doi:10.1186/1472-6785-11-11

Journal Article · April 1, 2011 · BMC Ecology

Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project (http://nihroadmap.nih.gov/hmp). Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language datasets found Simrank to be 10X to 928X faster, depending on the dataset. Simrank provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity.
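As an illustration of the general idea of k-mer matching described in the abstract, the following minimal Python sketch ranks database strings by the fraction of a query's unique k-mers that they share. It is not Simrank's implementation or interface: the function names (`kmers`, `build_index`, `rank`), the scoring rule, the choice of k = 4 and the toy sequences are all assumptions made for this example.

```python
from collections import defaultdict


def kmers(s, k):
    """Return the set of unique k-length substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def build_index(database, k):
    """Map each k-mer to the set of database IDs that contain it."""
    index = defaultdict(set)
    for seq_id, seq in database.items():
        for km in kmers(seq, k):
            index[km].add(seq_id)
    return index


def rank(query, index, k, top=3):
    """Rank database entries by the fraction of the query's unique k-mers they share.

    The scoring rule is an assumption for this sketch, not Simrank's definition.
    """
    q = kmers(query, k)
    if not q:
        return []  # query shorter than k: nothing to match
    counts = defaultdict(int)
    for km in q:
        for seq_id in index.get(km, ()):
            counts[seq_id] += 1
    scored = [(seq_id, n / len(q)) for seq_id, n in counts.items()]
    # Highest score first; ties broken by ID for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]


# Hypothetical toy database; real inputs would be DNA, RNA, protein or text strings.
database = {
    "seq1": "ACGTACGTGA",
    "seq2": "TTTTGGGGCC",
    "seq3": "ACGTACGTTT",
}
index = build_index(database, k=4)
hits = rank("ACGTACGTGA", index, k=4)
for seq_id, score in hits:
    print(seq_id, round(score, 3))
```

In this toy run the query is identical to `seq1`, so `seq1` ranks first with a score of 1.0, while `seq2` shares no 4-mers with the query and does not appear in the results. Building the k-mer index once and reusing it across queries is what makes this style of search fast on large collections.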

Research Organization: Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Sponsoring Organization: Earth Sciences Division

DOE Contract Number: DE-AC02-05CH11231

OSTI ID: 1016705

Report Number(s): LBNL-4596E; TRN: US201112%%504

Journal Information: BMC Ecology, Vol. 11, Issue 11; Related Information: Journal Publication Date: 2011

Country of Publication: United States

Language: English

Similar Records

A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes

Journal Article · October 25, 2021 · Computational and Structural Biotechnology Journal · OSTI ID:1016705

Garcia, Benjamin J.; Simha, Ramanuja; Garvin, Michael; +7 more

IMG/M 4 version of the integrated metagenome comparative analysis system

Journal Article · October 16, 2013 · Nucleic Acids Research · OSTI ID:1016705

Markowitz, Victor M.; Chen, I-Min A.; Chu, Ken; +15 more

An optimized FM-index library for nucleotide and amino acid search

Journal Article · December 31, 2021 · Algorithms for Molecular Biology · OSTI ID:1016705

Anderson, Tim; Wheeler, Travis J.

Related Subjects

54 ENVIRONMENTAL SCIENCES
58 GEOSCIENCES
ALGORITHMS
CLASSIFICATION
COMPUTER CALCULATIONS
DATA ANALYSIS
DNA
GRAPH THEORY
INFORMATION RETRIEVAL
PERFORMANCE TESTING
PROTEINS
RNA
