OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Simrank: Rapid and sensitive general-purpose k-mer search tool

Journal Article · BMC Ecology

Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project (http://nihroadmap.nih.gov/hmp). Rapid k-mer matching strategies enable similarity searches within and between projects. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as subroutines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language datasets found Simrank to be 10X to 928X faster, depending on the dataset. Simrank provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity.
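To make the idea of ranking database strings by k-mer similarity concrete, the following is a minimal Python sketch. It is not Simrank's implementation, and the abstract does not specify the scoring formula. The function names (`kmers`, `build_index`, `rank`), the toy database and the score (shared unique k-mers divided by the size of the smaller k-mer set) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def kmers(s, k):
    """Return the set of unique length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def build_index(database, k):
    """Build an inverted index (k-mer -> names) and per-entry k-mer counts."""
    index = defaultdict(set)
    sizes = {}
    for name, seq in database.items():
        ks = kmers(seq, k)
        sizes[name] = len(ks)
        for km in ks:
            index[km].add(name)
    return index, sizes


def rank(query, index, sizes, k, top=3):
    """Rank database entries by shared unique k-mers with the query.

    Score (an assumed choice for this sketch): shared k-mers divided by
    the size of the smaller of the two k-mer sets.
    """
    q = kmers(query, k)
    shared = defaultdict(int)
    for km in q:
        for name in index.get(km, ()):
            shared[name] += 1
    scored = [(n / min(len(q), sizes[name]), name) for name, n in shared.items()]
    # Highest score first; ties broken by name for a stable order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(name, score) for score, name in scored[:top]]


# Hypothetical toy database; real inputs would be read from sequence files.
db = {"a": "ACGTACGT", "b": "ACGTTTTT", "c": "GGGGGGGG"}
index, sizes = build_index(db, k=4)
hits = rank("ACGTACGA", index, sizes, k=4)
print(hits)
```

The inverted index means each query only touches database entries that share at least one k-mer with it, which is the general reason k-mer lookups can avoid a full pairwise comparison. Whether Simrank uses this exact structure is not stated in the abstract.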

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
Earth Sciences Division
DOE Contract Number:
DE-AC02-05CH11231
OSTI ID:
1016705
Report Number(s):
LBNL-4596E; TRN: US201112%%504
Journal Information:
BMC Ecology, Vol. 11, Issue 11; Related Information: Journal Publication Date: 2011
Country of Publication:
United States
Language:
English
