DUK - A Fast and Efficient Kmer Based Sequence Matching Tool

Li, Mingkun; Copeland, Alex; Han, James

Conference · Mon Mar 21 04:00:00 EDT 2011

OSTI ID:1016000

A new tool, DUK, has been developed to perform the matching task. Matching determines whether a query sequence partially or totally matches any of a set of given reference sequences. Matching is similar to alignment, and many traditional analysis tasks, such as contaminant removal, use alignment tools. For matching, however, there is no need to know which bases of a query sequence match which positions of a reference sequence; it is only necessary to know whether a match exists. This subtle difference can make the matching task much faster than alignment. DUK is accurate, versatile, and fast, and it uses memory efficiently. It uses a Kmer hashing method to index reference sequences and a Poisson model to calculate a p-value. DUK is carefully implemented in C++ with an object-oriented design, and the resulting classes can also be used to develop other tools quickly. DUK has been widely used at JGI for a wide range of applications, such as contaminant removal, organelle genome separation, and assembly refinement. Many real applications and simulated datasets demonstrate its power.
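The idea of Kmer-hash matching with a Poisson p-value can be illustrated with a minimal Python sketch. This is not DUK's code or interface: the function names, the use of a Python set as the hash index, and in particular the Poisson rate (query Kmer count times the fraction of all 4^k possible Kmers present in the index) are assumptions made for illustration, since the abstract does not specify how the model is parameterized. The sketch also ignores reverse complements and ambiguous bases.

```python
import math


def kmers(seq, k):
    """Yield every length-k substring of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Hash all reference Kmers into a set (hypothetical stand-in for an index)."""
    index = set()
    for ref in references:
        index.update(kmers(ref, k))
    return index


def count_hits(query, index, k):
    """Count query Kmers found in the index; no positions are recorded."""
    return sum(1 for km in kmers(query, k) if km in index)


def poisson_upper_tail(hits, lam):
    """Return P(X >= hits) for X ~ Poisson(lam)."""
    if hits <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(hits))
    return max(0.0, 1.0 - cdf)


def match_p_value(query, index, k):
    """Return (hits, p-value) under the ASSUMED random-hit model described above."""
    n_query_kmers = max(len(query) - k + 1, 0)
    # Assumption: each query Kmer hits the index by chance with
    # probability len(index) / 4**k, independently of the others.
    lam = n_query_kmers * len(index) / 4 ** k
    hits = count_hits(query, index, k)
    return hits, poisson_upper_tail(hits, lam)


if __name__ == "__main__":
    k = 4  # toy value; real data would use a much larger k
    index = build_index(["ACGTACGTGGCCTTAA"], k)
    for query in ("ACGTGGCC", "TTTTTTTT"):
        hits, p = match_p_value(query, index, k)
        print(query, hits, p)
```

Because only set membership is tested, the sketch answers whether a query matches without computing where it matches, which is the distinction from alignment that the abstract draws.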

Research Organization: Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)

Sponsoring Organization: Genomics Division

DOE Contract Number: AC02-05CH11231

OSTI ID: 1016000

Report Number(s): LBNL-4516E-Poster

Country of Publication: United States

Language: English

Similar Records

MerAligner: A Fully Parallel Sequence Aligner

Conference · Fri Jul 17 00:00:00 EDT 2015 · OSTI ID:1524032

Counting Kmers for Biological Sequences at Large Scale

Journal Article · Fri Nov 15 19:00:00 EST 2019 · Interdisciplinary Sciences: Computational Life Sciences · OSTI ID:1756656

An expert system for processing sequence homology data

Technical Report · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:377164

Related Subjects

59 BASIC BIOLOGICAL SCIENCES
ALIGNMENT
CELL CONSTITUENTS
DESIGN
REMOVAL
