U.S. Department of Energy
Office of Scientific and Technical Information

DUK - A Fast and Efficient Kmer Based Sequence Matching Tool

Conference
OSTI ID:1016000
A new tool, DUK, has been developed to perform the matching task. Matching determines whether a query sequence partially or totally matches any of a given set of reference sequences. Matching is similar to alignment, and many traditional analysis tasks, such as contaminant removal, use alignment tools. For matching, however, there is no need to know which bases of a query sequence match which positions of a reference sequence; it is only necessary to know whether a match exists. This subtle difference can make matching much faster than alignment. DUK is accurate, versatile, fast, and memory efficient. It uses a Kmer hashing method to index the reference sequences and a Poisson model to calculate a p-value. DUK is implemented in C++ with an object-oriented design, and the resulting classes can be reused to develop other tools quickly. DUK has been widely used at JGI for a range of applications such as contaminant removal, organelle genome separation, and assembly refinement. Real applications and simulated datasets demonstrate its power.
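The abstract does not describe DUK's internals, so the following is only a minimal sketch of the general idea it names: index the Kmers of the references in a hash set, count how many Kmers of a query are found there, and score that count with a Poisson upper-tail probability. All names here (`KmerIndex`, `add_reference`, `count_hits`, `poisson_upper_tail`) are hypothetical and are not DUK's actual classes. The expected number of chance hits (`lambda`) is supplied by the caller as an assumption, since the record does not say how DUK derives it, and reverse-complement handling is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical reference index: the set of all distinct Kmers in the references.
struct KmerIndex {
    std::size_t k;                           // Kmer length
    std::unordered_set<std::string> kmers;   // hashed Kmers
};

// Insert every length-k substring of `reference` into the index.
void add_reference(KmerIndex& index, const std::string& reference) {
    assert(index.k > 0);
    for (std::size_t i = 0; i + index.k <= reference.size(); ++i) {
        index.kmers.insert(reference.substr(i, index.k));
    }
}

// Count the Kmers of `query` that occur in the index. No positions are
// recorded: matching only asks whether hits exist, not where they align.
std::size_t count_hits(const KmerIndex& index, const std::string& query) {
    assert(index.k > 0);
    std::size_t hits = 0;
    for (std::size_t i = 0; i + index.k <= query.size(); ++i) {
        if (index.kmers.count(query.substr(i, index.k)) > 0) {
            ++hits;
        }
    }
    return hits;
}

// P(X >= hits) for X ~ Poisson(lambda), where lambda is the caller's
// assumed expected number of Kmer hits arising by chance.
double poisson_upper_tail(std::size_t hits, double lambda) {
    double term = std::exp(-lambda);  // P(X = 0)
    double below = 0.0;               // accumulates P(X < hits)
    for (std::size_t i = 0; i < hits; ++i) {
        below += term;
        term *= lambda / static_cast<double>(i + 1);  // P(X = i + 1)
    }
    double tail = 1.0 - below;
    return tail < 0.0 ? 0.0 : tail;   // guard against rounding below zero
}
```

In such a sketch, a query would be reported as matching when `poisson_upper_tail(count_hits(index, query), lambda)` falls below a chosen significance threshold. Storing Kmers as strings keeps the example short; a practical tool would more likely pack bases into integers to save memory.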
Research Organization:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Organization:
Genomics Division
DOE Contract Number:
AC02-05CH11231
OSTI ID:
1016000
Report Number(s):
LBNL-4516E-Poster
Country of Publication:
United States
Language:
English

Similar Records

MerAligner: A Fully Parallel Sequence Aligner
Conference · Fri Jul 17 00:00:00 EDT 2015 · OSTI ID:1524032

Counting Kmers for Biological Sequences at Large Scale
Journal Article · Fri Nov 15 19:00:00 EST 2019 · Interdisciplinary Sciences: Computational Life Sciences · OSTI ID:1756656

An expert system for processing sequence homology data
Technical Report · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:377164