DUK - A Fast and Efficient Kmer Based Sequence Matching Tool
Conference
·
OSTI ID:1016000
DUK is a new tool for sequence matching: determining whether a query sequence partially or totally matches any of a set of given reference sequences. Matching is similar to alignment, and many traditional analysis tasks, such as contaminant removal, use alignment tools. Matching, however, does not need to know which bases of a query sequence match which positions of a reference sequence; it only needs to know whether a match exists. This subtle difference can make matching much faster than alignment. DUK is accurate, versatile, fast, and memory-efficient. It uses a Kmer hashing method to index reference sequences and a Poisson model to calculate p-values. DUK is carefully implemented in C++ with an object-oriented design, and the resulting classes can also be used to develop other tools quickly. DUK has been widely used at JGI for a wide range of applications, such as contaminant removal, organelle genome separation, and assembly refinement. Many real applications and simulated datasets demonstrate its power.
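The abstract names two ingredients, a Kmer hash index over the references and a Poisson model for the p-value, without giving details. The following Python sketch illustrates that general idea only; it is not DUK's C++ implementation. The function names, the threshold `alpha`, and the null model (each query Kmer hits the index by chance with probability `len(index) / 4**k`) are assumptions made for illustration. Reverse complements and ambiguous bases are not handled.

```python
import math


def kmers(seq, k):
    """Yield every overlapping substring of length k (uppercased)."""
    seq = seq.upper()
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Index the reference sequences as a hash set of their Kmers."""
    index = set()
    for ref in references:
        index.update(kmers(ref, k))
    return index


def poisson_tail(hits, expected):
    """P(X >= hits) for X ~ Poisson(expected).

    Computed as 1 - CDF, so extremely small tails round to 0.0.
    """
    if hits <= 0:
        return 1.0
    cdf = sum(math.exp(-expected) * expected ** i / math.factorial(i)
              for i in range(hits))
    return max(0.0, 1.0 - cdf)


def match(query, index, k, alpha=1e-3):
    """Return (hits, p_value, matched) for one query sequence.

    Assumed null model (hypothetical, not taken from the source): a
    random Kmer is found in the index with probability len(index) / 4**k,
    so the expected number of chance hits is n_kmers * that probability.
    """
    n_kmers = max(len(query) - k + 1, 0)
    p_random = min(len(index) / 4 ** k, 1.0)
    hits = sum(1 for km in kmers(query, k) if km in index)
    p_value = poisson_tail(hits, n_kmers * p_random)
    return hits, p_value, hits > 0 and p_value < alpha


if __name__ == "__main__":
    refs = ["ACGTACGTTAGCCGATAGGCTTAACG"]
    index = build_index(refs, k=8)
    print(match(refs[0][5:20], index, k=8))   # query taken from the reference
    print(match("T" * 15, index, k=8))        # query unrelated to the reference
```

Note that `match` reports only the hit count and whether the match is significant, never the positions of the hits. This mirrors the distinction the abstract draws between matching and alignment: a set-membership test per Kmer is enough, so no position bookkeeping is required.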
- Research Organization: Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
- Sponsoring Organization: Genomics Division
- DOE Contract Number: AC02-05CH11231
- OSTI ID: 1016000
- Report Number(s): LBNL-4516E-Poster
- Country of Publication: United States
- Language: English
Similar Records
MerAligner: A Fully Parallel Sequence Aligner
Conference
·
Fri Jul 17 00:00:00 EDT 2015
·
OSTI ID:1524032
Counting Kmers for Biological Sequences at Large Scale
Journal Article
·
Fri Nov 15 19:00:00 EST 2019
· Interdisciplinary Sciences: Computational Life Sciences
·
OSTI ID:1756656
An expert system for processing sequence homology data
Technical Report
·
Fri Dec 30 23:00:00 EST 1994
·
OSTI ID:377164