
Title: Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture

Conference
OSTI ID:1015333

Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor based on field-programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph-based, short-read, de novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (each of the four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to a very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
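
The two-bit nucleotide representation and the de Bruijn graph construction mentioned in the abstract can be illustrated with a minimal software sketch. This is not Convey's or Velvet's implementation. The bit assignment in `CODE` and the function names `pack_kmer`, `unpack_kmer`, and `build_de_bruijn` are assumptions chosen for illustration. The sketch also ignores reverse complements, sequencing errors, and ambiguous bases such as N.

```python
from collections import defaultdict

# Assumed 2-bit code per nucleotide; the abstract does not give a mapping.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer, two bits per nucleotide."""
    value = 0
    for base in kmer:
        # Raises KeyError for any base outside A, C, G, T (for example N).
        value = (value << 2) | CODE[base]
    return value


def unpack_kmer(value: int, k: int) -> str:
    """Recover the k-mer string from its packed two-bit form."""
    return "".join(BASES[(value >> (2 * (k - 1 - i))) & 0b11] for i in range(k))


def build_de_bruijn(reads, k):
    """Map each packed (k-1)-mer node to the set of packed successor nodes."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: its prefix node to its suffix node.
            graph[pack_kmer(kmer[:-1])].add(pack_kmer(kmer[1:]))
    return graph
```

Lookups into such a graph touch essentially random memory locations, which is the access pattern the abstract describes as limiting on cache-based servers.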

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
Genomics Division
DOE Contract Number:
DE-AC02-05CH11231
OSTI ID:
1015333
Report Number(s):
LBNL-4404E-Poster; TRN: US201111%%550
Resource Relation:
Conference: DOE JGI User Meeting
Country of Publication:
United States
Language:
English

Similar Records

deBGR: an efficient and near-exact representation of the weighted de Bruijn graph
Journal Article · Jul 12, 2017 · Bioinformatics · OSTI ID:1015333

Enabling Graph Appliance for Genome Assembly
Conference · Jan 1, 2015 · OSTI ID:1015333

PaKman: A Scalable Algorithm for Generating Genomic Contigs on Distributed Memory Machines
Journal Article · May 1, 2021 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID:1015333