OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture

Abstract

Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph-based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (each of the four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to a very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
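
The two ideas named in the abstract, two-bit nucleotide encoding and de Bruijn graph construction from short reads, can be illustrated with a minimal Python sketch. This is not Convey's or Velvet's implementation. The function names `pack_kmer` and `de_bruijn_edges` and the bit assignment A=0, C=1, G=2, T=3 are assumptions chosen for illustration, and the sketch ignores reverse complements, sequencing errors, and ambiguous bases.

```python
from collections import defaultdict

# Assumed encoding (illustrative only): two bits per nucleotide.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_kmer(kmer):
    """Pack a k-mer into a single integer using two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph keyed by packed (k-1)-mers.

    Each k-mer in a read contributes one edge from its (k-1)-base
    prefix to its (k-1)-base suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[pack_kmer(kmer[:-1])].add(pack_kmer(kmer[1:]))
    return graph


if __name__ == "__main__":
    graph = de_bruijn_edges(["ACGT"], k=3)
    for node, successors in sorted(graph.items()):
        print(node, sorted(successors))
```

Every k-mer triggers a lookup at an effectively random key in `graph`, which is the kind of random access to a large memory space that the abstract describes as a bottleneck on cache-based servers.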

Authors:
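Sczyrba, Alex; Pratap, Abhishek; Canon, Shane; Han, James; Copeland, Alex; Wang, Zhong; Brewer, Tony; Soper, David; D'Jamoos, Mike; Collins, Kirby; Vacek, George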
Publication Date:
March 2011
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
Genomics Division
OSTI Identifier:
1015333
Report Number(s):
LBNL-4404E-Poster
TRN: US201111%%550
DOE Contract Number:  
DE-AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: DOE JGI User Meeting
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; ALGORITHMS; COMPUTER ARCHITECTURE; MEMORY MANAGEMENT; METRICS; NUCLEOTIDES; PERFORMANCE; PROGRAMMING

Citation Formats

Sczyrba, Alex, Pratap, Abhishek, Canon, Shane, Han, James, Copeland, Alex, Wang, Zhong, Brewer, Tony, Soper, David, D'Jamoos, Mike, Collins, Kirby, and Vacek, George. Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture. United States: N. p., 2011. Web.
Sczyrba, Alex, Pratap, Abhishek, Canon, Shane, Han, James, Copeland, Alex, Wang, Zhong, Brewer, Tony, Soper, David, D'Jamoos, Mike, Collins, Kirby, & Vacek, George. Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture. United States.
Sczyrba, Alex, Pratap, Abhishek, Canon, Shane, Han, James, Copeland, Alex, Wang, Zhong, Brewer, Tony, Soper, David, D'Jamoos, Mike, Collins, Kirby, and Vacek, George. 2011. "Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture". United States. https://www.osti.gov/servlets/purl/1015333.
@misc{osti_1015333,
title = {Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture},
author = {Sczyrba, Alex and Pratap, Abhishek and Canon, Shane and Han, James and Copeland, Alex and Wang, Zhong and Brewer, Tony and Soper, David and D'Jamoos, Mike and Collins, Kirby and Vacek, George},
abstractNote = {Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph-based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (each of the four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to a very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.},
url = {https://www.osti.gov/servlets/purl/1015333},
number = {LBNL-4404E-Poster},
note = {Conference: DOE JGI User Meeting},
place = {United States},
year = {2011},
month = {3}
}
