U.S. Department of Energy
Office of Scientific and Technical Information

Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching

Book · OSTI ID:1092670
DNA analysis is an emerging application of high-performance bioinformatics. Modern sequencing machines can produce large input streams of data in a few hours, and these streams must be matched against exponentially growing databases of known fragments. Recognizing these patterns quickly and effectively may allow biologists to extend the scale and reach of their investigations. Aho-Corasick, an exact multiple-pattern matching algorithm, is often at the core of these applications. High-performance systems are a promising platform for accelerating this algorithm, which is computationally intensive but also inherently parallel. Today, high-performance systems also include heterogeneous processing elements, such as Graphics Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variability, depending on the size of the input streams, the number of patterns to search for, and the number of matches, and it poses significant challenges for current high-performance software and hardware implementations. Reaching the desired high throughput requires an adequate mapping of the algorithm onto the target architecture, one that copes with the limits of the underlying hardware. In this paper, we discuss the implementation of the Aho-Corasick algorithm for GPU-accelerated high-performance systems. We present an optimized implementation of Aho-Corasick for GPUs and discuss its tradeoffs on the Tesla T10 and the new Tesla T20 (codename Fermi) GPUs. We then integrate the optimized GPU code into an MPI-based load balancer and a pthreads-based load balancer, respectively, to enable execution of the algorithm on clusters and on large shared-memory multiprocessors (SMPs) accelerated with multiple GPUs.
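To make the algorithm named above concrete, here is a minimal CPU-side sketch in Python of Aho-Corasick multiple-pattern matching, combined with one common way to expose its parallelism: splitting the input into chunks that are scanned independently. This is not the authors' GPU implementation. The function names (`build_automaton`, `search`, `parallel_search`) and the chunk-with-overlap decomposition are illustrative assumptions. Each chunk starts scanning `max_pattern_length - 1` characters early so that matches crossing a chunk boundary are still found, and it reports only the matches that end inside its own range, so no match is counted twice.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def build_automaton(patterns):
    """Build Aho-Corasick goto/fail/output tables for non-empty patterns."""
    goto = [{}]   # goto[state][char] -> next state (trie edges)
    fail = [0]    # failure link of each state (state 0 is the root)
    out = [[]]    # indices of patterns recognized on reaching each state
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    # Breadth-first traversal: shallower states get their links first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that are suffixes of the current prefix.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(automaton, text, scan_from=0, report_from=0, stop=None):
    """Scan text[scan_from:stop]; report (end_index, pattern_index) pairs
    only for matches whose last character lies at index >= report_from."""
    goto, fail, out = automaton
    stop = len(text) if stop is None else stop
    state, hits = 0, []
    for i in range(scan_from, stop):
        ch = text[i]
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if i >= report_from:
            hits.extend((i, idx) for idx in out[state])
    return hits


def parallel_search(patterns, text, n_chunks=4):
    """Split the text into n_chunks ranges and match each one independently."""
    automaton = build_automaton(patterns)
    overlap = max(map(len, patterns)) - 1
    size = -(-len(text) // n_chunks)  # ceiling division

    def work(k):
        lo, hi = k * size, min((k + 1) * size, len(text))
        # Start early by `overlap` characters so boundary-crossing matches
        # are seen, but report only matches ending in [lo, hi).
        return search(automaton, text, max(0, lo - overlap), lo, hi)

    # Threads only illustrate the decomposition; CPython's GIL limits speedup.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(work, range(n_chunks)))
    return sorted(hit for chunk in results for hit in chunk)


if __name__ == "__main__":
    print(parallel_search(["he", "she", "his", "hers"], "ushers", n_chunks=3))
```

Run on the text `"ushers"`, the sketch should report `he` and `she` ending at index 3 and `hers` ending at index 5, regardless of how many chunks are used. The same chunking idea maps naturally onto GPU threads or MPI ranks, but how the paper actually assigns work is not described in this record.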
Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Organization: USDOE
DOE Contract Number: AC05-76RL01830
OSTI ID: 1092670
Report Number(s): PNNL-SA-86496; 400470000
Country of Publication: United States
Language: English

Similar Records

Accelerating DNA analysis applications on GPU clusters
Conference · Sun Jun 13 00:00:00 EDT 2010 · OSTI ID:986273

Experiences with string matching on the Fermi Architecture
Conference · Thu Feb 24 23:00:00 EST 2011 · OSTI ID:1023200

Efficient pattern matching on GPUs for intrusion detection systems
Conference · Mon May 17 00:00:00 EDT 2010 · OSTI ID:986274
