Efficient Aho-Corasick String Matching on Emerging Multicore Architectures

Tumeo, Antonino; Villa, Oreste; Secchi, Simone; Chavarría-Miranda, Daniel

Book · December 12, 2013

OSTI ID:1182352

String matching algorithms are critical to several scientific fields. Besides text processing and databases, emerging applications such as DNA and protein sequence analysis, data mining, information security software, antivirus, and machine learning all exploit string matching algorithms [3]. These applications usually process large quantities of textual data and require high performance and/or predictable execution times. Among all the string matching algorithms, one of the most studied, especially for text processing and security applications, is the Aho-Corasick algorithm.

Aho-Corasick is an exact, multi-pattern string matching algorithm that performs the search in a time linearly proportional to the length of the input text, independent of the pattern set size. However, depending on the implementation, memory occupation may rise drastically as the number of patterns increases. In turn, this can lead to significant variability in performance, due to memory access times and caching effects. This is a significant concern for many mission-critical applications and modern high-performance architectures. For example, security applications such as Network Intrusion Detection Systems (NIDS) must be able to scan network traffic against very large dictionaries in real time. Modern Ethernet links reach up to 10 Gbps, and malicious threats already number well over 1 million and are growing exponentially [28]. When performing the search, a NIDS should not slow down the network or let network packets pass unchecked. Nevertheless, on current state-of-the-art cache-based processors, there may be large performance variability when dealing with big dictionaries and inputs that have different frequencies of matching patterns. In particular, when few patterns are matched and they are all in the cache, the procedure is fast. When they are not in the cache, often because many patterns are matched and the caches are continuously thrashed, they must be retrieved from system memory and the procedure is slowed down by the increased latency.

Efficient implementations of string matching algorithms have been the focus of several works, targeting Field Programmable Gate Arrays [4, 25, 15, 5], highly multi-threaded solutions like the Cray XMT [34], multicore processors [19], or heterogeneous processors like the Cell Broadband Engine [35, 22]. Recently, several researchers have also started to investigate the use of Graphics Processing Units (GPUs) for string matching algorithms in security applications [20, 10, 32, 33]. Most of these approaches focus on reaching high peak performance, or try to optimize memory occupation, rather than looking at performance stability. However, hardware solutions support only small dictionary sizes due to lack of memory and are difficult to customize, while platforms such as the Cell/B.E. are very complex to program.
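The properties the abstract attributes to Aho-Corasick (one pass over the input regardless of how many patterns are loaded, and memory that grows with the dictionary) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `build_automaton`, `search`, and `dense_table_bytes` are hypothetical, and the 256-symbol alphabet and 4-byte table entries used in the memory estimate are assumptions chosen only for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie, failure links, and output lists for a pattern set."""
    goto = [{}]   # goto[state][char] -> next state (sparse transitions)
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = nxt
            state = nxt
        out[state].append(pat)

    # Phase 2: breadth-first traversal to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def search(text, automaton):
    """Scan text once and return (start_index, pattern) for every match."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


def dense_table_bytes(num_states, alphabet_size=256, entry_bytes=4):
    """Size of a dense state-transition table (assumed layout, for illustration)."""
    return num_states * alphabet_size * entry_bytes


automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
print(len(automaton[0]), dense_table_bytes(len(automaton[0])))
# 10 10240
```

The sketch stores transitions sparsely in dictionaries. Under the assumed dense layout, every state costs 1 KiB, so the table size scales directly with the number of trie states, which is one way the memory growth and cache pressure described above can arise.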

OSTI does not have a digital full text copy available.

Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 1182352

Report Number(s): PNNL-SA-84463; 400470000

Resource Relation: Related Information: Multicore Computing: Algorithms, Architectures, and Applications, 143-170

Country of Publication: United States

Language: English

Similar Records

Input-independent, Scalable and Fast String Matching on the Cray XMT

Conference · May 25, 2009 · OSTI ID:1182352

Villa, Oreste; Chavarría-Miranda, Daniel; Maschhoff, Kristyn J

Aho-Corasick String Matching on Shared and Distributed Memory Parallel Architectures

Journal Article · March 1, 2012 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID:1182352

Tumeo, Antonino; Villa, Oreste; Chavarría-Miranda, Daniel

Efficient pattern matching on GPUs for intrusion detection systems

Conference · May 17, 2010 · OSTI ID:1182352

Villa, Oreste; Tumeo, Antonino; Sciuto, Donatella
