U.S. Department of Energy
Office of Scientific and Technical Information

Massively Multi-core Acceleration of a Document-Similarity Classifier to Detect Web Attacks

Journal Article · Journal of Parallel and Distributed Computing, vol. 71, no. 2, February 1, 2011, pp. 225-235
OSTI ID: 1020352
This paper describes our approach to adapting a text document similarity classifier based on the Term Frequency Inverse Document Frequency (TFIDF) metric to two massively multi-core hardware platforms. The TFIDF classifier is used to detect web attacks in HTTP data. In our parallel hardware approaches, we design streaming, real-time classifiers by simplifying the sequential algorithm and manipulating the classifier's model so that decision information can be represented compactly. We present parallel implementations on the Tilera 64-core System on Chip and the Xilinx Virtex 5-LX FPGA. For the Tilera, we employ a reduced state machine to recognize dictionary terms without requiring explicit tokenization, and achieve a throughput of 37 MB/s at slightly reduced accuracy. For the FPGA, we have developed a set of software tools that help automate the conversion of training data to synthesizable hardware and provide a means of trading off accuracy against resource utilization. The Xilinx Virtex 5-LX implementation requires 0.2% of the memory used by the original algorithm. At 166 MB/s (80 times the throughput of the software implementation), the hardware implementation achieves Gigabit network throughput at the same accuracy as the original algorithm.
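As a rough illustration of the sequential technique the abstract starts from, the following is a minimal Python sketch of a TFIDF document-similarity classifier. It is not the paper's implementation: the delimiter-based tokenizer, the smoothed IDF formula, the cosine-similarity scoring, the function names, and the four toy training requests in `TRAINING` are all assumptions made for this example. The paper's hardware versions additionally simplify the algorithm and compact the model, which this sketch does not attempt.

```python
import math
import re
from collections import Counter

# Hypothetical toy training set; real models would be trained on labelled HTTP traffic.
TRAINING = {
    "benign": [
        "GET /index.html HTTP/1.1",
        "GET /images/logo.png HTTP/1.1",
    ],
    "attack": [
        "GET /search?q=' OR 1=1 -- HTTP/1.1",
        "GET /item?id=1 UNION SELECT password FROM users HTTP/1.1",
    ],
}


def tokenize(text):
    """Assumed tokenizer: lowercase, split on anything that is not a-z, 0-9 or '_'."""
    return [t for t in re.split(r"[^a-z0-9_]+", text.lower()) if t]


def normalize(vec):
    """Scale a sparse term-weight vector to unit length (empty if all-zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def train(docs_by_class):
    """Build IDF weights over all documents and one TFIDF vector per class."""
    all_docs = [tokenize(d) for docs in docs_by_class.values() for d in docs]
    n_docs = len(all_docs)
    doc_freq = Counter()
    for tokens in all_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    # Smoothed IDF (an assumption; other IDF variants would also work).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    class_vectors = {}
    for label, docs in docs_by_class.items():
        term_freq = Counter()
        for d in docs:
            term_freq.update(tokenize(d))
        class_vectors[label] = normalize({t: c * idf[t] for t, c in term_freq.items()})
    return idf, class_vectors


def classify(text, idf, class_vectors):
    """Return (best_label, scores) using cosine similarity to each class vector."""
    term_freq = Counter(t for t in tokenize(text) if t in idf)  # ignore unseen terms
    query = normalize({t: c * idf[t] for t, c in term_freq.items()})
    scores = {
        label: sum(w * vec.get(t, 0.0) for t, w in query.items())
        for label, vec in class_vectors.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    idf, vectors = train(TRAINING)
    label, scores = classify(
        "GET /product?id=5 UNION SELECT name FROM users HTTP/1.1", idf, vectors
    )
    print(label, scores)
```

In this sketch every request is tokenized explicitly and scored against full per-class vectors held in memory. The abstract's Tilera and FPGA designs avoid exactly those two costs, by recognizing dictionary terms with a state machine and by representing decision information compactly.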
Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
1020352
Report Number(s):
LLNL-JRNL-422668
Journal Information:
Journal Name: Journal of Parallel and Distributed Computing, vol. 71, no. 2, February 1, 2011, pp. 225-235; Journal Volume: 71; Journal Issue: 2
Country of Publication:
United States
Language:
English

Similar Records

A configurable-hardware document-similarity classifier to detect web attacks.
Conference · April 1, 2010 · OSTI ID: 1002067

Virtex-5QV Self Scrubber
Software · October 20, 2015 · OSTI ID: 1336824

Toward Evaluating High-Level Synthesis Portability and Performance between Intel and Xilinx FPGAs
Conference · April 1, 2021 · OSTI ID: 1817512