OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Language Classification using N-grams Accelerated by FPGA-based Bloom Filters

Conference

N-gram (n-character sequence) counting is a well-established technique for classifying the language of a text document. In this paper, n-gram processing is accelerated using reconfigurable hardware on the XtremeData XD1000 system. Our design exploits parallelism at multiple levels: parallel Bloom filters accessing on-chip RAM, parallel language classifiers, and parallel document processing. In contrast to another hardware implementation (the HAIL algorithm), which uses off-chip SRAM for lookup, our highly scalable implementation uses only on-chip memory blocks. Our end-to-end language classification implementation runs 85x faster than comparable software and 1.45x faster than the competing hardware design.
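To make the abstract's pipeline concrete, here is a minimal software sketch of n-gram language classification backed by one Bloom filter per language. It is not the paper's FPGA design. Several choices are illustrative assumptions: the 4-gram length, the filter size (65,536 bits), the four hash functions derived from a single SHA-256 digest, and scoring each language by its count of Bloom filter hits. The names `BloomFilter`, `ngrams`, `build_profiles`, and `classify` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter; a software stand-in for one on-chip RAM block.
    Sizes are illustrative assumptions, not values from the paper."""

    def __init__(self, m_bits=1 << 16, k=4):
        assert m_bits % 8 == 0 and 1 <= k <= 8  # SHA-256 yields 8 x 4-byte chunks
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        # Assumed hashing scheme: split one SHA-256 digest into k 32-bit values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def ngrams(text, n=4):
    """All overlapping n-character sequences of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_profiles(training, n=4):
    """One Bloom filter per language, loaded with that language's n-grams."""
    profiles = {}
    for lang, sample in training.items():
        bf = BloomFilter()
        for g in ngrams(sample, n):
            bf.add(g)
        profiles[lang] = bf
    return profiles


def classify(text, profiles, n=4):
    """Score each language by how many of the document's n-grams hit its filter.
    The per-language lookups are independent, which is what allows parallel
    classifiers in hardware."""
    grams = ngrams(text, n)
    scores = {lang: sum(g in bf for g in grams) for lang, bf in profiles.items()}
    return max(scores, key=scores.get), scores
```

In this sketch each filter lookup is a fixed set of single-bit reads. That access pattern is why Bloom filters suit small on-chip memory blocks better than a full n-gram table in off-chip SRAM would. The cost is occasional false positives, which only add noise to the scores.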

Research Organization:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
925681
Report Number(s):
UCRL-CONF-234633; TRN: US200810%%83
Resource Relation:
Conference: Presented at: First International Workshop on High-Performance Reconfigurable Computing Technology and Applications, Reno, NV, United States, Nov 11, 2007
Country of Publication:
United States
Language:
English

Similar Records

Storage-Intensive Supercomputing Benchmark Study
Technical Report · Oct 30, 2007

Cache Energy Optimization Techniques For Modern Processors
Book · Jan 1, 2013

N-gram-based text categorization
Technical Report · Dec 31, 1994