Language Classification using N-grams Accelerated by FPGA-based Bloom Filters
Counting n-grams (n-character sequences in text documents) is a well-established technique for classifying the language of a document's text. In this paper, n-gram processing is accelerated with reconfigurable hardware on the XtremeData XD1000 system. Our design employs parallelism at multiple levels: parallel Bloom filters that access on-chip RAM, parallel language classifiers, and parallel document processing. In contrast to another hardware implementation (the HAIL algorithm), which uses off-chip SRAM for lookup, our highly scalable implementation uses only on-chip memory blocks. Our end-to-end language classification runs at 85x the speed of comparable software and 1.45x the speed of the competing hardware design.
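The approach summarized in the abstract can be illustrated in software with a minimal sketch: one Bloom filter per language holds that language's training n-grams, and a document is assigned to the language whose filter matches the most of its n-grams. This is not the paper's hardware design. The names `ngrams`, `BloomFilter`, `build_filter`, and `classify` are hypothetical, and the n-gram length (4), filter size, number of hash functions, and the use of BLAKE2 hashing are all assumptions chosen for illustration; the record above does not state the parameters actually used.

```python
import hashlib
from collections import Counter


def ngrams(text, n=4):
    """Return every n-character window of the text (n=4 is an assumption)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class BloomFilter:
    """Bit-array membership test: no false negatives, rare false positives."""

    def __init__(self, num_bits=16384, num_hashes=4):
        # Sizes are illustrative; num_bits must be a multiple of 8 here.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive independent bit positions by salting one hash function.
        # (Assumption: a hardware design would likely use cheaper hashes.)
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def build_filter(training_text, n=4, top_k=5000):
    """Insert the top_k most frequent n-grams of a language sample."""
    bloom = BloomFilter()
    for gram, _count in Counter(ngrams(training_text, n)).most_common(top_k):
        bloom.add(gram)
    return bloom


def classify(document, filters, n=4):
    """Return the language whose filter matches the most document n-grams."""
    grams = ngrams(document, n)
    hits = {
        language: sum(1 for gram in grams if gram in bloom)
        for language, bloom in filters.items()
    }
    return max(hits, key=hits.get)
```

In this sketch each language's filter is queried independently, which mirrors the idea of parallel classifiers: the per-language match counts do not depend on one another and could be computed concurrently.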
- Research Organization:
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- W-7405-ENG-48
- OSTI ID:
- 925681
- Report Number(s):
- UCRL-CONF-234633; TRN: US200810%%83
- Resource Relation:
- Conference: Presented at: First International Workshop on High-Performance Reconfigurable Computing Technology and Applications, Reno, NV, United States, Nov 11, 2007
- Country of Publication:
- United States
- Language:
- English
Similar Records
Cache Energy Optimization Techniques For Modern Processors
N-gram-based text categorization