Language Classification using N-grams Accelerated by FPGA-based Bloom Filters

Jacob, A; Gokhale, M

doi:10.1145/1328554.1328564

Conference · September 13, 2007

DOI: https://doi.org/10.1145/1328554.1328564 · OSTI ID: 925681

N-gram counting, where an n-gram is an n-character sequence in a text document, is a well-established technique for classifying the language of a document. In this paper, n-gram processing is accelerated using reconfigurable hardware on the XtremeData XD1000 system. Our design employs parallelism at multiple levels: parallel Bloom filters accessing on-chip RAM, parallel language classifiers, and parallel document processing. In contrast to another hardware implementation (the HAIL algorithm), which uses off-chip SRAM for lookup, our highly scalable implementation uses only on-chip memory blocks. Our end-to-end language classification runs at 85x the speed of comparable software and 1.45x the speed of the competing hardware design.
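As a rough software illustration of the idea in the abstract, the sketch below builds one Bloom filter per language from the n-grams of a sample text, then classifies a document by counting how many of its n-grams hit each filter. It is not the authors' FPGA design: the names `BloomFilter`, `ngrams`, `train`, and `classify`, the choice of n = 4, the filter size, the number of hash functions, and the BLAKE2b-based hashing are all assumptions made for this example, and the training samples are toy strings rather than real language profiles.

```python
import hashlib


class BloomFilter:
    """Bit-array set membership with possible false positives, no false negatives."""

    def __init__(self, num_bits=1 << 14, num_hashes=4):
        # Sizes are illustrative assumptions, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted BLAKE2b digests.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=bytes([i])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def ngrams(text, n=4):
    """Return every n-character window of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(samples, n=4):
    """Build one Bloom filter per language from a {language: sample text} dict."""
    filters = {}
    for lang, sample in samples.items():
        bloom = BloomFilter()
        for gram in ngrams(sample, n):
            bloom.add(gram)
        filters[lang] = bloom
    return filters


def classify(text, filters, n=4):
    """Count n-gram hits per language and return (best language, all counts)."""
    grams = ngrams(text, n)
    counts = {
        lang: sum(gram in bloom for gram in grams)
        for lang, bloom in filters.items()
    }
    return max(counts, key=counts.get), counts


# Toy training data (hypothetical; real profiles would come from large corpora).
samples = {
    "en": "the quick brown fox jumps over the lazy dog",
    "es": "el veloz zorro marron salta sobre el perro perezoso",
}
filters = train(samples)
best, counts = classify("quick brown fox", filters)
print(best, counts)
```

Each language's filter is queried independently for every n-gram, which is the property that lets a hardware design evaluate many filters and many languages in parallel; this sequential sketch only mirrors the counting logic.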

Research Organization: Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: W-7405-ENG-48

OSTI ID: 925681

Report Number(s): UCRL-CONF-234633; TRN: US200810%%83

Resource Relation: Conference: Presented at: First International Workshop on High-Performance Reconfigurable Computing Technology and Applications, Reno, NV, United States, Nov 11, 2007

Country of Publication: United States

Language: English

Similar Records

Storage-Intensive Supercomputing Benchmark Study

Technical Report · October 30, 2007 · OSTI ID:925681

Cohen, J; Dossa, D; Gokhale, M; +4 more

Cache Energy Optimization Techniques For Modern Processors

Book · January 1, 2013 · OSTI ID:925681

Mittal, Sparsh

N-gram-based text categorization

Technical Report · December 31, 1994 · OSTI ID:925681

Cavnar, W B; Trenkle, J M

Related Subjects

99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
CLASSIFICATION
DESIGN
IMPLEMENTATION
PROCESSING
