skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: FLOCKING-BASED DOCUMENT CLUSTERING ON THE GRAPHICS PROCESSING UNIT [Book Chapter]

Abstract

Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the fl ocking behavior of birds. Each bird represents a single document and fl ies toward other documents that are similar to it. One limitation of this method of document clustering is its complexity O(n2). As the number of documents grows, it becomes increasingly diffi cult to receive results in a reasonable amount of time. However, fl ocking behavior, along with most naturally inspired algorithms such as ant colony optimization and particle swarm optimization, are highly parallel and have experienced improved performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal when compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefi t the GPU brings to all naturally inspired algorithms.more » Using the CUDA platform from NVIDIA®, we developed a document fl ocking implementation to be run on the NVIDIA® GEFORCE 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to be run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3,000 documents. The results of these tests were very signifi cant. Performance gains ranged from three to nearly fi ve times improvement of the GPU over the CPU implementation. This dramatic improvement in runtime makes the GPU a potentially revolutionary platform for document clustering algorithms.« less

Authors:
; ; ;
Publication Date:
Research Org.:
DOESC (USDOE Office of Science (SC) (United States))
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1052106
Resource Type:
Book
Resource Relation:
Related Information: Book Chapter, Nature Inspired Cooperative Strategies for Optimization (NICSO 2007), Volume 129 of the series Studies in Computational Intelligence pp 27-37
Country of Publication:
United States
Language:
English

Citation Formats

Charles, J S, Patton, R M, Potok, T E, and Cui, X. FLOCKING-BASED DOCUMENT CLUSTERING ON THE GRAPHICS PROCESSING UNIT [Book Chapter]. United States: N. p., 2008. Web. doi:10.1007/978-3-540-78987-1_3.
Charles, J S, Patton, R M, Potok, T E, & Cui, X. FLOCKING-BASED DOCUMENT CLUSTERING ON THE GRAPHICS PROCESSING UNIT [Book Chapter]. United States. https://doi.org/10.1007/978-3-540-78987-1_3
Charles, J S, Patton, R M, Potok, T E, and Cui, X. 2008. "FLOCKING-BASED DOCUMENT CLUSTERING ON THE GRAPHICS PROCESSING UNIT [Book Chapter]". United States. https://doi.org/10.1007/978-3-540-78987-1_3. https://www.osti.gov/servlets/purl/1052106.
@article{osti_1052106,
title = {FLOCKING-BASED DOCUMENT CLUSTERING ON THE GRAPHICS PROCESSING UNIT [Book Chapter]},
author = {Charles, J S and Patton, R M and Potok, T E and Cui, X},
abstractNote = {Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the fl ocking behavior of birds. Each bird represents a single document and fl ies toward other documents that are similar to it. One limitation of this method of document clustering is its complexity O(n2). As the number of documents grows, it becomes increasingly diffi cult to receive results in a reasonable amount of time. However, fl ocking behavior, along with most naturally inspired algorithms such as ant colony optimization and particle swarm optimization, are highly parallel and have experienced improved performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal when compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefi t the GPU brings to all naturally inspired algorithms. Using the CUDA platform from NVIDIA®, we developed a document fl ocking implementation to be run on the NVIDIA® GEFORCE 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to be run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3,000 documents. The results of these tests were very signifi cant. Performance gains ranged from three to nearly fi ve times improvement of the GPU over the CPU implementation. This dramatic improvement in runtime makes the GPU a potentially revolutionary platform for document clustering algorithms.},
doi = {10.1007/978-3-540-78987-1_3},
url = {https://www.osti.gov/biblio/1052106}, journal = {},
number = ,
volume = ,
place = {United States},
year = {Tue Jan 01 00:00:00 EST 2008},
month = {Tue Jan 01 00:00:00 EST 2008}
}

Book:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this book.

Save / Share: