OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Characterizing Large Text Corpora Using a Maximum Variation Sampling Genetic Algorithm

Abstract

An enormous amount of information is available via the Internet, much of it in the form of text-based documents. These documents cover a variety of topics that are vitally important to the scientific, business, and defense/security communities. Many techniques currently exist for processing and analyzing such data; however, quickly characterizing a large set of documents remains challenging. Previous work successfully demonstrated the use of a genetic algorithm to produce a representative subset of text documents via adaptive sampling. In this work, we expand and explore this approach on much larger data sets using a parallel Genetic Algorithm (GA) with adaptive parameter control. Experimental results are presented and discussed.
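The abstract does not give the representation, fitness function, or operators, so the following is only a minimal sketch of the general idea of GA-based maximum variation sampling, not the author's method. It assumes that documents are already term-frequency vectors and that "maximum variation" means maximizing the summed pairwise cosine distance of the selected subset. All names and parameters (`max_variation_sample`, `pop_size`, `generations`, `mutation_rate`) are hypothetical, and unlike the paper's approach the sketch is serial and uses fixed rather than adaptive parameters.

```python
import math
import random
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two term-frequency vectors (lists of numbers)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0  # treat empty documents as maximally dissimilar
    return 1.0 - dot / (nu * nv)


def variation(subset, docs):
    """Assumed fitness: total pairwise cosine distance within the subset (higher = more varied)."""
    return sum(cosine_distance(docs[i], docs[j]) for i, j in combinations(subset, 2))


def max_variation_sample(docs, k, pop_size=20, generations=50, mutation_rate=0.2, seed=0):
    """Hypothetical serial GA searching for k documents with maximum pairwise variation."""
    rng = random.Random(seed)
    n = len(docs)
    # Each individual is a sorted tuple of k distinct document indices.
    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]

    def fitness(ind):
        return variation(ind, docs)

    def tournament():
        # Binary tournament selection: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(population, key=fitness)]  # elitism: best subset always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Crossover: draw k distinct indices from the union of both parents.
            pool = sorted(set(p1) | set(p2))
            child = set(rng.sample(pool, k))
            # Mutation: swap one member for a document not currently in the subset.
            if rng.random() < mutation_rate:
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([i for i in range(n) if i not in child]))
            children.append(tuple(sorted(child)))
        population = children
    return max(population, key=fitness)


# Usage: six toy "documents" forming three topics (indices 0-1, 2-3, 4-5).
docs = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [0, 3, 0], [0, 0, 1], [0, 0, 2]]
sample = max_variation_sample(docs, 3)
print(sample, variation(sample, docs))
```

Because elitism keeps the best subset from each generation, the returned sample's variation never falls below the best found in the initial random population; on the toy corpus the most varied sample takes one document from each topic.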

Authors:
 Patton, Robert M [1]
  [1] ORNL
Publication Date:
2006
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
Work for Others (WFO)
OSTI Identifier:
931452
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: Genetic and Evolutionary Computation Conference, Seattle, WA, USA, July 8-12, 2006
Country of Publication:
United States
Language:
English

Citation Formats

Patton, Robert M. Characterizing Large Text Corpora Using a Maximum Variation Sampling Genetic Algorithm. United States: N. p., 2006. Web. doi:10.1145/1143997.1144308.
Patton, Robert M. Characterizing Large Text Corpora Using a Maximum Variation Sampling Genetic Algorithm. United States. doi:10.1145/1143997.1144308.
Patton, Robert M. 2006. "Characterizing Large Text Corpora Using a Maximum Variation Sampling Genetic Algorithm". United States. doi:10.1145/1143997.1144308.
@inproceedings{osti_931452,
title = {Characterizing Large Text Corpora Using a Maximum Variation Sampling Genetic Algorithm},
author = {Patton, Robert M},
abstractNote = {An enormous amount of information is available via the Internet, much of it in the form of text-based documents. These documents cover a variety of topics that are vitally important to the scientific, business, and defense/security communities. Many techniques currently exist for processing and analyzing such data; however, quickly characterizing a large set of documents remains challenging. Previous work successfully demonstrated the use of a genetic algorithm to produce a representative subset of text documents via adaptive sampling. In this work, we expand and explore this approach on much larger data sets using a parallel Genetic Algorithm (GA) with adaptive parameter control. Experimental results are presented and discussed.},
doi = {10.1145/1143997.1144308},
booktitle = {Genetic and Evolutionary Computation Conference},
address = {Seattle, WA, USA},
place = {United States},
year = {2006}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
