U.S. Department of Energy
Office of Scientific and Technical Information

Multi-sensor text classification experiments -- a comparison

Technical Report · DOI: https://doi.org/10.2172/638201 · OSTI ID: 638201
 [1]; ;  [2]
  1. Sacred Heart Univ., Fairfield, CT (United States). Dept. of Computer Science and Information Technology
  2. Oak Ridge National Lab., TN (United States)

In this paper, the authors report recent results on the automatic classification of free text documents into a given number of categories. The method uses multiple sensors to derive informative clues about patterns of interest in the input text, and fuses this information using a neural network. Encouraging preliminary results were obtained by applying this approach to a set of free text documents from the Associated Press (AP) news wire. New free text documents have since been made available by the Reuters news agency. The advantages of this collection over the AP data are that the Reuters stories were already manually classified and that each category contains a sufficiently high number of stories. The results indicate the usefulness of the new method: after the network is fully trained, if data belonging to only one category are used for testing, correctness is about 90%, an improvement of nearly 15% over the best results for the AP data. Based on the performance of the method with the AP and Reuters collections, the authors now have conclusive evidence that the approach is viable and practical. More work remains to be done on handling data that belong to multiple categories.
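
The abstract does not describe the sensors or the network in detail, so the following is only a minimal sketch of the general idea of fusing several sensor readings with a neural network. Everything in it is an assumption made for illustration: the keyword-based sensors, the two example categories, the toy training texts, and the choice of a single-layer softmax network trained by stochastic gradient descent. None of these names or values come from the report.

```python
import math
import random

# Hypothetical "sensors": each one watches for a small keyword set.
# These sets are illustrative assumptions, not taken from the report.
SENSORS = [
    {"bank", "stock", "market", "profit"},  # finance-flavoured clue
    {"team", "game", "score", "coach"},     # sports-flavoured clue
]
CATEGORIES = ["finance", "sports"]          # hypothetical category names


def sensor_readings(text):
    """Return one reading per sensor: the fraction of words it recognises."""
    words = text.lower().split()
    return [sum(w in kw for w in words) / max(len(words), 1) for kw in SENSORS]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def scores(weights, x):
    """Linear score per category (no bias term, for brevity)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_fusion(samples, n_classes, epochs=200, lr=1.0, seed=0):
    """Fit a single-layer softmax network on sensor readings by SGD."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in samples:
            probs = softmax(scores(weights, x))
            for k in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the score of class k.
                grad = probs[k] - (1.0 if k == label else 0.0)
                for i in range(n_in):
                    weights[k][i] -= lr * grad * x[i]
    return weights


def classify(text, weights):
    """Fuse the sensor readings and return the index of the best category."""
    s = scores(weights, sensor_readings(text))
    return max(range(len(s)), key=s.__getitem__)


# Toy, single-category training texts (made up for this sketch).
TRAINING_DOCS = [
    ("the bank reported profit as the stock market rose", 0),
    ("the team won the game after the coach changed tactics", 1),
    ("stock market falls as bank shares drop", 0),
    ("coach praises team for high score", 1),
]
samples = [(sensor_readings(text), label) for text, label in TRAINING_DOCS]
weights = train_fusion(samples, n_classes=len(CATEGORIES))
```

With the trained weights, `CATEGORIES[classify("bank profit lifts the stock market", weights)]` selects the category whose fused score is highest. Like the single-category test condition mentioned in the abstract, this sketch assigns exactly one category per document and does not address documents that belong to several categories.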

Research Organization: Oak Ridge National Lab., TN (United States)
Sponsoring Organization: USDOE, Washington, DC (United States)
DOE Contract Number: AC05-96OR22464
OSTI ID: 638201
Report Number(s): ORNL/TM--13354; ON: DE98003637
Country of Publication: United States
Language: English

Similar Records

Information fusion for automatic text classification
Conference · Thu Aug 01 00:00:00 EDT 1996 · OSTI ID:378178

Neural net learning issues in classification of free text documents
Conference · Thu Feb 29 23:00:00 EST 1996 · OSTI ID:212422

Toward a multi-sensor neural net approach to automatic text classification
Conference · Thu Jan 25 23:00:00 EST 1996 · OSTI ID:266901