Multi-sensor text classification experiments -- a comparison
- Sacred Heart Univ., Fairfield, CT (United States). Dept. of Computer Science and Information Technology
- Oak Ridge National Lab., TN (United States)
In this paper, the authors report recent results on the automatic classification of free text documents into a given number of categories. The method uses multiple sensors to derive informative clues about patterns of interest in the input text, and fuses this information using a neural network. Encouraging preliminary results were obtained by applying this approach to a set of free text documents from the Associated Press (AP) news wire. New free text documents have since been made available by the Reuters news agency. The advantages of this collection over the AP data are that the Reuters stories were already manually classified and that they include a sufficiently high number of stories per category. The results indicate the usefulness of the new method: after the network is fully trained, if data belonging to only one category are used for testing, about 90% of documents are classified correctly, an improvement of nearly 15% over the best results for the AP data. Based on the performance of the method with the AP and Reuters collections, the authors now have conclusive evidence that the approach is viable and practical. More work remains to be done on handling data belonging to multiple categories.
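The sense-then-fuse idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the report: the keyword-based sensors, the two category names, the cue words, and the single-layer softmax network standing in for the fusion network. The report's actual sensors and network architecture are not described in this record.

```python
import math
import random

# Hypothetical keyword "sensors": each counts cue words for one pattern of interest.
SENSORS = {
    "finance": {"bank", "shares", "profit", "market"},
    "sports": {"team", "match", "score", "coach"},
}
CATEGORIES = ["finance", "sports"]  # illustrative labels, not from the report


def sense(text):
    """Return one reading per sensor: the fraction of words that are cue words."""
    words = text.lower().split()
    n = max(len(words), 1)
    return [sum(w in cues for w in words) / n for cues in SENSORS.values()]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(x, W, b):
    """Fuse the sensor readings into category probabilities."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bk
                    for row, bk in zip(W, b)])


def train(samples, epochs=500, lr=1.0, seed=0):
    """Train a single-layer softmax network with per-sample gradient descent."""
    rng = random.Random(seed)
    n_in, n_out = len(SENSORS), len(CATEGORIES)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    for _ in range(epochs):
        for text, label in samples:
            x = sense(text)
            p = forward(x, W, b)
            for k in range(n_out):
                # Cross-entropy gradient with respect to logit k.
                g = p[k] - (1.0 if CATEGORIES[k] == label else 0.0)
                for i in range(n_in):
                    W[k][i] -= lr * g * x[i]
                b[k] -= lr * g
    return W, b


def classify(text, W, b):
    """Return the most probable category for a document."""
    p = forward(sense(text), W, b)
    return CATEGORIES[p.index(max(p))]
```

To try the sketch, train it on a handful of labeled single-category strings with `train(samples)` and pass new text to `classify`. Documents belonging to several categories at once, which the abstract identifies as open work, are not handled here.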
- Research Organization: Oak Ridge National Lab., TN (United States)
- Sponsoring Organization: USDOE, Washington, DC (United States)
- DOE Contract Number: AC05-96OR22464
- OSTI ID: 638201
- Report Number(s): ORNL/TM--13354; ON: DE98003637
- Country of Publication: United States
- Language: English