Automatic document classification based on expert human decisions. [Subject categorization based on occurrence of manually assigned descriptors]
Conference
·
OSTI ID:7289976
Machine formation and augmentation of technical information libraries is a provocative and immediately useful application of artificial intelligence. Automatic entry, organization, and retrieval of bibliographic references, and the translation en masse of entire collections of such references from one classification system to another, pose a well-defined problem calling for associative processing of contextually linked quanta of information. Measured human expertise in these tasks provides both a yardstick for machine performance and an initial core upon which a dynamic knowledge base can be built for the machine, using incremental learning techniques. In order to identify parameters affecting this type of classification task, several variations of conditional probabilistic and discriminant-based automatic document classification programs were constructed, and their performance was compared against manual human document entry over several technical information data bases in the energy field. With statistical correspondence between machine and manually generated classifications over fairly extensive document sets as the prime performance yardstick, in a 35 category selection environment, systems emulated human classification decisions with 60 to 94% accuracy; in binary decisions, a simulation accuracy of up to 97% was achieved. Through analysis of the algorithms that performed at various levels within this range, and, in particular, through analysis of common characteristics of the document cases in which correspondence failed, it has been possible to isolate several key parameters that regulate performance in systems of this sort and to gain some understanding of the mechanisms used by human experts in performing these types of classification tasks.
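The abstract does not give the programs themselves. As a rough illustration only, the following sketch shows one way a conditional probabilistic classifier could assign a subject category from manually assigned descriptors and then be scored by its agreement with human decisions. It is a naive Bayes style scorer with add-one smoothing, which is an assumed technique and not necessarily one of the variants the authors built. The function names (`train`, `classify`, `agreement`), the descriptors, and the categories are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(records):
    """Count categories and descriptor occurrences from human-labeled records.

    records: iterable of (descriptors, category) pairs.
    """
    category_counts = Counter()
    descriptor_counts = defaultdict(Counter)
    vocabulary = set()
    for descriptors, category in records:
        category_counts[category] += 1
        for d in set(descriptors):
            descriptor_counts[category][d] += 1
            vocabulary.add(d)
    return category_counts, descriptor_counts, vocabulary

def classify(descriptors, model):
    """Return the category with the highest log score for the given descriptors."""
    category_counts, descriptor_counts, vocabulary = model
    total = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n in category_counts.items():
        score = math.log(n / total)  # log prior of the category
        for d in set(descriptors):
            if d not in vocabulary:
                continue  # descriptors never seen in training carry no evidence
            # Add-one smoothed estimate of P(descriptor present | category).
            p = (descriptor_counts[category][d] + 1) / (n + 2)
            score += math.log(p)
        if score > best_score:
            best_category, best_score = category, score
    return best_category

def agreement(test_records, model):
    """Fraction of records where the machine category matches the human one."""
    hits = sum(classify(d, model) == c for d, c in test_records)
    return hits / len(test_records)

# Hypothetical toy data; real descriptor vocabularies are far larger.
training = [
    (["solar collectors", "heat transfer"], "solar energy"),
    (["solar cells", "silicon"], "solar energy"),
    (["reactor cores", "neutron flux"], "nuclear reactors"),
    (["fuel rods", "neutron flux"], "nuclear reactors"),
]
model = train(training)
print(classify(["neutron flux", "silicon"], model))
print(agreement(training, model))
```

The agreement score here plays the role of the statistical correspondence yardstick described above. In practice it would be computed on documents held out from training, not on the training set itself.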
- Research Organization:
- California Univ., Berkeley (USA). Lawrence Berkeley Lab.
- DOE Contract Number:
- W-7405-ENG-48
- OSTI ID:
- 7289976
- Report Number(s):
- LBL-6164; CONF-770815-1
- Country of Publication:
- United States
- Language:
- English
Similar Records
Automatic database mapping and translation methods. [Automatic category assignment based on descriptors]
Conference
·
Fri Mar 31 23:00:00 EST 1978
·
OSTI ID:6747933
DOE Energy Data Base: input magnetic tape description
Technical Report
·
Tue Jun 01 00:00:00 EDT 1982
·
OSTI ID:6869447
The Energy data base: Subject coverage, Literature Coverage, Data Elements, and Indexing Practices
Technical Report
·
Mon Nov 30 23:00:00 EST 1981
·
OSTI ID:6690808