Final Report: Weighted Neighbor Data Mining

Carlson, J J; Muguira, M R; Jordan, J B; Flachs, G M; Peterson, A K

doi:10.2172/773910

Technical Report · December 1, 2000

DOI: https://doi.org/10.2172/773910 · OSTI ID: 773910

Data mining involves the discovery and fusion of features from large databases to establish minimal probability of error (MPE) decision and estimation models. Our approach combines a weighted nearest neighbor (WNN) decision model for classification and estimation with genetic algorithms (GA) for feature discovery and model optimization. The WNN model provides a mathematical framework for adaptively discovering and fusing features into near-MPE decision algorithms. The GA discovers weighted features and selects decision points for the WNN decision model to achieve near-MPE decisions. The performance of the WNN fusion model is demonstrated on the first of two very different problems, chosen to show that the model is robust and practical across a wide variety of data-mining problems. The first problem involves isolating risk factors for hepatitis C virus (HCV) infection and requires the evaluation of large databases to establish the critical features that can detect, with minimal error, whether a person is at risk of having HCV. This requires discovering and extracting relevant information (features) from a questionnaire database and combining (fusing) them to achieve a minimal error decision rule. The primary objective of the research is to develop a practical basis for fusing information from questionnaires administered at hospitals to identify and verify features important for isolating risk factors for HCV. The basic problem involves creating a feature database from the questionnaire information and discovering features that provide sufficient information to reliably identify when a person is at risk, despite uncertainties caused by recording errors and evasive answers from respondents. The results of this study demonstrate the WNN fusion algorithm's ability to perform in supervised learning environments.

The second phase of the research project is directed at the unsupervised learning environment. In this environment the feature data are presented without any classification. Clustering algorithms are developed to partition the feature data into clusters based on similarity measure models. After the feature data are clustered and the clusters labeled, the supervised WNN fusion algorithms are used to classify the data based on the minimal probability of error decision rule.
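
The abstract does not give the WNN distance, the GA operators, or the fitness function, so the following Python sketch is only an illustration of the general idea under stated assumptions: a nearest neighbor rule with a feature-weighted squared Euclidean distance, and a small genetic algorithm that searches for feature weights by minimizing leave-one-out error. The function names, the toy data, and all parameter values are hypothetical and are not taken from the report. The sketch tunes weights only; it does not select decision points.

```python
import random

def wnn_predict(x, points, labels, weights):
    """Label of the stored point nearest to x under a feature-weighted
    squared Euclidean distance (assumed form, not from the report)."""
    best_label, best_dist = None, float("inf")
    for p, lab in zip(points, labels):
        d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, p))
        if d < best_dist:
            best_label, best_dist = lab, d
    return best_label

def loo_error(weights, X, y):
    """Leave-one-out error rate of the weighted nearest neighbor rule."""
    errors = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if wnn_predict(X[i], rest_X, rest_y, weights) != y[i]:
            errors += 1
    return errors / len(X)

def ga_search(X, y, pop_size=20, generations=30, mutation=0.2, seed=0):
    """Toy GA over feature weights in [0, 1]; requires pop_size >= 4.
    Fitness is the leave-one-out error (lower is better)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: loo_error(w, X, y))
        parents = pop[:pop_size // 2]          # keep the better half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [
                min(1.0, max(0.0, g + rng.gauss(0, 0.1)))     # clipped Gaussian mutation
                if rng.random() < mutation else g
                for g in child
            ]
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda w: loo_error(w, X, y))

# Made-up toy data: feature 0 separates the classes, feature 1 is misleading.
X = [[0.0, 0.0], [0.1, 1.0], [1.0, 0.1], [0.9, 0.9]]
y = [0, 0, 1, 1]
best_weights = ga_search(X, y)
print(best_weights, loo_error(best_weights, X, y))
```

On data like this, a weight vector that emphasizes the informative feature gives a lower leave-one-out error than one that emphasizes the misleading feature, which is the effect the weight search is meant to exploit.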

Research Organization: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States)

Sponsoring Organization: US Department of Energy (US)

DOE Contract Number: AC04-94AL85000

OSTI ID: 773910

Report Number(s): SAND2000-3122; TRN: AH200107%%53

Resource Relation: Other Information: PBD: 1 Dec 2000

Country of Publication: United States

Language: English

Similar Records

Sensor-fusion-based biometric identity verification

Technical Report · February 1, 1998 · OSTI ID: 773910

Carlson, J J; Bouchard, A M; Osbourn, G C; +6 more

Sensor feature fusion for detecting buried objects

Conference · April 1, 1993 · OSTI ID: 773910

Clark, G A; Sengupta, S K; Sherwood, R J; +6 more

Imputation of missing data using machine learning techniques

Conference · December 31, 1996 · OSTI ID: 773910

Lakshminarayan, Kamakshi; Harp, S A; Goldman, R; +1 more

Related Subjects

59 BASIC BIOLOGICAL SCIENCES
ALGORITHMS
CLASSIFICATION
EVALUATION
GENETICS
HEPATITIS
HOSPITALS
LEARNING
OPTIMIZATION
PERFORMANCE
PROBABILITY
