
SciTech Connect

Title: Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database

This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), the method explores a dataset and selects a small set of features relevant for predicting a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data from the National Health and Nutrition Examination Survey (NHANES) database to find factors useful for predicting diabetes. Results show that the algorithm narrows the set of predictors down to about eight factors, whose relevance can be validated against reputable medical and public health resources.
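The abstract does not give the GA operators or the exact conditional-probability objective, so the following is only a minimal sketch of the general idea under loudly stated assumptions. It assumes all features have already been discretized into categorical values. The "tree" is a simple multiway tree that splits on every selected feature (equivalent to grouping rows by their combination of values). The fitness is a hypothetical objective, F(S) = sum over leaves of P(leaf) * max_c P(y = c | leaf) - lambda * |S|, i.e. majority-class accuracy minus a per-feature penalty. Tournament selection, uniform crossover, bit-flip mutation and elitism are generic GA choices, not necessarily the paper's. All function names and parameter values (`fitness`, `genetic_feature_selection`, `penalty`, `p_mut`, and so on) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def leaf_stats(rows, labels, subset):
    """Group rows by their values on the selected features.

    Each distinct value combination is one leaf of a multiway tree that
    splits on every feature in `subset` (features assumed pre-discretized).
    """
    leaves = defaultdict(Counter)
    for row, y in zip(rows, labels):
        key = tuple(row[j] for j in subset)
        leaves[key][y] += 1
    return leaves


def fitness(mask, rows, labels, penalty=0.01):
    """Hypothetical objective: sum_leaf P(leaf) * max_c P(y=c | leaf),
    minus a penalty per selected feature to favour small subsets."""
    subset = [j for j, bit in enumerate(mask) if bit]
    n = len(labels)
    score = 0.0
    for counts in leaf_stats(rows, labels, subset).values():
        size = sum(counts.values())
        p_leaf = size / n                        # P(leaf)
        p_majority = max(counts.values()) / size  # max_c P(y=c | leaf)
        score += p_leaf * p_majority
    return score - penalty * len(subset)


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly chosen population members."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def genetic_feature_selection(rows, labels, n_features, pop_size=30,
                              generations=40, p_mut=0.05, seed=0):
    """Evolve binary masks (1 = feature selected); return best mask and score."""
    rng = random.Random(seed)
    # Sparse random start: each feature included with probability 0.2.
    pop = [[int(rng.random() < 0.2) for _ in range(n_features)]
           for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(m, rows, labels) for m in pop]
        for m, s in zip(pop, scores):
            if s > best_score:
                best_mask, best_score = m[:], s
        new_pop = [best_mask[:]]  # elitism: carry the best mask forward
        while len(new_pop) < pop_size:
            a = tournament(pop, scores, rng)
            b = tournament(pop, scores, rng)
            # Uniform crossover, then independent bit-flip mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
    return best_mask, best_score


if __name__ == "__main__":
    # Toy data (not NHANES): y = x0 AND x1; features 2-5 are pure noise.
    data_rng = random.Random(1)
    X = [[data_rng.randint(0, 1) for _ in range(6)] for _ in range(200)]
    y = [r[0] & r[1] for r in X]
    mask, score = genetic_feature_selection(X, y, n_features=6)
    print("selected:", [j for j, b in enumerate(mask) if b],
          "score:", round(score, 3))
```

Because an unpruned multiway tree fits its training rows better as more features are added, the size penalty is what keeps the selected subset small in this sketch. A more careful version would estimate the conditional probabilities on held-out data.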
Authors:
; ; ;
Publication Date:
OSTI Identifier:
1237825
Report Number(s):
PNNL-SA-94471
DOE Contract Number:
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the 2013 World Congress in Computer Science, Computer Engineering, and Applied Computing (WORLDCOMP'13), July 22-25, 2013, Las Vegas, Nevada
Publisher:
CSREA Press, Athens, GA, United States (US)
Research Org:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English