DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Extending Classification Algorithms to Case-Control Studies

Abstract

Classification is a common technique applied to ’omics data to build predictive models and identify potential markers of biomedical outcomes. Despite the prevalence of case-control studies, the number of classification methods available to analyze data generated by such studies is extremely limited. Conditional logistic regression is the most commonly used technique, but the associated modeling assumptions limit its ability to identify a large class of sufficiently complicated ’omic signatures. We propose a data preprocessing step which generalizes and makes any linear or nonlinear classification algorithm, even those typically not appropriate for matched design data, available to be used to model case-control data and identify relevant biomarkers in these study designs. We demonstrate on simulated case-control data that both the classification and variable selection accuracy of each method is improved after applying this processing step and that the proposed methods are comparable to or outperform existing variable selection methods. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with islet autoimmunity.

Authors:
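Stanfill, Bryan A. [1]; Reehl, Sarah M. [1]; Bramer, Lisa M. [1]; Nakayasu, Ernesto S. [1]; Rich, Stephen S. [2]; Metz, Thomas O. [1]; Rewers, Marian [3]; Webb-Robertson, Bobbie-Jo M. [1]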
  1. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
  2. Univ. of Virginia, Charlottesville, VA (United States)
  3. Univ. of Colorado Denver, Aurora, CO (United States)
Publication Date:
2019-07
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
Contributing Org.:
TEDDY Study Group
OSTI Identifier:
1556893
Report Number(s):
PNNL-SA-135302
Journal ID: ISSN 1179-5972
Grant/Contract Number:  
AC05-76RL01830
Resource Type:
Accepted Manuscript
Journal Name:
Biomedical Engineering and Computational Biology
Additional Journal Information:
Journal Volume: 10; Journal ID: ISSN 1179-5972
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; Diabetes; machine learning; support vector machines; biomarker discovery; variable selection

Citation Formats

Stanfill, Bryan A., Reehl, Sarah M., Bramer, Lisa M., Nakayasu, Ernesto S., Rich, Stephen S., Metz, Thomas O., Rewers, Marian, and Webb-Robertson, Bobbie-Jo M. Extending Classification Algorithms to Case-Control Studies. United States: N. p., 2019. Web. doi:10.1177/1179597219858954.
Stanfill, Bryan A., Reehl, Sarah M., Bramer, Lisa M., Nakayasu, Ernesto S., Rich, Stephen S., Metz, Thomas O., Rewers, Marian, & Webb-Robertson, Bobbie-Jo M. Extending Classification Algorithms to Case-Control Studies. United States. doi:10.1177/1179597219858954.
Stanfill, Bryan A., Reehl, Sarah M., Bramer, Lisa M., Nakayasu, Ernesto S., Rich, Stephen S., Metz, Thomas O., Rewers, Marian, and Webb-Robertson, Bobbie-Jo M. 2019. "Extending Classification Algorithms to Case-Control Studies". United States. doi:10.1177/1179597219858954. https://www.osti.gov/servlets/purl/1556893.
@article{osti_1556893,
title = {Extending Classification Algorithms to Case-Control Studies},
author = {Stanfill, Bryan A. and Reehl, Sarah M. and Bramer, Lisa M. and Nakayasu, Ernesto S. and Rich, Stephen S. and Metz, Thomas O. and Rewers, Marian and Webb-Robertson, Bobbie-Jo M.},
abstractNote = {Classification is a common technique applied to ’omics data to build predictive models and identify potential markers of biomedical outcomes. Despite the prevalence of case-control studies, the number of classification methods available to analyze data generated by such studies is extremely limited. Conditional logistic regression is the most commonly used technique, but the associated modeling assumptions limit its ability to identify a large class of sufficiently complicated ’omic signatures. We propose a data preprocessing step which generalizes and makes any linear or nonlinear classification algorithm, even those typically not appropriate for matched design data, available to be used to model case-control data and identify relevant biomarkers in these study designs. We demonstrate on simulated case-control data that both the classification and variable selection accuracy of each method is improved after applying this processing step and that the proposed methods are comparable to or outperform existing variable selection methods. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with islet autoimmunity.},
doi = {10.1177/1179597219858954},
journal = {Biomedical Engineering and Computational Biology},
volume = 10,
place = {United States},
year = {2019},
month = {7}
}

