DOE PAGES: U.S. Department of Energy
Office of Scientific and Technical Information

Title: Extending Classification Algorithms to Case-Control Studies

Abstract

Classification is a common technique applied to ’omics data to build predictive models and identify potential markers of biomedical outcomes. Despite the prevalence of case-control studies, the number of classification methods available to analyze data generated by such studies is extremely limited. Conditional logistic regression is the most commonly used technique, but its modeling assumptions limit its ability to identify a large class of sufficiently complicated ’omic signatures. We propose a data preprocessing step that generalizes any linear or nonlinear classification algorithm, even one not typically appropriate for matched-design data, so that it can be used to model case-control data and identify relevant biomarkers in these study designs. We demonstrate on simulated case-control data that both the classification accuracy and the variable selection accuracy of each method improve after applying this preprocessing step, and that the proposed methods are comparable to or outperform existing variable selection methods. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with islet autoimmunity.

Authors:
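Stanfill, Bryan A. [1]; Reehl, Sarah M. [1]; Bramer, Lisa M. [1]; Nakayasu, Ernesto S. [1]; Rich, Stephen S. [2]; Metz, Thomas O. [1]; Rewers, Marian [3]; Webb-Robertson, Bobbie-Jo M. [1]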
  1. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
  2. Univ. of Virginia, Charlottesville, VA (United States)
  3. Univ. of Colorado Denver, Aurora, CO (United States)
Publication Date:
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
Contributing Org.:
TEDDY Study Group
OSTI Identifier:
1556893
Report Number(s):
PNNL-SA-135302
Journal ID: ISSN 1179-5972
Grant/Contract Number:  
AC05-76RL01830
Resource Type:
Accepted Manuscript
Journal Name:
Biomedical Engineering and Computational Biology
Additional Journal Information:
Journal Volume: 10; Journal ID: ISSN 1179-5972
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; Diabetes; machine learning; support vector machines; biomarker discovery; variable selection

Citation Formats

Stanfill, Bryan A., Reehl, Sarah M., Bramer, Lisa M., Nakayasu, Ernesto S., Rich, Stephen S., Metz, Thomas O., Rewers, Marian, and Webb-Robertson, Bobbie-Jo M. Extending Classification Algorithms to Case-Control Studies. United States: N. p., 2019. Web. doi:10.1177/1179597219858954.
Stanfill, Bryan A., Reehl, Sarah M., Bramer, Lisa M., Nakayasu, Ernesto S., Rich, Stephen S., Metz, Thomas O., Rewers, Marian, & Webb-Robertson, Bobbie-Jo M. Extending Classification Algorithms to Case-Control Studies. United States. doi:10.1177/1179597219858954.
Stanfill, Bryan A., Reehl, Sarah M., Bramer, Lisa M., Nakayasu, Ernesto S., Rich, Stephen S., Metz, Thomas O., Rewers, Marian, and Webb-Robertson, Bobbie-Jo M. "Extending Classification Algorithms to Case-Control Studies". United States. doi:10.1177/1179597219858954. https://www.osti.gov/servlets/purl/1556893.
@article{osti_1556893,
title = {Extending Classification Algorithms to Case-Control Studies},
author = {Stanfill, Bryan A. and Reehl, Sarah M. and Bramer, Lisa M. and Nakayasu, Ernesto S. and Rich, Stephen S. and Metz, Thomas O. and Rewers, Marian and Webb-Robertson, Bobbie-Jo M.},
abstractNote = {Classification is a common technique applied to ’omics data to build predictive models and identify potential markers of biomedical outcomes. Despite the prevalence of case-control studies, the number of classification methods available to analyze data generated by such studies is extremely limited. Conditional logistic regression is the most commonly used technique, but its modeling assumptions limit its ability to identify a large class of sufficiently complicated ’omic signatures. We propose a data preprocessing step that generalizes any linear or nonlinear classification algorithm, even one not typically appropriate for matched-design data, so that it can be used to model case-control data and identify relevant biomarkers in these study designs. We demonstrate on simulated case-control data that both the classification accuracy and the variable selection accuracy of each method improve after applying this preprocessing step, and that the proposed methods are comparable to or outperform existing variable selection methods. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with islet autoimmunity.},
doi = {10.1177/1179597219858954},
journal = {Biomedical Engineering and Computational Biology},
volume = 10,
place = {United States},
year = {2019},
month = {7}
}
