Extending Classification Algorithms to Case-Control Studies
Abstract
Classification is a common technique applied to ’omics data to build predictive models and identify potential markers of biomedical outcomes. Despite the prevalence of case-control studies, very few classification methods are available to analyze the data such studies generate. Conditional logistic regression is the most commonly used technique, but its modeling assumptions limit its ability to identify a large class of sufficiently complicated ’omic signatures. We propose a data preprocessing step that generalizes any linear or nonlinear classification algorithm, even those typically not appropriate for matched design data, so that it can be used to model case-control data and identify relevant biomarkers in these study designs. We demonstrate on simulated case-control data that both the classification accuracy and the variable selection accuracy of each method improve after applying this preprocessing step, and that the proposed methods are comparable to or outperform existing variable selection methods. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with islet autoimmunity.
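The abstract does not spell out the preprocessing step itself. As an illustration only, one simple transformation in this spirit is to center each feature within its matched set, which removes effects shared by a whole matched set before a standard classifier is trained. The function name `center_within_strata` and the list-of-lists data layout are hypothetical, and this sketch is not claimed to be the authors' procedure.

```python
from collections import defaultdict


def center_within_strata(rows, strata):
    """Subtract the per-stratum mean from every feature (illustrative sketch).

    rows   : list of equal-length lists of floats, one per subject
    strata : list of hashable matched-set labels, same length as rows
    """
    if len(rows) != len(strata):
        raise ValueError("rows and strata must have the same length")

    # Group row indices by matched set.
    members = defaultdict(list)
    for idx, label in enumerate(strata):
        members[label].append(idx)

    centered = [None] * len(rows)
    for idxs in members.values():
        n_features = len(rows[idxs[0]])
        # Mean of each feature over the subjects in this matched set.
        means = [sum(rows[i][j] for i in idxs) / len(idxs)
                 for j in range(n_features)]
        for i in idxs:
            centered[i] = [rows[i][j] - means[j] for j in range(n_features)]
    return centered


# Two hypothetical 1:1 matched pairs; pair "b" carries a large baseline
# shift in feature 0 that is unrelated to case status.
X = [[1.0, 5.0], [3.0, 5.0], [101.0, 2.0], [103.0, 4.0]]
pairs = ["a", "a", "b", "b"]
X_centered = center_within_strata(X, pairs)
```

After centering, the baseline shift between the two pairs is gone and only within-pair contrasts remain, so the rows can be passed to any off-the-shelf classifier in place of the raw features.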
- Authors:
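- Stanfill, Bryan A.; Reehl, Sarah M.; Bramer, Lisa M.; Nakayasu, Ernesto S.; Rich, Stephen S.; Metz, Thomas O.; Rewers, Marian; Webb-Robertson, Bobbie-Jo M.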
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Univ. of Virginia, Charlottesville, VA (United States)
- Univ. of Colorado Denver, Aurora, CO (United States)
- Publication Date:
- Research Org.:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Org.:
- USDOE
- Contributing Org.:
- TEDDY Study Group
- OSTI Identifier:
- 1556893
- Report Number(s):
- PNNL-SA-135302; Journal ID: ISSN 1179-5972
- Grant/Contract Number:
- AC05-76RL01830
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Biomedical Engineering and Computational Biology
- Additional Journal Information:
- Journal Volume: 10; Journal ID: ISSN 1179-5972
- Publisher:
- SAGE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; Diabetes; machine learning; support vector machines; biomarker discovery; variable selection
Citation Formats
Stanfill, Bryan A., Reehl, Sarah M., Bramer, Lisa M., Nakayasu, Ernesto S., Rich, Stephen S., Metz, Thomas O., Rewers, Marian, and Webb-Robertson, Bobbie-Jo M. Extending Classification Algorithms to Case-Control Studies. United States: N. p., 2019. Web. doi:10.1177/1179597219858954.
Stanfill, Bryan A., Reehl, Sarah M., Bramer, Lisa M., Nakayasu, Ernesto S., Rich, Stephen S., Metz, Thomas O., Rewers, Marian, & Webb-Robertson, Bobbie-Jo M. Extending Classification Algorithms to Case-Control Studies. United States. https://doi.org/10.1177/1179597219858954
Stanfill, Bryan A., Reehl, Sarah M., Bramer, Lisa M., Nakayasu, Ernesto S., Rich, Stephen S., Metz, Thomas O., Rewers, Marian, and Webb-Robertson, Bobbie-Jo M. Mon . "Extending Classification Algorithms to Case-Control Studies". United States. https://doi.org/10.1177/1179597219858954. https://www.osti.gov/servlets/purl/1556893.
@article{osti_1556893,
title = {Extending Classification Algorithms to Case-Control Studies},
author = {Stanfill, Bryan A. and Reehl, Sarah M. and Bramer, Lisa M. and Nakayasu, Ernesto S. and Rich, Stephen S. and Metz, Thomas O. and Rewers, Marian and Webb-Robertson, Bobbie-Jo M.},
abstractNote = {Classification is a common technique applied to ’omics data to build predictive models and identify potential markers of biomedical outcomes. Despite the prevalence of case-control studies, the number of classification methods available to analyze data generated by such studies is extremely limited. Conditional logistic regression is the most commonly used technique, but the associated modeling assumptions limit its ability to identify a large class of sufficiently complicated ’omic signatures. We propose a data preprocessing step which generalizes and makes any linear or nonlinear classification algorithm, even those typically not appropriate for matched design data, available to be used to model case-control data and identify relevant biomarkers in these study designs. We demonstrate on simulated case-control data that both the classification and variable selection accuracy of each method is improved after applying this processing step and that the proposed methods are comparable to or outperform existing variable selection methods. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with islet autoimmunity.},
doi = {10.1177/1179597219858954},
journal = {Biomedical Engineering and Computational Biology},
volume = 10,
place = {United States},
year = {2019},
month = {7}
}