DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data

Abstract

Nationwide population-based cohort provides a new opportunity to build an automated risk prediction model based on individuals’ history of health and healthcare beyond existing risk prediction models. We tested the possibility of machine learning models to predict future incidence of Alzheimer’s disease (AD) using large-scale administrative health data. From the Korean National Health Insurance Service database between 2002 and 2010, we obtained de-identified health data in elders above 65 years (N = 40,736) containing 4,894 unique clinical features including ICD-10 codes, medication codes, laboratory values, history of personal and family illness and socio-demographics. To define incident AD we considered two operational definitions: “definite AD” with diagnostic codes and dementia medication (n = 614) and “probable AD” with only diagnosis (n = 2026). We trained and validated random forest, support vector machine and logistic regression to predict incident AD in 1, 2, 3, and 4 subsequent years. For predicting future incidence of AD in balanced samples (bootstrapping), the machine learning models showed reasonable performance in 1-year prediction with AUC of 0.775 and 0.759, based on “definite AD” and “probable AD” outcomes, respectively; in 2-year, 0.730 and 0.693; in 3-year, 0.677 and 0.644; in 4-year, 0.725 and 0.683. The results were similar when the entire (unbalanced) samples were used. Important clinical features selected in logistic regression included hemoglobin level, age and urine protein level. This study may shed a light on the utility of the data-driven machine learning model based on large-scale administrative health data in AD risk prediction, which may enable better selection of individuals at risk for AD in clinical trials or early detection in clinical settings.
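
The abstract reports AUC on "balanced samples (bootstrapping)" but does not spell out the procedure. The sketch below is one plausible reading, not the authors' code: it draws equal numbers of cases and controls with replacement and averages a rank-based AUC over the resamples. The function names (`auc`, `balanced_bootstrap`, `mean_bootstrap_auc`), the number of resamples, and the synthetic scores are all assumptions made for illustration; only the rough case prevalence (about 1.5%, as in 614 of 40,736) echoes the record.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the block of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def balanced_bootstrap(labels, rng):
    """Indices of a resample with equal numbers of cases (1) and controls (0)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    return rng.choices(pos, k=n) + rng.choices(neg, k=n)


def mean_bootstrap_auc(labels, scores, n_boot=100, seed=0):
    """Average AUC over n_boot balanced bootstrap resamples (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_boot):
        idx = balanced_bootstrap(labels, rng)
        total += auc([labels[i] for i in idx], [scores[i] for i in idx])
    return total / n_boot


# Synthetic toy data (assumption): 75 cases among 5,000 people, about 1.5%.
demo_rng = random.Random(42)
demo_labels = [1] * 75 + [0] * 4925
# Stand-in for a model's risk scores: cases are shifted upward by one SD.
demo_scores = [demo_rng.gauss(1.0 if y else 0.0, 1.0) for y in demo_labels]
demo_auc = mean_bootstrap_auc(demo_labels, demo_scores)
print(f"mean balanced-bootstrap AUC: {demo_auc:.3f}")
```

In a real analysis the scores would come from a fitted classifier (the abstract names random forest, support vector machine and logistic regression) evaluated on held-out data. Because AUC depends only on the ranking of cases against controls, balanced and unbalanced evaluations of the same scores are expected to give similar values.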

Authors:
Park, Ji Hwan [1]; Cho, Han Eol [2]; Kim, Jong Hun [3]; Wall, Melanie M. [4]; Stern, Yaakov [4]; Lim, Hyunsun [3]; Yoo, Shinjae [1]; Kim, Hyoung Seop [3]; Cha, Jiook [5]
  1. Brookhaven National Lab. (BNL), Upton, NY (United States)
  2. Yonsei Univ. College of Medicine, Seoul (Korea)
  3. National Health Insurance Service Ilsan Hospital, Goyang (Korea)
  4. Columbia Univ., New York, NY (United States)
  5. Columbia Univ., New York, NY (United States); Seoul National Univ. (Korea)
Publication Date:
Research Org.:
Brookhaven National Lab. (BNL), Upton, NY (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (SC-21); Seoul National University; NHS Ilsan Hospital Research Support Program; National Institutes of Health (NIH); Brain Behavior Research Foundation Young Investigator Award; Korean Scientists and Engineers Association Young Investigator Grant; National Research Foundation of Korea (NRF); Ministry of Science
OSTI Identifier:
1618403
Report Number(s):
BNL-215918-2020-JAAM
Journal ID: ISSN 2398-6352
Grant/Contract Number:  
SC0012704; K01-MH109836
Resource Type:
Accepted Manuscript
Journal Name:
npj Digital Medicine
Additional Journal Information:
Journal Volume: 3; Journal Issue: 1; Journal ID: ISSN 2398-6352
Publisher:
Springer Nature
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Alzheimer's disease; predictive markers

Citation Formats

Park, Ji Hwan, Cho, Han Eol, Kim, Jong Hun, Wall, Melanie M., Stern, Yaakov, Lim, Hyunsun, Yoo, Shinjae, Kim, Hyoung Seop, and Cha, Jiook. Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data. United States: N. p., 2020. Web. doi:10.1038/s41746-020-0256-0.
Park, Ji Hwan, Cho, Han Eol, Kim, Jong Hun, Wall, Melanie M., Stern, Yaakov, Lim, Hyunsun, Yoo, Shinjae, Kim, Hyoung Seop, & Cha, Jiook. Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data. United States. https://doi.org/10.1038/s41746-020-0256-0
Park, Ji Hwan, Cho, Han Eol, Kim, Jong Hun, Wall, Melanie M., Stern, Yaakov, Lim, Hyunsun, Yoo, Shinjae, Kim, Hyoung Seop, and Cha, Jiook. 2020. "Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data". United States. https://doi.org/10.1038/s41746-020-0256-0. https://www.osti.gov/servlets/purl/1618403.
@article{osti_1618403,
title = {Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data},
author = {Park, Ji Hwan and Cho, Han Eol and Kim, Jong Hun and Wall, Melanie M. and Stern, Yaakov and Lim, Hyunsun and Yoo, Shinjae and Kim, Hyoung Seop and Cha, Jiook},
abstractNote = {Nationwide population-based cohort provides a new opportunity to build an automated risk prediction model based on individuals’ history of health and healthcare beyond existing risk prediction models. We tested the possibility of machine learning models to predict future incidence of Alzheimer’s disease (AD) using large-scale administrative health data. From the Korean National Health Insurance Service database between 2002 and 2010, we obtained de-identified health data in elders above 65 years (N = 40,736) containing 4,894 unique clinical features including ICD-10 codes, medication codes, laboratory values, history of personal and family illness and socio-demographics. To define incident AD we considered two operational definitions: “definite AD” with diagnostic codes and dementia medication (n = 614) and “probable AD” with only diagnosis (n = 2026). We trained and validated random forest, support vector machine and logistic regression to predict incident AD in 1, 2, 3, and 4 subsequent years. For predicting future incidence of AD in balanced samples (bootstrapping), the machine learning models showed reasonable performance in 1-year prediction with AUC of 0.775 and 0.759, based on “definite AD” and “probable AD” outcomes, respectively; in 2-year, 0.730 and 0.693; in 3-year, 0.677 and 0.644; in 4-year, 0.725 and 0.683. The results were similar when the entire (unbalanced) samples were used. Important clinical features selected in logistic regression included hemoglobin level, age and urine protein level. This study may shed a light on the utility of the data-driven machine learning model based on large-scale administrative health data in AD risk prediction, which may enable better selection of individuals at risk for AD in clinical trials or early detection in clinical settings.},
doi = {10.1038/s41746-020-0256-0},
journal = {npj Digital Medicine},
number = 1,
volume = 3,
place = {United States},
year = {2020},
month = {3}
}
