DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data

Abstract

Nationwide population-based cohort provides a new opportunity to build an automated risk prediction model based on individuals’ history of health and healthcare beyond existing risk prediction models. We tested the possibility of machine learning models to predict future incidence of Alzheimer’s disease (AD) using large-scale administrative health data. From the Korean National Health Insurance Service database between 2002 and 2010, we obtained de-identified health data in elders above 65 years (N = 40,736) containing 4,894 unique clinical features including ICD-10 codes, medication codes, laboratory values, history of personal and family illness and socio-demographics. To define incident AD we considered two operational definitions: “definite AD” with diagnostic codes and dementia medication (n = 614) and “probable AD” with only diagnosis (n = 2026). We trained and validated random forest, support vector machine and logistic regression to predict incident AD in 1, 2, 3, and 4 subsequent years. For predicting future incidence of AD in balanced samples (bootstrapping), the machine learning models showed reasonable performance in 1-year prediction with AUC of 0.775 and 0.759, based on “definite AD” and “probable AD” outcomes, respectively; in 2-year, 0.730 and 0.693; in 3-year, 0.677 and 0.644; in 4-year, 0.725 and 0.683. The results were similar when the entire (unbalanced) samples were used. Important clinical features selected in logistic regression included hemoglobin level, age and urine protein level. This study may shed a light on the utility of the data-driven machine learning model based on large-scale administrative health data in AD risk prediction, which may enable better selection of individuals at risk for AD in clinical trials or early detection in clinical settings.
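
The abstract reports AUC values estimated on class-balanced samples obtained by bootstrapping. As a rough illustration of those two ideas only, the sketch below computes AUC as the probability that a randomly chosen case outscores a randomly chosen control, and averages it over class-balanced bootstrap resamples. The function names (`auc`, `balanced_bootstrap`, `bootstrap_auc`), the resampling scheme, and the toy data are assumptions made for illustration; the record does not describe the authors' actual procedure at this level of detail.

```python
import random


def auc(labels, scores):
    """AUC via the Mann-Whitney statistic.

    Fraction of (case, control) pairs in which the case has the higher
    score; tied pairs count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_bootstrap(labels, rng):
    """Indices of a class-balanced bootstrap resample (hypothetical scheme).

    Draws as many indices as there are cases, with replacement, from the
    cases and from the controls, so both classes are equally represented.
    """
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    k = len(cases)
    return rng.choices(cases, k=k) + rng.choices(controls, k=k)


def bootstrap_auc(labels, scores, n_rounds=200, seed=0):
    """Mean AUC over n_rounds class-balanced bootstrap resamples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rounds):
        idx = balanced_bootstrap(labels, rng)
        total += auc([labels[i] for i in idx], [scores[i] for i in idx])
    return total / n_rounds


# Toy data (not from the study): 2 cases, 4 controls, made-up risk scores.
toy_labels = [1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]
print(auc(toy_labels, toy_scores))
print(bootstrap_auc(toy_labels, toy_scores))
```

On the toy data, 7 of the 8 case-control pairs are ranked correctly, so the plain AUC is 0.875. In practice the scores would come from a fitted classifier such as the random forest, support vector machine, or logistic regression models named in the abstract.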

Authors:
Park, Ji Hwan [1]; Cho, Han Eol [2]; Kim, Jong Hun [3]; Wall, Melanie M. [4]; Stern, Yaakov [4]; Lim, Hyunsun [3]; Yoo, Shinjae [1]; Kim, Hyoung Seop [3]; Cha, Jiook [5]
  1. Brookhaven National Lab. (BNL), Upton, NY (United States)
  2. Yonsei Univ. College of Medicine, Seoul (Korea)
  3. National Health Insurance Service Ilsan Hospital, Goyang (Korea)
  4. Columbia Univ., New York, NY (United States)
  5. Columbia Univ., New York, NY (United States); Seoul National Univ. (Korea)
Publication Date:
2020-03-26
Research Org.:
Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (SC-21); Seoul National University; NHS Ilsan Hospital Research Support Program; National Institutes of Health (NIH); Brain Behavior Research Foundation Young Investigator Award; Korean Scientists and Engineers Association Young Investigator Grant; National Research Foundation of Korea (NRF); Ministry of Science
OSTI Identifier:
1618403
Report Number(s):
BNL-215918-2020-JAAM
Journal ID: ISSN 2398-6352
Grant/Contract Number:  
SC0012704; K01-MH109836
Resource Type:
Accepted Manuscript
Journal Name:
npj Digital Medicine
Additional Journal Information:
Journal Volume: 3; Journal Issue: 1; Journal ID: ISSN 2398-6352
Publisher:
Springer Nature
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Alzheimer's disease; predictive markers

Citation Formats

Park, Ji Hwan, Cho, Han Eol, Kim, Jong Hun, Wall, Melanie M., Stern, Yaakov, Lim, Hyunsun, Yoo, Shinjae, Kim, Hyoung Seop, and Cha, Jiook. Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data. United States: N. p., 2020. Web. doi:10.1038/s41746-020-0256-0.
Park, Ji Hwan, Cho, Han Eol, Kim, Jong Hun, Wall, Melanie M., Stern, Yaakov, Lim, Hyunsun, Yoo, Shinjae, Kim, Hyoung Seop, & Cha, Jiook. Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data. United States. https://doi.org/10.1038/s41746-020-0256-0
Park, Ji Hwan, Cho, Han Eol, Kim, Jong Hun, Wall, Melanie M., Stern, Yaakov, Lim, Hyunsun, Yoo, Shinjae, Kim, Hyoung Seop, and Cha, Jiook. 2020. "Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data". United States. https://doi.org/10.1038/s41746-020-0256-0. https://www.osti.gov/servlets/purl/1618403.
@article{osti_1618403,
title = {Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data},
author = {Park, Ji Hwan and Cho, Han Eol and Kim, Jong Hun and Wall, Melanie M. and Stern, Yaakov and Lim, Hyunsun and Yoo, Shinjae and Kim, Hyoung Seop and Cha, Jiook},
abstractNote = {Nationwide population-based cohort provides a new opportunity to build an automated risk prediction model based on individuals’ history of health and healthcare beyond existing risk prediction models. We tested the possibility of machine learning models to predict future incidence of Alzheimer’s disease (AD) using large-scale administrative health data. From the Korean National Health Insurance Service database between 2002 and 2010, we obtained de-identified health data in elders above 65 years (N = 40,736) containing 4,894 unique clinical features including ICD-10 codes, medication codes, laboratory values, history of personal and family illness and socio-demographics. To define incident AD we considered two operational definitions: “definite AD” with diagnostic codes and dementia medication (n = 614) and “probable AD” with only diagnosis (n = 2026). We trained and validated random forest, support vector machine and logistic regression to predict incident AD in 1, 2, 3, and 4 subsequent years. For predicting future incidence of AD in balanced samples (bootstrapping), the machine learning models showed reasonable performance in 1-year prediction with AUC of 0.775 and 0.759, based on “definite AD” and “probable AD” outcomes, respectively; in 2-year, 0.730 and 0.693; in 3-year, 0.677 and 0.644; in 4-year, 0.725 and 0.683. The results were similar when the entire (unbalanced) samples were used. Important clinical features selected in logistic regression included hemoglobin level, age and urine protein level. This study may shed a light on the utility of the data-driven machine learning model based on large-scale administrative health data in AD risk prediction, which may enable better selection of individuals at risk for AD in clinical trials or early detection in clinical settings.},
doi = {10.1038/s41746-020-0256-0},
journal = {npj Digital Medicine},
number = 1,
volume = 3,
place = {United States},
year = {2020},
month = {3}
}
