Investigating Sociodemographic Disparities in Cancer Risk Using Web-Based Informatics
Abstract
Cancer health disparities due to demographic and socioeconomic factors are an area of great interest in the epidemiological community. Adjusting for such factors is important when developing cancer risk models. However, for digital epidemiology studies relying on online sources, such information is not readily available. This paper presents a novel method for extracting demographic and socioeconomic information from openly available online obituaries. The method relies on tailored language processing rules and a probabilistic scheme to map subjects’ occupation history to the occupation classification codes and related earnings provided by the U.S. Census Bureau. Using this information, a case-control study is executed fully in silico to investigate how age, gender, parity, and income level affect breast and lung cancer risk. Based on 48,368 automatically collected online obituaries (4,643 for breast cancer, 6,274 for lung cancer, and 37,451 cancer-free) and a generalized cancer risk model, our study shows strong associations of age, parity, and socioeconomic status with cancer risk. Although the observed trends for breast cancer are very consistent with traditional epidemiological studies, some inconsistency is observed for lung cancer with respect to socioeconomic status.
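As an illustration of the kind of case-control comparison the abstract describes, the following minimal sketch computes an unadjusted odds ratio with a Woolf (log-based) 95% confidence interval from a 2x2 table. It is not the authors' generalized risk model: the function name `odds_ratio` and the exposure counts in the usage lines are hypothetical, chosen only to show the arithmetic. The group sizes are the only values taken from the abstract.

```python
import math

# Group sizes reported in the abstract (obituaries per group).
GROUP_SIZES = {"breast_cancer": 4643, "lung_cancer": 6274, "cancer_free": 37451}
TOTAL_OBITUARIES = sum(GROUP_SIZES.values())  # the abstract reports 48,368


def odds_ratio(exposed_cases, unexposed_cases,
               exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    The interval is Woolf's approximation: exp(log(OR) +/- z * SE), where
    SE is the square root of the sum of reciprocal cell counts.
    All four cell counts must be positive.
    """
    ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    std_err = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                        + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(ratio) - z * std_err)
    upper = math.exp(math.log(ratio) + z * std_err)
    return ratio, lower, upper


# Hypothetical counts for illustration only; they are NOT from the study.
ratio, lower, upper = odds_ratio(30, 70, 20, 80)
print(f"OR = {ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

A single binary exposure evaluated this way corresponds to a logistic regression with one predictor and no covariates. Adjusting for age, gender, parity, and income together, as the abstract describes, would require a multivariable model.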
- Authors:
- Yoon, Hong-Jun; Tourassi, Georgia
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Biomedical Sciences, Engineering, and Computing Group. Health Data Sciences Inst.
- Publication Date:
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Org.:
- USDOE Office of Science (SC); National Inst. of Health (NIH) (United States)
- OSTI Identifier:
- 1424481
- Grant/Contract Number:
- AC05-00OR22725; 1R01-CA170508-04
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of Human Performance in Extreme Environments
- Additional Journal Information:
- Journal Volume: 14; Journal Issue: 1; Journal ID: ISSN 2327-2937
- Publisher:
- Purdue University
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 60 APPLIED LIFE SCIENCES; digital epidemiology; natural language processing; case-control study; generalized linear model; obituary; cancer mortality; breast cancer; lung cancer
Citation Formats
Yoon, Hong-Jun, and Tourassi, Georgia. Investigating Sociodemographic Disparities in Cancer Risk Using Web-Based Informatics. United States: N. p., 2018. Web. doi:10.7771/2327-2937.1087.
Yoon, Hong-Jun, & Tourassi, Georgia. Investigating Sociodemographic Disparities in Cancer Risk Using Web-Based Informatics. United States. https://doi.org/10.7771/2327-2937.1087
Yoon, Hong-Jun, and Tourassi, Georgia. 2018. "Investigating Sociodemographic Disparities in Cancer Risk Using Web-Based Informatics". United States. https://doi.org/10.7771/2327-2937.1087. https://www.osti.gov/servlets/purl/1424481.
@article{osti_1424481,
title = {Investigating Sociodemographic Disparities in Cancer Risk Using Web-Based Informatics},
author = {Yoon, Hong-Jun and Tourassi, Georgia},
doi = {10.7771/2327-2937.1087},
journal = {Journal of Human Performance in Extreme Environments},
number = 1,
volume = 14,
place = {United States},
year = {2018},
month = {1}
}