OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Investigating Sociodemographic Disparities in Cancer Risk Using Web-Based Informatics

Abstract

Cancer health disparities due to demographic and socioeconomic factors are an area of great interest in the epidemiological community. Adjusting for such factors is important when developing cancer risk models. However, for digital epidemiology studies relying on online sources such information is not readily available. This paper presents a novel method for extracting demographic and socioeconomic information from openly available online obituaries. The method relies on tailored language processing rules and a probabilistic scheme to map subjects’ occupation history to the occupation classification codes and related earnings provided by the U.S. Census Bureau. Using this information, a case-control study is executed fully in silico to investigate how age, gender, parity, and income level impact breast and lung cancer risk. Based on 48,368 online obituaries (4,643 for breast cancer, 6,274 for lung cancer, and 37,451 cancer-free) collected automatically and a generalized cancer risk model, our study shows strong association between age, parity, and socioeconomic status and cancer risk. Although for breast cancer the observed trends are very consistent with traditional epidemiological studies, some inconsistency is observed for lung cancer with respect to socioeconomic status.
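The case-control comparison described in the abstract can be illustrated with a small sketch. This is not the authors' model: the record lists a generalized linear model, whereas the code below swaps in the simpler unadjusted odds ratio with a Woolf (log-based) confidence interval. The function name `odds_ratio` and the exposure counts are hypothetical and chosen only for illustration; the group sizes are the ones reported in the abstract.

```python
import math

# Group sizes as reported in the abstract.
BREAST_CASES = 4_643
LUNG_CASES = 6_274
CONTROLS = 37_451


def odds_ratio(exposed_cases, total_cases, exposed_controls, total_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-based) confidence interval.

    Hypothetical helper for illustration; z=1.96 gives an approximate 95% interval.
    """
    a = exposed_cases                      # exposed cases
    b = total_cases - exposed_cases        # unexposed cases
    c = exposed_controls                   # exposed controls
    d = total_controls - exposed_controls  # unexposed controls
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(estimate) - z * se)
    high = math.exp(math.log(estimate) + z * se)
    return estimate, low, high


# The three groups should add up to the 48,368 obituaries in the abstract.
total = BREAST_CASES + LUNG_CASES + CONTROLS
assert total == 48_368

# HYPOTHETICAL exposure counts (not from the study), e.g. subjects in a
# given income bracket among breast cancer cases and cancer-free controls.
estimate, low, high = odds_ratio(
    exposed_cases=2_000,
    total_cases=BREAST_CASES,
    exposed_controls=15_000,
    total_controls=CONTROLS,
)
print(f"OR = {estimate:.2f} (approx. 95% CI {low:.2f} to {high:.2f})")
```

An odds ratio above 1 would suggest the exposure is more common among cases than controls. Adjusting for age, gender, and parity at the same time, as the abstract describes, would require a regression model rather than a single 2x2 table.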

Authors:
Yoon, Hong-Jun [1]; Tourassi, Georgia [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Biomedical Sciences, Engineering, and Computing Group. Health Data Sciences Inst.
Publication Date:
2018-01-24
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC); National Inst. of Health (NIH) (United States)
OSTI Identifier:
1424481
Grant/Contract Number:  
AC05-00OR22725; 1R01-CA170508-04
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Journal of Human Performance in Extreme Environments
Additional Journal Information:
Journal Volume: 14; Journal Issue: 1; Journal ID: ISSN 2327-2937
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES; digital epidemiology; natural language processing; case-control study; generalized linear model; obituary; cancer mortality; breast cancer; lung cancer

Citation Formats

Yoon, Hong-Jun, and Tourassi, Georgia. Investigating Sociodemographic Disparities in Cancer Risk Using Web-Based Informatics. United States: N. p., 2018. Web. doi:10.7771/2327-2937.1087.
Yoon, Hong-Jun, & Tourassi, Georgia (2018). Investigating Sociodemographic Disparities in Cancer Risk Using Web-Based Informatics. United States. doi:10.7771/2327-2937.1087.
Yoon, Hong-Jun, and Tourassi, Georgia. 2018. "Investigating Sociodemographic Disparities in Cancer Risk Using Web-Based Informatics". United States. doi:10.7771/2327-2937.1087. https://www.osti.gov/servlets/purl/1424481.
@article{osti_1424481,
title = {Investigating Sociodemographic Disparities in Cancer Risk Using Web-Based Informatics},
author = {Yoon, Hong-Jun and Tourassi, Georgia},
abstractNote = {Cancer health disparities due to demographic and socioeconomic factors are an area of great interest in the epidemiological community. Adjusting for such factors is important when developing cancer risk models. However, for digital epidemiology studies relying on online sources such information is not readily available. This paper presents a novel method for extracting demographic and socioeconomic information from openly available online obituaries. The method relies on tailored language processing rules and a probabilistic scheme to map subjects’ occupation history to the occupation classification codes and related earnings provided by the U.S. Census Bureau. Using this information, a case-control study is executed fully in silico to investigate how age, gender, parity, and income level impact breast and lung cancer risk. Based on 48,368 online obituaries (4,643 for breast cancer, 6,274 for lung cancer, and 37,451 cancer-free) collected automatically and a generalized cancer risk model, our study shows strong association between age, parity, and socioeconomic status and cancer risk. Although for breast cancer the observed trends are very consistent with traditional epidemiological studies, some inconsistency is observed for lung cancer with respect to socioeconomic status.},
doi = {10.7771/2327-2937.1087},
journal = {Journal of Human Performance in Extreme Environments},
number = 1,
volume = 14,
place = {United States},
year = {2018},
month = {1}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
