
Title: The utility of web mining for epidemiological research: studying the association between parity and cancer risk [Web Mining for Epidemiological Research. Assessing its Utility in Exploring the Association Between Parity and Cancer Risk]

Journal Article · Journal of the American Medical Informatics Association
DOI: https://doi.org/10.1093/jamia/ocv141 · OSTI ID: 1236580
 [1];  [1];  [2];  [3]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. New Jersey Inst. of Technology, Newark, NJ (United States)
  3. American Cancer Society, Atlanta, GA (United States)

Background: The World Wide Web has emerged as a powerful data source for epidemiological studies related to infectious disease surveillance. However, its potential for cancer-related epidemiological discoveries is largely unexplored.

Methods: Using advanced web crawling and tailored information extraction procedures, we automatically collected and analyzed the text content of 79,394 online obituary articles published between 1998 and 2014. The collected data included 51,911 cancer cases (27,330 breast; 9,470 lung; 6,496 pancreatic; 6,342 ovarian; 2,273 colon) and 27,483 non-cancer cases. With the derived information, we replicated a case-control study design to investigate the association between parity and cancer risk. Age-adjusted odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for each cancer type and compared to those reported in large-scale epidemiological studies.

Results: Parity was found to be associated with a significantly reduced risk of breast cancer (OR = 0.78, 95% CI = 0.75 to 0.82), pancreatic cancer (OR = 0.78, 95% CI = 0.72 to 0.83), colon cancer (OR = 0.67, 95% CI = 0.60 to 0.74), and ovarian cancer (OR = 0.58, 95% CI = 0.54 to 0.62). A marginal association was found for lung cancer prevalence (OR = 0.87, 95% CI = 0.81 to 0.92). The linear trend between multi-parity and reduced cancer risk was dramatically more pronounced for breast and ovarian cancer than for the other cancers included in the analysis.

Conclusion: This large web-mining study on parity and cancer risk produced findings very similar to those reported in traditional observational studies. The approach may serve as a promising strategy for generating study hypotheses to guide and prioritize future epidemiological studies.
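The odds-ratio calculation named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say which age-adjustment method was used, so the Mantel-Haenszel pooled estimate below is an assumption, chosen as one common way to adjust for a categorical factor such as age band. The function names and argument layout are hypothetical, and any counts passed in are for the reader to supply.

```python
import math


def crude_odds_ratio(exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    "Exposed" here would mean parous and "unexposed" nulliparous.
    The interval is built on the log-odds scale; z=1.96 gives ~95%.
    """
    a, b, c, d = (exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio across strata (for example, age bands).

    `strata` is an iterable of (a, b, c, d) tuples laid out as in
    crude_odds_ratio. This is an assumed adjustment method, not
    necessarily the one used in the study.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these strata")
    return numerator / denominator
```

An odds ratio below 1 with an upper confidence limit also below 1, as reported for breast, pancreatic, colon, and ovarian cancer, indicates that the exposure (parity) is associated with reduced odds of being a case.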

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1236580
Journal Information:
Journal of the American Medical Informatics Association, Vol. 23, Issue 3; ISSN 1067-5027
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 3 works
Citation information provided by
Web of Science

Cited By (1)

Digital Epidemiology: Use of Digital Data Collected for Non-epidemiological Purposes in Epidemiological Studies journal January 2018
