The utility of web mining for epidemiological research: studying the association between parity and cancer risk [Web Mining for Epidemiological Research. Assessing its Utility in Exploring the Association Between Parity and Cancer Risk]
Abstract
Background: The World Wide Web has emerged as a powerful data source for epidemiological studies related to infectious disease surveillance. However, its potential for cancer-related epidemiological discoveries is largely unexplored. Methods: Using advanced web crawling and tailored information extraction procedures, we automatically collected and analyzed the text content of 79,394 online obituary articles published between 1998 and 2014. The collected data included 51,911 cancer cases (27,330 breast; 9,470 lung; 6,496 pancreatic; 6,342 ovarian; 2,273 colon) and 27,483 non-cancer cases. With the derived information, we replicated a case-control study design to investigate the association between parity and cancer risk. Age-adjusted odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for each cancer type and compared to those reported in large-scale epidemiological studies. Results: Parity was found to be associated with a significantly reduced risk of breast cancer (OR = 0.78, 95% CI = 0.75 to 0.82), pancreatic cancer (OR = 0.78, 95% CI = 0.72 to 0.83), colon cancer (OR = 0.67, 95% CI = 0.60 to 0.74), and ovarian cancer (OR = 0.58, 95% CI = 0.54 to 0.62). A marginal association was found for lung cancer prevalence (OR = 0.87, 95% CI = 0.81 to 0.92). The linear trend between multi-parity and reduced cancer risk was dramatically more pronounced for breast and ovarian cancer than for the other cancers included in the analysis. Conclusion: This large web-mining study on parity and cancer risk produced findings very similar to those reported in traditional observational studies. Web mining may be a promising strategy for generating study hypotheses to guide and prioritize future epidemiological studies.
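The abstract does not describe how parity was extracted from the obituary text. As a rough illustration of the kind of step involved, the toy sketch below counts children mentioned in an obituary with a regular expression and bins the count into parity categories. Everything here (`estimate_children`, `parity_category`, the patterns, and the 0/1/2/3+ cut points) is hypothetical; the record's reference list includes Stanford CoreNLP, and this regex heuristic is not a reconstruction of the authors' pipeline.

```python
import re

# Small-number words this toy heuristic understands (hypothetical coverage)
NUMBER_WORDS = {
    "a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# Matches a count followed directly by a child noun, e.g. "two sons",
# "a daughter", "3 children". "grandchildren" is not matched because the
# noun must start right after the count; the lookahead skips in-laws
# such as "a daughter-in-law".
CHILD_PATTERN = re.compile(
    r"\b(an?|one|two|three|four|five|six|seven|eight|nine|ten|\d+)\s+"
    r"(sons?|daughters?|children|child)\b(?!-in-law)",
    re.IGNORECASE,
)


def estimate_children(obituary_text: str) -> int:
    """Sum every child count mentioned in the text (may double count)."""
    total = 0
    for count, _noun in CHILD_PATTERN.findall(obituary_text):
        count = count.lower()
        total += int(count) if count.isdigit() else NUMBER_WORDS[count]
    return total


def parity_category(n_children: int) -> str:
    """Bin a child count into hypothetical categories: 0, 1, 2, 3+."""
    return str(n_children) if n_children < 3 else "3+"


if __name__ == "__main__":
    text = "She is survived by her two sons and a daughter, and five grandchildren."
    n = estimate_children(text)
    print(n, parity_category(n))  # prints: 3 3+
```

A count of children named in an obituary is at best a proxy for parity: it can miss children who are not mentioned, and phrasing such as "her two children, a son and a daughter" would be double counted by this heuristic.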
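The abstract reports age-adjusted ORs with 95% CIs but does not state here how the age adjustment was carried out. One standard approach in a case-control analysis is to stratify by age and pool the per-stratum 2x2 tables with the Mantel-Haenszel estimator, using the Robins-Breslow-Greenland variance for the confidence interval. The sketch below implements that approach; the function name, the `(a, b, c, d)` table layout, and the choice of method are assumptions for illustration, and the authors may have used logistic regression or another adjustment instead.

```python
import math
from typing import Sequence, Tuple

# One 2x2 table per age stratum, laid out as (a, b, c, d):
#   a = parous cases,      b = parous controls
#   c = nulliparous cases, d = nulliparous controls
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_or(
    strata: Sequence[Stratum], z: float = 1.96
) -> Tuple[float, float, float]:
    """Age-stratified Mantel-Haenszel OR with a Robins-Breslow-Greenland CI."""
    r_sum = s_sum = 0.0  # sums of R_i = a*d/n and S_i = b*c/n
    pr = psqr = qs = 0.0  # components of the RBG variance of log(OR_MH)
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        r, s = a * d / n, b * c / n
        p, q = (a + d) / n, (b + c) / n
        r_sum += r
        s_sum += s
        pr += p * r
        psqr += p * s + q * r
        qs += q * s
    if r_sum == 0 or s_sum == 0:
        raise ValueError("OR_MH is undefined: a pooled numerator or denominator is zero")
    or_mh = r_sum / s_sum
    var_log = (
        pr / (2 * r_sum**2)
        + psqr / (2 * r_sum * s_sum)
        + qs / (2 * s_sum**2)
    )
    half_width = z * math.sqrt(var_log)
    return (
        or_mh,
        math.exp(math.log(or_mh) - half_width),
        math.exp(math.log(or_mh) + half_width),
    )


if __name__ == "__main__":
    # Made-up counts for two age strata, NOT data from the study
    example = [(120, 200, 40, 45), (300, 420, 90, 100)]
    or_mh, lower, upper = mantel_haenszel_or(example)
    print(f"OR_MH = {or_mh:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With a single stratum the Robins-Breslow-Greenland variance reduces to Woolf's 1/a + 1/b + 1/c + 1/d, which makes a convenient sanity check. An OR below 1 whose CI excludes 1, as in the breast, pancreatic, colon, and ovarian results above, corresponds to lower odds of cancer among parous women.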
- Authors:
- Tourassi, Georgia; Yoon, Hong-Jun; Xu, Songhua; Han, Xuesong
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- New Jersey Inst. of Technology, Newark, NJ (United States)
- American Cancer Society, Atlanta, GA (United States)
- Publication Date:
- November 27, 2015
- Research Org.:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1236580
- Grant/Contract Number:
- AC05-00OR22725
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of the American Medical Informatics Association
- Additional Journal Information:
- Journal Volume: 23; Journal Issue: 3; Journal ID: ISSN 1067-5027
- Publisher:
- Oxford University Press
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; web mining; cancer; epidemiology
Citation Formats
Tourassi, Georgia, Yoon, Hong-Jun, Xu, Songhua, and Han, Xuesong. The utility of web mining for epidemiological research: studying the association between parity and cancer risk [Web Mining for Epidemiological Research. Assessing its Utility in Exploring the Association Between Parity and Cancer Risk]. United States: N. p., 2015. Web. doi:10.1093/jamia/ocv141.
Tourassi, Georgia, Yoon, Hong-Jun, Xu, Songhua, & Han, Xuesong. The utility of web mining for epidemiological research: studying the association between parity and cancer risk [Web Mining for Epidemiological Research. Assessing its Utility in Exploring the Association Between Parity and Cancer Risk]. United States. https://doi.org/10.1093/jamia/ocv141
Tourassi, Georgia, Yoon, Hong-Jun, Xu, Songhua, and Han, Xuesong. 2015. "The utility of web mining for epidemiological research: studying the association between parity and cancer risk [Web Mining for Epidemiological Research. Assessing its Utility in Exploring the Association Between Parity and Cancer Risk]". United States. https://doi.org/10.1093/jamia/ocv141. https://www.osti.gov/servlets/purl/1236580.
@article{osti_1236580,
title = {The utility of web mining for epidemiological research: studying the association between parity and cancer risk [Web Mining for Epidemiological Research. Assessing its Utility in Exploring the Association Between Parity and Cancer Risk]},
author = {Tourassi, Georgia and Yoon, Hong-Jun and Xu, Songhua and Han, Xuesong},
abstractNote = {Background: The World Wide Web has emerged as a powerful data source for epidemiological studies related to infectious disease surveillance. However, its potential for cancer-related epidemiological discoveries is largely unexplored. Methods: Using advanced web crawling and tailored information extraction procedures we automatically collected and analyzed the text content of 79,394 online obituary articles published between 1998 and 2014. The collected data included 51,911 cancer (27,330 breast; 9,470 lung; 6,496 pancreatic; 6,342 ovarian; 2,273 colon) and 27,483 non-cancer cases. With the derived information, we replicated a case-control study design to investigate the association between parity and cancer risk. Age-adjusted odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for each cancer type and compared to those reported in large-scale epidemiological studies. Results: Parity was found to be associated with a significantly reduced risk of breast cancer (OR=0.78, 95% CI = 0.75 to 0.82), pancreatic cancer (OR=0.78, 95% CI = 0.72 to 0.83), colon cancer (OR=0.67, 95% CI = 0.60 to 0.74), and ovarian cancer (OR=0.58, 95% CI = 0.54 to 0.62). Marginal association was found for lung cancer prevalence (OR=0.87, 95% CI = 0.81 to 0.92). The linear trend between multi-parity and reduced cancer risk was dramatically more pronounced for breast and ovarian cancer than the other cancers included in the analysis. Conclusion: This large web-mining study on parity and cancer risk produced findings very similar to those reported with traditional observational studies. It may be used as a promising strategy to generate study hypotheses for guiding and prioritizing future epidemiological studies.},
doi = {10.1093/jamia/ocv141},
journal = {Journal of the American Medical Informatics Association},
number = 3,
volume = 23,
place = {United States},
year = {2015},
month = nov
}
Works referenced in this record:
Digital Social Networks and Health
journal, April 2013
- Lefebvre, R. Craig; Bornkessel, Alexandra S.
- Circulation, Vol. 127, Issue 17
Infodemiology and Infoveillance
journal, May 2011
- Eysenbach, Gunther
- American Journal of Preventive Medicine, Vol. 40, Issue 5
Scoping Review on Search Queries and Social Media for Disease Surveillance: A Chronology of Innovation
journal, January 2013
- Bernardo, Theresa Marie; Rajic, Andrijana; Young, Ian
- Journal of Medical Internet Research, Vol. 15, Issue 7
The Internet and the Global Monitoring of Emerging Diseases: Lessons from the First 10 Years of ProMED-mail
journal, November 2005
- Madoff, Lawrence C.; Woodall, John P.
- Archives of Medical Research, Vol. 36, Issue 6
Information Technology and Global Surveillance of Cases of 2009 H1N1 Influenza
journal, May 2010
- Brownstein, John S.; Freifeld, Clark C.; Chan, Emily H.
- New England Journal of Medicine, Vol. 362, Issue 18
Medicine 2.0: Social Networking, Collaboration, Participation, Apomediation, and Openness
journal, January 2008
- Eysenbach, Gunther
- Journal of Medical Internet Research, Vol. 10, Issue 3
Using the Internet to Promote Health Behavior Change: A Systematic Review and Meta-analysis of the Impact of Theoretical Basis, Use of Behavior Change Techniques, and Mode of Delivery on Efficacy
journal, January 2010
- Webb, Thomas L.; Joseph, Judith; Yardley, Lucy
- Journal of Medical Internet Research, Vol. 12, Issue 1
Accessing Suicide-Related Information on the Internet: A Retrospective Observational Study of Search Behavior
journal, January 2012
- Wong, Paul Wai-Ching; Fu, King-Wa; Yau, Rickey Sai-Pong
- Journal of Medical Internet Research, Vol. 15, Issue 1
Web search behavior for multiple sclerosis: An infodemiological study
journal, July 2014
- Brigo, Francesco; Lochner, Piergiorgio; Tezzon, Frediano
- Multiple Sclerosis and Related Disorders, Vol. 3, Issue 4
Health-Related Hot Topic Detection in Online Communities Using Text Clustering
journal, February 2013
- Lu, Yingjie; Zhang, Pengzhu; Liu, Jingfang
- PLoS ONE, Vol. 8, Issue 2
Online Interventions for Social Marketing Health Behavior Change Campaigns: A Meta-Analysis of Psychological Architectures and Adherence Factors
journal, January 2011
- Cugelman, Brian; Thelwall, Mike; Dawes, Phil
- Journal of Medical Internet Research, Vol. 13, Issue 1
A Novel Evaluation of World No Tobacco Day in Latin America
journal, January 2012
- Ayers, John W.; Althouse, Benjamin M.; Allem, Jon-Patrick
- Journal of Medical Internet Research, Vol. 14, Issue 3
Using Search Query Surveillance to Monitor Tax Avoidance and Smoking Cessation following the United States' 2009 “SCHIP” Cigarette Tax Increase
journal, March 2011
- Ayers, John W.; Ribisl, Kurt; Brownstein, John S.
- PLoS ONE, Vol. 6, Issue 3
Accelerated clinical discovery using self-reported patient data collected online and a patient-matching algorithm
journal, April 2011
- Wicks, Paul; Vaughan, Timothy E.; Massagli, Michael P.
- Nature Biotechnology, Vol. 29, Issue 5
Patient-reported Outcomes as a Source of Evidence in Off-Label Prescribing: Analysis of Data From PatientsLikeMe
journal, January 2011
- Frost, Jeana; Okun, Sally; Vaughan, Timothy
- Journal of Medical Internet Research, Vol. 13, Issue 1
Understanding Topics and Sentiment in an Online Cancer Survivor Community
journal, December 2013
- Portier, K.; Greer, G. E.; Rokach, L.
- JNCI Monographs, Vol. 2013, Issue 47
The process and effect of supportive message expression and reception in online breast cancer support groups
journal, March 2011
- Kim, Eunkyung; Han, Jeong Yeob; Moon, Tae Joon
- Psycho-Oncology, Vol. 21, Issue 5
Parity and breast cancer risk: Possible effect on age at diagnosis
journal, January 1986
- Pathak, Dorothy R.; Speizer, Frank E.; Willett, Walter C.
- International Journal of Cancer, Vol. 37, Issue 1
The independent associations of parity, age at first full term pregnancy, and duration of breastfeeding with the risk of breast cancer
journal, January 1989
- Layde, Peter M.; Webster, Linda A.; Baughman, Andrew L.
- Journal of Clinical Epidemiology, Vol. 42, Issue 10
Reproductive Factors and Breast Cancer
journal, January 1993
- Kelsey, Jennifer L.; Gammon, Marilie D.; John, Esther M.
- Epidemiologic Reviews, Vol. 15, Issue 1
Parity, age at first and last birth, and risk of breast cancer: A population-based study in Sweden
journal, October 1996
- Lambe, Mats; Hsieh, Chung-cheng; Chan, Hsiao-wei
- Breast Cancer Research and Treatment, Vol. 38, Issue 3
Mammographic density, parity and age at first birth, and risk of breast cancer: an analysis of four case–control studies
journal, January 2012
- Woolcott, Christy G.; Koga, Karin; Conroy, Shannon M.
- Breast Cancer Research and Treatment, Vol. 132, Issue 3
Reproductive and Hormonal Factors in Association With Ovarian Cancer in the Netherlands Cohort Study
journal, September 2010
- Braem, M. G. M.; Onland-Moret, N. C.; van den Brandt, P. A.
- American Journal of Epidemiology, Vol. 172, Issue 10
Hormonal Risk Factors for Ovarian Cancer in Premenopausal and Postmenopausal Women
journal, February 2008
- Moorman, P. G.; Calingaert, B.; Palmieri, R. T.
- American Journal of Epidemiology, Vol. 167, Issue 9
Reproductive Factors and Epithelial Ovarian Cancer Risk by Histologic Type: A Multiethnic Case-Control Study
journal, October 2003
- Tung, K.-H.
- American Journal of Epidemiology, Vol. 158, Issue 7
Oral contraceptive use and reproductive factors and risk of ovarian cancer in the European Prospective Investigation into Cancer and Nutrition
journal, September 2011
- Tsilidis, K. K.; Allen, N. E.; Key, T. J.
- British Journal of Cancer, Vol. 105, Issue 9
Characteristics Relating to Ovarian Cancer Risk: Collaborative Analysis of 12 US Case-Control Studies
journal, November 1992
- Whittemore, Alice S.; Harris, Robin; Itnyre, Jacqueline
- American Journal of Epidemiology, Vol. 136, Issue 10
Reproductive factors in relation to ovarian cancer: a case–control study in Northern Vietnam
journal, November 2012
- Le, Duc-Cuong; Kubo, Tatsuhiko; Fujino, Yoshihisa
- Contraception, Vol. 86, Issue 5
Reproductive factors for ovarian cancer in southern Chinese women
journal, January 2013
- Pasalich, Maria; Su, Dada; Binns, Colin W.
- Journal of Gynecologic Oncology, Vol. 24, Issue 2
Menstrual and reproductive factors in relation to ovarian cancer risk
journal, January 2001
- Titus-Ernstoff, L.; Perez, K.; Cramer, D. W.
- British Journal of Cancer, Vol. 84, Issue 5
Association of Parity and Ovarian Cancer Risk by Family History of Breast or Ovarian Cancer in a Population-Based Study of Postmenopausal Women
journal, January 2002
- Vachon, Celine M.; Mink, Pamela J.; Janney, Carol A.
- Epidemiology, Vol. 13, Issue 1
Ovarian Cancer Risk Factors in African-American and White Women
journal, July 2009
- Moorman, P. G.; Palmieri, R. T.; Akushevich, L.
- American Journal of Epidemiology, Vol. 170, Issue 5
Reproductive Factors and Risk of Pancreatic Cancer in Women: A Review of the Literature
journal, February 2009
- Wahi, Monika M.; Shah, Nilay; Schrock, Christopher E.
- Annals of Epidemiology, Vol. 19, Issue 2
Parity and Pancreatic Cancer Risk: A Dose-Response Meta-Analysis of Epidemiologic Studies
journal, March 2014
- Guan, Hong-Bo; Wu, Lang; Wu, Qi-Jun
- PLoS ONE, Vol. 9, Issue 3
Parity and risk of lung cancer in women: Systematic review and meta-analysis of epidemiological studies
journal, May 2012
- Dahabreh, Issa J.; Trikalinos, Thomas A.; Paulus, Jessica K.
- Lung Cancer, Vol. 76, Issue 2
Reproductive factors and colon cancers
journal, May 1990
- Peters, R. K.; Pike, M. C.; Chang, W. W. L.
- British Journal of Cancer, Vol. 61, Issue 5
Reproductive History and Risk of Colorectal Cancer in Postmenopausal Women
journal, March 2011
- Zervoudakis, A.; Strickler, H. D.; Park, Y.
- JNCI Journal of the National Cancer Institute, Vol. 103, Issue 10
The Relationship between Gravidity and Parity and Colorectal Cancer Risk
journal, July 2009
- Wernli, Karen J.; Wang, Yinghui; Zheng, Yingye
- Journal of Women's Health, Vol. 18, Issue 7
Oral contraceptives, reproductive history and risk of colorectal cancer in the European Prospective Investigation into Cancer and Nutrition
journal, November 2010
- Tsilidis, K. K.; Allen, N. E.; Key, T. J.
- British Journal of Cancer, Vol. 103, Issue 11
Reproductive Factors, Oral Contraceptive Use, and Risk of Colorectal Cancer
journal, January 1997
- Troisi, Rebecca; Schairer, Catherine; Chow, Wong-Ho
- Epidemiology, Vol. 8, Issue 1
Parity and Risk of Colorectal Cancer: A Dose-Response Meta-Analysis of Prospective Studies
journal, September 2013
- Guan, Hong-Bo; Wu, Qi-Jun; Gong, Ting-Ting
- PLoS ONE, Vol. 8, Issue 9
A user-oriented web crawler for selectively acquiring online content in e-health research
journal, September 2013
- Xu, Songhua; Yoon, Hong-Jun; Tourassi, Georgia
- Bioinformatics, Vol. 30, Issue 1
The Stanford CoreNLP Natural Language Processing Toolkit
conference, January 2014
- Manning, Christopher; Surdeanu, Mihai; Bauer, John
- Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations
Cancer statistics, 2015
journal, January 2015
- Siegel, Rebecca L.; Miller, Kimberly D.; Jemal, Ahmedin
- CA: A Cancer Journal for Clinicians, Vol. 65, Issue 1
Big Data and Large Sample Size: A Cautionary Note on the Potential for Bias
journal, July 2014
- Kaplan, Robert M.; Chambers, David A.; Glasgow, Russell E.
- Clinical and Translational Science, Vol. 7, Issue 4
News from the NIH: leveraging big data in the behavioral sciences
journal, June 2014
- Kaplan, Robert M.; Riley, William T.; Mabry, Patricia L.
- Translational Behavioral Medicine, Vol. 4, Issue 3
Sugar, meat, and fat intake, and non-dietary risk factors for colon cancer incidence in Iowa women (United States)
journal, January 1994
- Bostick, Roberd M.; Potter, John D.; Kushi, Lawrence H.
- Cancer Causes & Control, Vol. 5, Issue 1
Lifestyle, Occupational, and Reproductive Factors and Risk of Colorectal Cancer
journal, January 2010
- Lo, An-Chi; Soliman, Amr S.; Khaled, Hussein M.
- Diseases of the Colon & Rectum, Vol. 53, Issue 5
Childbearing, oral contraceptive use, and breast cancer
journal, April 1993
- Beral, Valerie; Reeves, Gillian
- The Lancet, Vol. 341, Issue 8852
Social Media and Clinical Care: Ethical, Professional, and Social Implications
journal, April 2013
- Chretien, Katherine C.; Kind, Terry
- Circulation, Vol. 127, Issue 13
Works referencing / citing this record:
Digital Epidemiology: Use of Digital Data Collected for Non-epidemiological Purposes in Epidemiological Studies
journal, January 2018
- Park, Hyeoun-Ae; Jung, Hyesil; On, Jeongah
- Healthcare Informatics Research, Vol. 24, Issue 4