U.S. Department of Energy
Office of Scientific and Technical Information

Prediction of inherited genomic susceptibility to 20 common cancer types by a supervised machine-learning method

Journal Article · Proceedings of the National Academy of Sciences of the United States of America
 [1];  [2]
  1. Univ. of California, Berkeley, CA (United States). Dept. of Chemistry; Yonsei Univ., Seoul (Korea). Dept. of Integrative Omics for Biomedical Sciences; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Molecular Biophysics and Integrated Bioimaging Division; DOE/OSTI
  2. Univ. of California, Berkeley, CA (United States). Dept. of Chemistry and Center for Computational Biology; Yonsei Univ., Seoul (Korea). Dept. of Integrative Omics for Biomedical Sciences; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Molecular Biophysics and Integrated Bioimaging Division
Prevention and early intervention are the most effective ways of avoiding or minimizing psychological, physical, and financial suffering from cancer. However, such proactive action requires the ability to predict the individual’s susceptibility to cancer with a measure of probability. Of the triad of cancer-causing factors (inherited genomic susceptibility, environmental factors, and lifestyle factors), the inherited genomic component may be derivable from the recent public availability of a large body of whole-genome variation data. However, genome-wide association studies have so far shown limited success in predicting the inherited susceptibility to common cancers. We present here a multiple classification approach for predicting individuals’ inherited genomic susceptibility to acquire the most likely phenotype among a panel of 20 major common cancer types plus 1 “healthy” type by application of a supervised machine-learning method under competing conditions among the cohorts of the 21 types. This approach suggests that, depending on the phenotypes of 5,919 individuals of “white” ethnic population in this study, (i) the portion of the cohort of a cancer type who acquired the observed type due to mostly inherited genomic susceptibility factors ranges from about 33 to 88% (or its corollary: the portion due to mostly environmental and lifestyle factors ranges from 12 to 67%), and (ii) on an individual level, the method also predicts individuals’ inherited genomic susceptibility to acquire the other types ranked with associated probabilities. These probabilities may provide practical information for individuals, health professionals, and health policymakers related to prevention and/or early intervention of cancer.
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
Sponsoring Organization:
Ministry of Education, Science and Technology; USDOE Office of Science (SC)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1625007
Journal Information:
Proceedings of the National Academy of Sciences of the United States of America, Vol. 115, Issue 6; ISSN 0027-8424
Publisher:
National Academy of Sciences
Country of Publication:
United States
Language:
English