skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: SU-D-204-06: Integration of Machine Learning and Bioinformatics Methods to Analyze Genome-Wide Association Study Data for Rectal Bleeding and Erectile Dysfunction Following Radiotherapy in Prostate Cancer

Abstract

Purpose: We investigated whether integration of machine learning and bioinformatics techniques on genome-wide association study (GWAS) data can improve the performance of predictive models in predicting the risk of developing radiation-induced late rectal bleeding and erectile dysfunction in prostate cancer patients. Methods: We analyzed a GWAS dataset generated from 385 prostate cancer patients treated with radiotherapy. Using genotype information from these patients, we designed a machine learning-based predictive model of late radiation-induced toxicities: rectal bleeding and erectile dysfunction. The model building process was performed using 2/3 of samples (training) and the predictive model was tested with 1/3 of samples (validation). To identify important single nucleotide polymorphisms (SNPs), we computed the SNP importance score, resulting from our random forest regression model. We performed gene ontology (GO) enrichment analysis for nearby genes of the important SNPs. Results: After univariate analysis on the training dataset, we filtered out many SNPs with p>0.001, resulting in 749 and 367 SNPs that were used in the model building process for rectal bleeding and erectile dysfunction, respectively. On the validation dataset, our random forest regression model achieved the area under the curve (AUC)=0.70 and 0.62 for rectal bleeding and erectile dysfunction, respectively. We performed GO enrichment analysismore » for the top 25%, 50%, 75%, and 100% SNPs out of the select SNPs in the univariate analysis. When we used the top 50% SNPs, more plausible biological processes were obtained for both toxicities. An additional test with the top 50% SNPs improved predictive power with AUC=0.71 and 0.65 for rectal bleeding and erectile dysfunction. A better performance was achieved with AUC=0.67 when age and androgen deprivation therapy were added to the model for erectile dysfunction. Conclusion: Our approach that combines machine learning and bioinformatics techniques enabled designing better models and identifying more plausible biological processes associated with the outcomes.« less

Authors:
;  [1];  [2];  [3];  [4]
  1. Memorial Sloan Kettering Cancer Center, New York, NY (United States)
  2. University of Rochester Medical Center, Rochester, NY (United States)
  3. Albert Einstein College of Medicine, Bronx, NY (United States)
  4. Mount Sinai School of Medicine, New York, NY (United States)
Publication Date:
OSTI Identifier:
22624364
Resource Type:
Journal Article
Resource Relation:
Journal Name: Medical Physics; Journal Volume: 43; Journal Issue: 6; Other Information: (c) 2016 American Association of Physicists in Medicine; Country of input: International Atomic Energy Agency (IAEA)
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES; 61 RADIATION PROTECTION AND DOSIMETRY; ANDROGENS; DATASETS; GENOTYPE; HAZARDS; LEARNING; NEOPLASMS; NUCLEOTIDES; PATIENTS; PROSTATE; RADIOTHERAPY; RECTUM; TOXICITY; TRAINING; VALIDATION

Citation Formats

Oh, J, Deasy, J, Kerns, S, Ostrer, H, and Rosenstein, B. SU-D-204-06: Integration of Machine Learning and Bioinformatics Methods to Analyze Genome-Wide Association Study Data for Rectal Bleeding and Erectile Dysfunction Following Radiotherapy in Prostate Cancer. United States: N. p., 2016. Web. doi:10.1118/1.4955611.
Oh, J, Deasy, J, Kerns, S, Ostrer, H, & Rosenstein, B. SU-D-204-06: Integration of Machine Learning and Bioinformatics Methods to Analyze Genome-Wide Association Study Data for Rectal Bleeding and Erectile Dysfunction Following Radiotherapy in Prostate Cancer. United States. doi:10.1118/1.4955611.
Oh, J, Deasy, J, Kerns, S, Ostrer, H, and Rosenstein, B. Wed . "SU-D-204-06: Integration of Machine Learning and Bioinformatics Methods to Analyze Genome-Wide Association Study Data for Rectal Bleeding and Erectile Dysfunction Following Radiotherapy in Prostate Cancer". United States. doi:10.1118/1.4955611.
@article{osti_22624364,
title = {SU-D-204-06: Integration of Machine Learning and Bioinformatics Methods to Analyze Genome-Wide Association Study Data for Rectal Bleeding and Erectile Dysfunction Following Radiotherapy in Prostate Cancer},
author = {Oh, J and Deasy, J and Kerns, S and Ostrer, H and Rosenstein, B},
abstractNote = {Purpose: We investigated whether integration of machine learning and bioinformatics techniques on genome-wide association study (GWAS) data can improve the performance of predictive models in predicting the risk of developing radiation-induced late rectal bleeding and erectile dysfunction in prostate cancer patients. Methods: We analyzed a GWAS dataset generated from 385 prostate cancer patients treated with radiotherapy. Using genotype information from these patients, we designed a machine learning-based predictive model of late radiation-induced toxicities: rectal bleeding and erectile dysfunction. The model building process was performed using 2/3 of samples (training) and the predictive model was tested with 1/3 of samples (validation). To identify important single nucleotide polymorphisms (SNPs), we computed the SNP importance score, resulting from our random forest regression model. We performed gene ontology (GO) enrichment analysis for nearby genes of the important SNPs. Results: After univariate analysis on the training dataset, we filtered out many SNPs with p>0.001, resulting in 749 and 367 SNPs that were used in the model building process for rectal bleeding and erectile dysfunction, respectively. On the validation dataset, our random forest regression model achieved the area under the curve (AUC)=0.70 and 0.62 for rectal bleeding and erectile dysfunction, respectively. We performed GO enrichment analysis for the top 25%, 50%, 75%, and 100% SNPs out of the select SNPs in the univariate analysis. When we used the top 50% SNPs, more plausible biological processes were obtained for both toxicities. An additional test with the top 50% SNPs improved predictive power with AUC=0.71 and 0.65 for rectal bleeding and erectile dysfunction. A better performance was achieved with AUC=0.67 when age and androgen deprivation therapy were added to the model for erectile dysfunction. Conclusion: Our approach that combines machine learning and bioinformatics techniques enabled designing better models and identifying more plausible biological processes associated with the outcomes.},
doi = {10.1118/1.4955611},
journal = {Medical Physics},
number = 6,
volume = 43,
place = {United States},
year = {Wed Jun 15 00:00:00 EDT 2016},
month = {Wed Jun 15 00:00:00 EDT 2016}
}