skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: SU-D-204-06: Integration of Machine Learning and Bioinformatics Methods to Analyze Genome-Wide Association Study Data for Rectal Bleeding and Erectile Dysfunction Following Radiotherapy in Prostate Cancer

Journal Article · · Medical Physics
DOI:https://doi.org/10.1118/1.4955611· OSTI ID:22624364
;  [1];  [2];  [3];  [4]
  1. Memorial Sloan Kettering Cancer Center, New York, NY (United States)
  2. University of Rochester Medical Center, Rochester, NY (United States)
  3. Albert Einstein College of Medicine, Bronx, NY (United States)
  4. Mount Sinai School of Medicine, New York, NY (United States)

Purpose: We investigated whether integration of machine learning and bioinformatics techniques on genome-wide association study (GWAS) data can improve the performance of predictive models in predicting the risk of developing radiation-induced late rectal bleeding and erectile dysfunction in prostate cancer patients. Methods: We analyzed a GWAS dataset generated from 385 prostate cancer patients treated with radiotherapy. Using genotype information from these patients, we designed a machine learning-based predictive model of late radiation-induced toxicities: rectal bleeding and erectile dysfunction. The model building process was performed using 2/3 of samples (training) and the predictive model was tested with 1/3 of samples (validation). To identify important single nucleotide polymorphisms (SNPs), we computed the SNP importance score, resulting from our random forest regression model. We performed gene ontology (GO) enrichment analysis for nearby genes of the important SNPs. Results: After univariate analysis on the training dataset, we filtered out many SNPs with p>0.001, resulting in 749 and 367 SNPs that were used in the model building process for rectal bleeding and erectile dysfunction, respectively. On the validation dataset, our random forest regression model achieved the area under the curve (AUC)=0.70 and 0.62 for rectal bleeding and erectile dysfunction, respectively. We performed GO enrichment analysis for the top 25%, 50%, 75%, and 100% SNPs out of the select SNPs in the univariate analysis. When we used the top 50% SNPs, more plausible biological processes were obtained for both toxicities. An additional test with the top 50% SNPs improved predictive power with AUC=0.71 and 0.65 for rectal bleeding and erectile dysfunction. A better performance was achieved with AUC=0.67 when age and androgen deprivation therapy were added to the model for erectile dysfunction. Conclusion: Our approach that combines machine learning and bioinformatics techniques enabled designing better models and identifying more plausible biological processes associated with the outcomes.

OSTI ID:
22624364
Journal Information:
Medical Physics, Vol. 43, Issue 6; Other Information: (c) 2016 American Association of Physicists in Medicine; Country of input: International Atomic Energy Agency (IAEA); ISSN 0094-2405
Country of Publication:
United States
Language:
English