DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Identifying intragenic functional modules of genomic variations associated with cancer phenotypes by learning representation of association networks

Abstract

Background: Genome-wide association studies (GWAS) aim to uncover links between genomic variation and phenotype. They have been actively applied in cancer biology to investigate associations between variations and cancer phenotypes, such as susceptibility to certain types of cancer and predisposed responsiveness to specific treatments. Because GWAS primarily focus on associations between individual genomic variations and cancer phenotypes, they are limited in explaining the mechanisms by which more than one genomic variation cooperatively affects a cancer phenotype.

Results: This paper proposes a network representation learning approach to learn associations among genomic variations using a prostate cancer cohort. The learned associations are encoded into representations that can be used to identify functional modules of genomic variations within genes associated with early- and late-onset prostate cancer. The proposed method was applied to a prostate cancer cohort provided by the Veterans Administration’s Million Veteran Program to identify candidates for functional modules associated with early-onset prostate cancer. The cohort included 33,159 prostate cancer patients: 3,181 early-onset patients and 29,978 late-onset patients. A reproducibility analysis showed that the proposed approach can improve model performance in terms of robustness.

Conclusions: To our knowledge, this is the first attempt to use a network representation learning approach to learn associations among genomic variations within genes. Associations learned in this way can lead to an understanding of the underlying mechanisms by which genomic variations cooperatively affect each cancer phenotype. This method can reveal previously unknown knowledge in the field of cancer biology and can be used to design more advanced cancer-targeted therapies.
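The approach summarized in the abstract (an association network among variations, node representations, then modules) can be illustrated with a small sketch. This is not the authors' implementation: the toy carrier sets, the Jaccard threshold, the count-based random-walk vectors (a simple stand-in for a learned embedding) and the greedy grouping step are all assumptions chosen only to keep the example short and dependency-free, and every function and variable name is hypothetical.

```python
import math
import random

# Hypothetical toy data: variant id -> set of patient ids carrying that variant.
TOY_CARRIERS = {
    "v1": {1, 2, 3, 4},
    "v2": {1, 2, 3, 5},
    "v3": {2, 3, 4},
    "v4": {10, 11, 12},
    "v5": {10, 11, 13},
    "v6": {20},
}


def jaccard(a, b):
    """Jaccard similarity of two carrier sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(carriers, threshold):
    """Link two variants when their carrier sets overlap at or above threshold."""
    names = sorted(carriers)
    graph = {name: set() for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if jaccard(carriers[u], carriers[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def walk_vectors(graph, walks_per_node=50, walk_length=6, seed=0):
    """Represent each node by how often short random walks from it visit each node."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    index = {name: i for i, name in enumerate(nodes)}
    vectors = {name: [0.0] * len(nodes) for name in nodes}
    for start in nodes:
        for _ in range(walks_per_node):
            current = start
            for _ in range(walk_length):
                neighbours = sorted(graph.get(current, ()))
                if not neighbours:
                    break  # isolated node: the walk cannot continue
                current = rng.choice(neighbours)
                vectors[start][index[current]] += 1.0
    return vectors


def cosine(x, y):
    """Cosine similarity; zero vectors are treated as dissimilar to everything."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def find_modules(vectors, min_similarity=0.5):
    """Greedy grouping: a node joins the first module whose first member it resembles."""
    groups = []
    for name in sorted(vectors):
        for group in groups:
            if cosine(vectors[name], vectors[group[0]]) >= min_similarity:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


if __name__ == "__main__":
    network = build_network(TOY_CARRIERS, threshold=0.5)
    print(find_modules(walk_vectors(network)))
```

In this toy setting, variants whose carrier sets overlap end up with similar walk vectors and are grouped together, while a variant with no associations forms its own group. A real analysis would replace the count vectors with a trained embedding and the greedy step with a validated clustering procedure.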

Authors:
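Kim, Minsu; Huffman, Jennifer E.; Justice, Amy; Goethert, Ian; Agasthya, Greeshma; Sun, Yan; McArdle, Rachel; Dellitalia, Louis; Stephens, Brady; Cho, Kelly; Pyarajan, Saiju; Mattocks, Kristin; Harley, John; Whittle, Jeffrey; Mathew, Roy; Beckham, Jean; Smith, River; Wells, John; Gutierrez, Salvador; Hammer, Kimberly; Iruvanti, Pran; Ballas, Zuhair; Mastorides, Stephen; Moorman, Jonathan; Gappy, Saib; Klein, Jon; Ratcliffe, Nora; Palacio, Ana; Okusaga, Olaoluwa; Murdoch, Maureen; Sriram, Peruvemba; Argyres, Dean P.; Connor, Todd; Villareal, Gerardo; Kinlay, Scott; Yeh, Shing Shing; Jhala, Darshana; Tandon, Neeraj; Chang, Kyong-Mi; Aguayo, Samuel; Cohen, David; Sharma, Satish; Hamner, Mark; Liangpunsakul, Suthat; Godschalk, Michael; Oursler, Kris Ann; Whooley, Mary; Greco, Jennifer; Ahuja, Sunil; Constans, Joseph; Meyer, Paul; Rauchman, Michael; Servatius, Richard; Ramoni, Rachel; Muralidhar, Sumitra; Gaziano, J. Michael; Gaddy, Melinda; Wallbom, Agnes; Norton, James; Morgan, Timothy; Stapley, Todd; Liang, Peter; Bhushan, Sujata; Jacono, Frank; Fujii, Daryl; Tsao, Philip; Humphries, Donald E.; Huang, Grant; Breeling, James; Moser, Jennifer; Brewer, Jessica V.; Casas, Juan P.; Cho, Kelly; Churby, Lori; Selva, Luis E.; Brophy, Mary T.; Do, Nhan; Tsao, Philip S.; Shayan, Shahpoor Alex; Whitbourne, Stacey B.; Strollo, Patrick; Boyko, Edward; Walsh, Jessica; Pyarajan, Saiju; Hauser, Elizabeth; DuVall, Scott L.; Gupta, Samir; Huq, Mostaqul; Fayad, Joseph; Hung, Adriana; Xu, Junzhe; Alexander, Kathrina; Hurley, Robin; Lichy, Jack; Zhao, Hongyu; Wilson, Peter; Robey, Brooks; Balasubramanian, Prakash; Danciu, Ioana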
Publication Date:
July 6, 2022
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC); Department of Veterans Affairs, Office of Information Technology
Contributing Org.:
VA Million Veteran Program
OSTI Identifier:
1876318
Grant/Contract Number:  
AC05-00OR22725; VA118-16-M-1062
Resource Type:
Accepted Manuscript
Journal Name:
BMC Medical Genomics
Additional Journal Information:
Journal Volume: 15; Journal Issue: 1; Journal ID: ISSN 1755-8794
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES; genome-wide association study; network representation learning; machine learning

Citation Formats

Kim, Minsu, Huffman, Jennifer E., Justice, Amy, Goethert, Ian, Agasthya, Greeshma, Sun, Yan, McArdle, Rachel, Dellitalia, Louis, Stephens, Brady, Cho, Kelly, Pyarajan, Saiju, Mattocks, Kristin, Harley, John, Whittle, Jeffrey, Mathew, Roy, Beckham, Jean, Smith, River, Wells, John, Gutierrez, Salvador, Hammer, Kimberly, Iruvanti, Pran, Ballas, Zuhair, Mastorides, Stephen, Moorman, Jonathan, Gappy, Saib, Klein, Jon, Ratcliffe, Nora, Palacio, Ana, Okusaga, Olaoluwa, Murdoch, Maureen, Sriram, Peruvemba, Argyres, Dean P., Connor, Todd, Villareal, Gerardo, Kinlay, Scott, Yeh, Shing Shing, Jhala, Darshana, Tandon, Neeraj, Chang, Kyong-Mi, Aguayo, Samuel, Cohen, David, Sharma, Satish, Hamner, Mark, Liangpunsakul, Suthat, Godschalk, Michael, Oursler, Kris Ann, Whooley, Mary, Greco, Jennifer, Ahuja, Sunil, Constans, Joseph, Meyer, Paul, Rauchman, Michael, Servatius, Richard, Ramoni, Rachel, Muralidhar, Sumitra, Gaziano, J. Michael, Gaddy, Melinda, Wallbom, Agnes, Norton, James, Morgan, Timothy, Stapley, Todd, Liang, Peter, Bhushan, Sujata, Jacono, Frank, Fujii, Daryl, Tsao, Philip, Humphries, Donald E., Huang, Grant, Breeling, James, Moser, Jennifer, Brewer, Jessica V., Casas, Juan P., Cho, Kelly, Churby, Lori, Selva, Luis E., Brophy, Mary T., Do, Nhan, Tsao, Philip S., Shayan, Shahpoor Alex, Whitbourne, Stacey B., Strollo, Patrick, Boyko, Edward, Walsh, Jessica, Pyarajan, Saiju, Hauser, Elizabeth, DuVall, Scott L., Gupta, Samir, Huq, Mostaqul, Fayad, Joseph, Hung, Adriana, Xu, Junzhe, Alexander, Kathrina, Hurley, Robin, Lichy, Jack, Zhao, Hongyu, Wilson, Peter, Robey, Brooks, Balasubramanian, Prakash, and Danciu, Ioana. Identifying intragenic functional modules of genomic variations associated with cancer phenotypes by learning representation of association networks. United States: N. p., 2022. Web. doi:10.1186/s12920-022-01298-6.
Kim, Minsu, Huffman, Jennifer E., Justice, Amy, Goethert, Ian, Agasthya, Greeshma, Sun, Yan, McArdle, Rachel, Dellitalia, Louis, Stephens, Brady, Cho, Kelly, Pyarajan, Saiju, Mattocks, Kristin, Harley, John, Whittle, Jeffrey, Mathew, Roy, Beckham, Jean, Smith, River, Wells, John, Gutierrez, Salvador, Hammer, Kimberly, Iruvanti, Pran, Ballas, Zuhair, Mastorides, Stephen, Moorman, Jonathan, Gappy, Saib, Klein, Jon, Ratcliffe, Nora, Palacio, Ana, Okusaga, Olaoluwa, Murdoch, Maureen, Sriram, Peruvemba, Argyres, Dean P., Connor, Todd, Villareal, Gerardo, Kinlay, Scott, Yeh, Shing Shing, Jhala, Darshana, Tandon, Neeraj, Chang, Kyong-Mi, Aguayo, Samuel, Cohen, David, Sharma, Satish, Hamner, Mark, Liangpunsakul, Suthat, Godschalk, Michael, Oursler, Kris Ann, Whooley, Mary, Greco, Jennifer, Ahuja, Sunil, Constans, Joseph, Meyer, Paul, Rauchman, Michael, Servatius, Richard, Ramoni, Rachel, Muralidhar, Sumitra, Gaziano, J. Michael, Gaddy, Melinda, Wallbom, Agnes, Norton, James, Morgan, Timothy, Stapley, Todd, Liang, Peter, Bhushan, Sujata, Jacono, Frank, Fujii, Daryl, Tsao, Philip, Humphries, Donald E., Huang, Grant, Breeling, James, Moser, Jennifer, Brewer, Jessica V., Casas, Juan P., Cho, Kelly, Churby, Lori, Selva, Luis E., Brophy, Mary T., Do, Nhan, Tsao, Philip S., Shayan, Shahpoor Alex, Whitbourne, Stacey B., Strollo, Patrick, Boyko, Edward, Walsh, Jessica, Pyarajan, Saiju, Hauser, Elizabeth, DuVall, Scott L., Gupta, Samir, Huq, Mostaqul, Fayad, Joseph, Hung, Adriana, Xu, Junzhe, Alexander, Kathrina, Hurley, Robin, Lichy, Jack, Zhao, Hongyu, Wilson, Peter, Robey, Brooks, Balasubramanian, Prakash, & Danciu, Ioana. Identifying intragenic functional modules of genomic variations associated with cancer phenotypes by learning representation of association networks. United States. https://doi.org/10.1186/s12920-022-01298-6
Kim, Minsu, Huffman, Jennifer E., Justice, Amy, Goethert, Ian, Agasthya, Greeshma, Sun, Yan, McArdle, Rachel, Dellitalia, Louis, Stephens, Brady, Cho, Kelly, Pyarajan, Saiju, Mattocks, Kristin, Harley, John, Whittle, Jeffrey, Mathew, Roy, Beckham, Jean, Smith, River, Wells, John, Gutierrez, Salvador, Hammer, Kimberly, Iruvanti, Pran, Ballas, Zuhair, Mastorides, Stephen, Moorman, Jonathan, Gappy, Saib, Klein, Jon, Ratcliffe, Nora, Palacio, Ana, Okusaga, Olaoluwa, Murdoch, Maureen, Sriram, Peruvemba, Argyres, Dean P., Connor, Todd, Villareal, Gerardo, Kinlay, Scott, Yeh, Shing Shing, Jhala, Darshana, Tandon, Neeraj, Chang, Kyong-Mi, Aguayo, Samuel, Cohen, David, Sharma, Satish, Hamner, Mark, Liangpunsakul, Suthat, Godschalk, Michael, Oursler, Kris Ann, Whooley, Mary, Greco, Jennifer, Ahuja, Sunil, Constans, Joseph, Meyer, Paul, Rauchman, Michael, Servatius, Richard, Ramoni, Rachel, Muralidhar, Sumitra, Gaziano, J. Michael, Gaddy, Melinda, Wallbom, Agnes, Norton, James, Morgan, Timothy, Stapley, Todd, Liang, Peter, Bhushan, Sujata, Jacono, Frank, Fujii, Daryl, Tsao, Philip, Humphries, Donald E., Huang, Grant, Breeling, James, Moser, Jennifer, Brewer, Jessica V., Casas, Juan P., Cho, Kelly, Churby, Lori, Selva, Luis E., Brophy, Mary T., Do, Nhan, Tsao, Philip S., Shayan, Shahpoor Alex, Whitbourne, Stacey B., Strollo, Patrick, Boyko, Edward, Walsh, Jessica, Pyarajan, Saiju, Hauser, Elizabeth, DuVall, Scott L., Gupta, Samir, Huq, Mostaqul, Fayad, Joseph, Hung, Adriana, Xu, Junzhe, Alexander, Kathrina, Hurley, Robin, Lichy, Jack, Zhao, Hongyu, Wilson, Peter, Robey, Brooks, Balasubramanian, Prakash, and Danciu, Ioana. 2022. "Identifying intragenic functional modules of genomic variations associated with cancer phenotypes by learning representation of association networks". United States. https://doi.org/10.1186/s12920-022-01298-6. https://www.osti.gov/servlets/purl/1876318.
@article{osti_1876318,
title = {Identifying intragenic functional modules of genomic variations associated with cancer phenotypes by learning representation of association networks},
author = {Kim, Minsu and Huffman, Jennifer E. and Justice, Amy and Goethert, Ian and Agasthya, Greeshma and Sun, Yan and McArdle, Rachel and Dellitalia, Louis and Stephens, Brady and Cho, Kelly and Pyarajan, Saiju and Mattocks, Kristin and Harley, John and Whittle, Jeffrey and Mathew, Roy and Beckham, Jean and Smith, River and Wells, John and Gutierrez, Salvador and Hammer, Kimberly and Iruvanti, Pran and Ballas, Zuhair and Mastorides, Stephen and Moorman, Jonathan and Gappy, Saib and Klein, Jon and Ratcliffe, Nora and Palacio, Ana and Okusaga, Olaoluwa and Murdoch, Maureen and Sriram, Peruvemba and Argyres, Dean P. and Connor, Todd and Villareal, Gerardo and Kinlay, Scott and Yeh, Shing Shing and Jhala, Darshana and Tandon, Neeraj and Chang, Kyong-Mi and Aguayo, Samuel and Cohen, David and Sharma, Satish and Hamner, Mark and Liangpunsakul, Suthat and Godschalk, Michael and Oursler, Kris Ann and Whooley, Mary and Greco, Jennifer and Ahuja, Sunil and Constans, Joseph and Meyer, Paul and Rauchman, Michael and Servatius, Richard and Ramoni, Rachel and Muralidhar, Sumitra and Gaziano, J. Michael and Gaddy, Melinda and Wallbom, Agnes and Norton, James and Morgan, Timothy and Stapley, Todd and Liang, Peter and Bhushan, Sujata and Jacono, Frank and Fujii, Daryl and Tsao, Philip and Humphries, Donald E. and Huang, Grant and Breeling, James and Moser, Jennifer and Brewer, Jessica V. and Casas, Juan P. and Cho, Kelly and Churby, Lori and Selva, Luis E. and Brophy, Mary T. and Do, Nhan and Tsao, Philip S. and Shayan, Shahpoor Alex and Whitbourne, Stacey B. and Strollo, Patrick and Boyko, Edward and Walsh, Jessica and Pyarajan, Saiju and Hauser, Elizabeth and DuVall, Scott L. and Gupta, Samir and Huq, Mostaqul and Fayad, Joseph and Hung, Adriana and Xu, Junzhe and Alexander, Kathrina and Hurley, Robin and Lichy, Jack and Zhao, Hongyu and Wilson, Peter and Robey, Brooks and Balasubramanian, Prakash and Danciu, Ioana},
abstractNote = {Background: Genome-wide association studies (GWAS) aim to uncover links between genomic variation and phenotype. They have been actively applied in cancer biology to investigate associations between variations and cancer phenotypes, such as susceptibility to certain types of cancer and predisposed responsiveness to specific treatments. Because GWAS primarily focus on associations between individual genomic variations and cancer phenotypes, they are limited in explaining the mechanisms by which more than one genomic variation cooperatively affects a cancer phenotype. Results: This paper proposes a network representation learning approach to learn associations among genomic variations using a prostate cancer cohort. The learned associations are encoded into representations that can be used to identify functional modules of genomic variations within genes associated with early- and late-onset prostate cancer. The proposed method was applied to a prostate cancer cohort provided by the Veterans Administration’s Million Veteran Program to identify candidates for functional modules associated with early-onset prostate cancer. The cohort included 33,159 prostate cancer patients: 3,181 early-onset patients and 29,978 late-onset patients. A reproducibility analysis showed that the proposed approach can improve model performance in terms of robustness. Conclusions: To our knowledge, this is the first attempt to use a network representation learning approach to learn associations among genomic variations within genes. Associations learned in this way can lead to an understanding of the underlying mechanisms by which genomic variations cooperatively affect each cancer phenotype. This method can reveal previously unknown knowledge in the field of cancer biology and can be used to design more advanced cancer-targeted therapies.},
doi = {10.1186/s12920-022-01298-6},
journal = {BMC Medical Genomics},
number = 1,
volume = 15,
place = {United States},
year = {2022},
month = jul
}
