Identifying intragenic functional modules of genomic variations associated with cancer phenotypes by learning representation of association networks
Abstract
Background: Genome-wide association studies (GWAS) aim to uncover the link between genomic variation and phenotype. They have been actively applied in cancer biology to investigate associations between variations and cancer phenotypes, such as susceptibility to certain types of cancer and predisposed responsiveness to specific treatments. Because GWAS primarily focus on finding associations between individual genomic variations and cancer phenotypes, they are limited in explaining the mechanisms by which cancer phenotypes are cooperatively affected by more than one genomic variation. Results: This paper proposes a network representation learning approach to learn associations among genomic variations using a prostate cancer cohort. The learned associations are encoded into representations that can be used to identify functional modules of genomic variations within genes associated with early- and late-onset prostate cancer. The proposed method was applied to a prostate cancer cohort provided by the Veterans Administration’s Million Veteran Program to identify candidates for functional modules associated with early-onset prostate cancer. The cohort included 33,159 prostate cancer patients: 3,181 early-onset patients and 29,978 late-onset patients. An evaluation of reproducibility showed that the proposed approach can improve model performance in terms of robustness. Conclusions: To our knowledge, this is the first attempt to use a network representation learning approach to learn associations among genomic variations within genes. Associations learned in this way can lead to an understanding of the underlying mechanisms of how genomic variations cooperatively affect each cancer phenotype. This method can reveal unknown knowledge in the field of cancer biology and can be utilized to design more advanced cancer-targeted therapies.
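The abstract describes the pipeline only at a high level: learn associations among variations, encode them as representations, then group variations within a gene into modules. The sketch below is one way such a pipeline could look and is not the authors' implementation. Everything in it is an assumption made for illustration: the toy carrier sets, the use of Jaccard overlap with a fixed threshold to define edges, random-walk co-occurrence counts standing in for a trained embedding model, and the cosine cutoff used to group variations.

```python
import random
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Jaccard overlap between two sets of patient IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(carriers, threshold):
    """Link two variants when their carrier sets overlap strongly (assumed rule)."""
    graph = {variant: set() for variant in carriers}
    for u, v in combinations(carriers, 2):
        if jaccard(carriers[u], carriers[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def walk_embeddings(graph, walks_per_node=50, walk_length=6, window=2, seed=0):
    """Represent each variant by random-walk co-occurrence counts.

    This is a simple stand-in for a trained skip-gram style embedding.
    """
    rng = random.Random(seed)
    nodes = sorted(graph)
    index = {node: i for i, node in enumerate(nodes)}
    vectors = {node: [0.0] * len(nodes) for node in nodes}
    for start in nodes:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length and graph[walk[-1]]:
                walk.append(rng.choice(sorted(graph[walk[-1]])))
            for i, u in enumerate(walk):
                for v in walk[max(0, i - window): i + window + 1]:
                    vectors[u][index[v]] += 1.0
    return vectors


def cosine(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def find_modules(vectors, min_similarity=0.5):
    """Group variants whose representations are similar (union-find grouping)."""
    parent = {node: node for node in vectors}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for u, v in combinations(vectors, 2):
        if cosine(vectors[u], vectors[v]) >= min_similarity:
            parent[find(u)] = find(v)
    groups = {}
    for node in vectors:
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy data: variant ID -> IDs of patients carrying that variant.
toy_carriers = {
    "v1": {1, 2, 3, 4},
    "v2": {1, 2, 3, 5},
    "v3": {2, 3, 4, 5},
    "v4": {10, 11, 12},
    "v5": {10, 11, 13},
    "v6": {20},
}

toy_graph = build_network(toy_carriers, threshold=0.5)
toy_modules = find_modules(walk_embeddings(toy_graph))
print(toy_modules)
```

On the toy data, variants that share most of their carriers end up in the same candidate module, and a variant with no strong overlap stays on its own. A real analysis would need an association measure and an embedding method chosen and validated for the cohort at hand.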
- Authors:
- Publication Date:
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC); Department of Veterans Affairs, Office of Information Technology
- Contributing Org.:
- VA Million Veteran Program
- OSTI Identifier:
- 1876318
- Grant/Contract Number:
- AC05-00OR22725; VA118-16-M-1062
- Resource Type:
- Accepted Manuscript
- Journal Name:
- BMC Medical Genomics
- Additional Journal Information:
- Journal Volume: 15; Journal Issue: 1; Journal ID: ISSN 1755-8794
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 60 APPLIED LIFE SCIENCES; genome-wide association study; network representation learning; machine learning
Citation Formats
Kim, Minsu, Huffman, Jennifer E., Justice, Amy, Goethert, Ian, Agasthya, Greeshma, Sun, Yan, McArdle, Rachel, Dellitalia, Louis, Stephens, Brady, Cho, Kelly, Pyarajan, Saiju, Mattocks, Kristin, Harley, John, Whittle, Jeffrey, Mathew, Roy, Beckham, Jean, Smith, River, Wells., John, Gutierrez, Salvador, Hammer, Kimberly, Iruvanti, Pran, Ballas, Zuhair, Mastorides, Stephen, Moorman, Jonathan, Gappy, Saib, Klein, Jon, Ratcliffe, Nora, Palacio, Ana, Okusaga, Olaoluwa, Murdoch, Maureen, Sriram, Peruvemba, Argyres, Dean P., Connor, Todd, Villareal, Gerardo, Kinlay, Scott, Yeh, Shing Shing, Jhala, Darshana, Tandon, Neeraj, Chang, Kyong-Mi, Aguayo, Samuel, Cohen, David, Sharma, Satish, Hamner, Mark, Liangpunsakul, Suthat, Godschalk, Michael, Oursler, Kris Ann, Whooley, Mary, Greco, Jennifer, Ahuja, Sunil, Constans, Joseph, Meyer, Paul, Rauchman, Michael, Servatius, Richard, Ramoni, Rachel, Muralidhar, Sumitra, Gaziano, J. Michael, Gaddy, Melinda, Wallbom, Agnes, Norton, James, Morgan, Timothy, Stapley, Todd, Liang, Peter, Bhushan, Sujata, Jacono, Frank, Fujii, Daryl, Tsao, Philip, Humphries, Donald E., Huang, Grant, Breeling, James, Moser, Jennifer, Brewer, Jessica V., Casas, Juan P., Cho, Kelly, Churby, Lori, Selva, Luis E., Brophy, Mary T., Do, Nhan, Tsao, Philip S., Shayan, Shahpoor Alex, Whitbourne, Stacey B., Strollo, Patrick, Boyko, Edward, Walsh, Jessica, Pyarajan, Saiju, Hauser, Elizabeth, DuVall, Scott L., Gupta, Samir, Huq, Mostaqul, Fayad, Joseph, Hung, Adriana, Xu, Junzhe, Alexander, Kathrina, Hurley, Robin, Lichy, Jack, Zhao, Hongyu, Wilson, Peter, Robey, Brooks, Balasubramanian, Prakash, and Danciu, Ioana. Identifying intragenic functional modules of genomic variations associated with cancer phenotypes by learning representation of association networks. United States: N. p., 2022.
Web. doi:10.1186/s12920-022-01298-6.
Kim, Minsu, Huffman, Jennifer E., Justice, Amy, Goethert, Ian, Agasthya, Greeshma, Sun, Yan, McArdle, Rachel, Dellitalia, Louis, Stephens, Brady, Cho, Kelly, Pyarajan, Saiju, Mattocks, Kristin, Harley, John, Whittle, Jeffrey, Mathew, Roy, Beckham, Jean, Smith, River, Wells., John, Gutierrez, Salvador, Hammer, Kimberly, Iruvanti, Pran, Ballas, Zuhair, Mastorides, Stephen, Moorman, Jonathan, Gappy, Saib, Klein, Jon, Ratcliffe, Nora, Palacio, Ana, Okusaga, Olaoluwa, Murdoch, Maureen, Sriram, Peruvemba, Argyres, Dean P., Connor, Todd, Villareal, Gerardo, Kinlay, Scott, Yeh, Shing Shing, Jhala, Darshana, Tandon, Neeraj, Chang, Kyong-Mi, Aguayo, Samuel, Cohen, David, Sharma, Satish, Hamner, Mark, Liangpunsakul, Suthat, Godschalk, Michael, Oursler, Kris Ann, Whooley, Mary, Greco, Jennifer, Ahuja, Sunil, Constans, Joseph, Meyer, Paul, Rauchman, Michael, Servatius, Richard, Ramoni, Rachel, Muralidhar, Sumitra, Gaziano, J. Michael, Gaddy, Melinda, Wallbom, Agnes, Norton, James, Morgan, Timothy, Stapley, Todd, Liang, Peter, Bhushan, Sujata, Jacono, Frank, Fujii, Daryl, Tsao, Philip, Humphries, Donald E., Huang, Grant, Breeling, James, Moser, Jennifer, Brewer, Jessica V., Casas, Juan P., Cho, Kelly, Churby, Lori, Selva, Luis E., Brophy, Mary T., Do, Nhan, Tsao, Philip S., Shayan, Shahpoor Alex, Whitbourne, Stacey B., Strollo, Patrick, Boyko, Edward, Walsh, Jessica, Pyarajan, Saiju, Hauser, Elizabeth, DuVall, Scott L., Gupta, Samir, Huq, Mostaqul, Fayad, Joseph, Hung, Adriana, Xu, Junzhe, Alexander, Kathrina, Hurley, Robin, Lichy, Jack, Zhao, Hongyu, Wilson, Peter, Robey, Brooks, Balasubramanian, Prakash, & Danciu, Ioana. Identifying intragenic functional modules of genomic variations associated with cancer phenotypes by learning representation of association networks. United States. https://doi.org/10.1186/s12920-022-01298-6
Kim, Minsu, Huffman, Jennifer E., Justice, Amy, Goethert, Ian, Agasthya, Greeshma, Sun, Yan, McArdle, Rachel, Dellitalia, Louis, Stephens, Brady, Cho, Kelly, Pyarajan, Saiju, Mattocks, Kristin, Harley, John, Whittle, Jeffrey, Mathew, Roy, Beckham, Jean, Smith, River, Wells., John, Gutierrez, Salvador, Hammer, Kimberly, Iruvanti, Pran, Ballas, Zuhair, Mastorides, Stephen, Moorman, Jonathan, Gappy, Saib, Klein, Jon, Ratcliffe, Nora, Palacio, Ana, Okusaga, Olaoluwa, Murdoch, Maureen, Sriram, Peruvemba, Argyres, Dean P., Connor, Todd, Villareal, Gerardo, Kinlay, Scott, Yeh, Shing Shing, Jhala, Darshana, Tandon, Neeraj, Chang, Kyong-Mi, Aguayo, Samuel, Cohen, David, Sharma, Satish, Hamner, Mark, Liangpunsakul, Suthat, Godschalk, Michael, Oursler, Kris Ann, Whooley, Mary, Greco, Jennifer, Ahuja, Sunil, Constans, Joseph, Meyer, Paul, Rauchman, Michael, Servatius, Richard, Ramoni, Rachel, Muralidhar, Sumitra, Gaziano, J. Michael, Gaddy, Melinda, Wallbom, Agnes, Norton, James, Morgan, Timothy, Stapley, Todd, Liang, Peter, Bhushan, Sujata, Jacono, Frank, Fujii, Daryl, Tsao, Philip, Humphries, Donald E., Huang, Grant, Breeling, James, Moser, Jennifer, Brewer, Jessica V., Casas, Juan P., Cho, Kelly, Churby, Lori, Selva, Luis E., Brophy, Mary T., Do, Nhan, Tsao, Philip S., Shayan, Shahpoor Alex, Whitbourne, Stacey B., Strollo, Patrick, Boyko, Edward, Walsh, Jessica, Pyarajan, Saiju, Hauser, Elizabeth, DuVall, Scott L., Gupta, Samir, Huq, Mostaqul, Fayad, Joseph, Hung, Adriana, Xu, Junzhe, Alexander, Kathrina, Hurley, Robin, Lichy, Jack, Zhao, Hongyu, Wilson, Peter, Robey, Brooks, Balasubramanian, Prakash, and Danciu, Ioana. Wed .
"Identifying intragenic functional modules of genomic variations associated with cancer phenotypes by learning representation of association networks". United States. https://doi.org/10.1186/s12920-022-01298-6. https://www.osti.gov/servlets/purl/1876318.
@article{osti_1876318,
title = {Identifying intragenic functional modules of genomic variations associated with cancer phenotypes by learning representation of association networks},
author = {Kim, Minsu and Huffman, Jennifer E. and Justice, Amy and Goethert, Ian and Agasthya, Greeshma and Sun, Yan and McArdle, Rachel and Dellitalia, Louis and Stephens, Brady and Cho, Kelly and Pyarajan, Saiju and Mattocks, Kristin and Harley, John and Whittle, Jeffrey and Mathew, Roy and Beckham, Jean and Smith, River and Wells., John and Gutierrez, Salvador and Hammer, Kimberly and Iruvanti, Pran and Ballas, Zuhair and Mastorides, Stephen and Moorman, Jonathan and Gappy, Saib and Klein, Jon and Ratcliffe, Nora and Palacio, Ana and Okusaga, Olaoluwa and Murdoch, Maureen and Sriram, Peruvemba and Argyres, Dean P. and Connor, Todd and Villareal, Gerardo and Kinlay, Scott and Yeh, Shing Shing and Jhala, Darshana and Tandon, Neeraj and Chang, Kyong-Mi and Aguayo, Samuel and Cohen, David and Sharma, Satish and Hamner, Mark and Liangpunsakul, Suthat and Godschalk, Michael and Oursler, Kris Ann and Whooley, Mary and Greco, Jennifer and Ahuja, Sunil and Constans, Joseph and Meyer, Paul and Rauchman, Michael and Servatius, Richard and Ramoni, Rachel and Muralidhar, Sumitra and Gaziano, J. Michael and Gaddy, Melinda and Wallbom, Agnes and Norton, James and Morgan, Timothy and Stapley, Todd and Liang, Peter and Bhushan, Sujata and Jacono, Frank and Fujii, Daryl and Tsao, Philip and Humphries, Donald E. and Huang, Grant and Breeling, James and Moser, Jennifer and Brewer, Jessica V. and Casas, Juan P. and Cho, Kelly and Churby, Lori and Selva, Luis E. and Brophy, Mary T. and Do, Nhan and Tsao, Philip S. and Shayan, Shahpoor Alex and Whitbourne, Stacey B. and Strollo, Patrick and Boyko, Edward and Walsh, Jessica and Pyarajan, Saiju and Hauser, Elizabeth and DuVall, Scott L. and Gupta, Samir and Huq, Mostaqul and Fayad, Joseph and Hung, Adriana and Xu, Junzhe and Alexander, Kathrina and Hurley, Robin and Lichy, Jack and Zhao, Hongyu and Wilson, Peter and Robey, Brooks and Balasubramanian, Prakash and Danciu, Ioana},
abstractNote = {Background: Genome-wide association studies (GWAS) aim to uncover the link between genomic variation and phenotype. They have been actively applied in cancer biology to investigate associations between variations and cancer phenotypes, such as susceptibility to certain types of cancer and predisposed responsiveness to specific treatments. Because GWAS primarily focus on finding associations between individual genomic variations and cancer phenotypes, they are limited in explaining the mechanisms by which cancer phenotypes are cooperatively affected by more than one genomic variation. Results: This paper proposes a network representation learning approach to learn associations among genomic variations using a prostate cancer cohort. The learned associations are encoded into representations that can be used to identify functional modules of genomic variations within genes associated with early- and late-onset prostate cancer. The proposed method was applied to a prostate cancer cohort provided by the Veterans Administration’s Million Veteran Program to identify candidates for functional modules associated with early-onset prostate cancer. The cohort included 33,159 prostate cancer patients: 3,181 early-onset patients and 29,978 late-onset patients. An evaluation of reproducibility showed that the proposed approach can improve model performance in terms of robustness. Conclusions: To our knowledge, this is the first attempt to use a network representation learning approach to learn associations among genomic variations within genes. Associations learned in this way can lead to an understanding of the underlying mechanisms of how genomic variations cooperatively affect each cancer phenotype. This method can reveal unknown knowledge in the field of cancer biology and can be utilized to design more advanced cancer-targeted therapies.},
doi = {10.1186/s12920-022-01298-6},
journal = {BMC Medical Genomics},
number = 1,
volume = 15,
place = {United States},
year = {2022},
month = {jul}
}