Deep graph representations embed network information for robust disease marker identification
Abstract
Accurate disease diagnosis and prognosis based on omics data rely on the effective identification of robust prognostic and diagnostic markers that reflect the states of the biological processes underlying disease pathogenesis and progression. In this article, we present GCNCC, a Graph Convolutional Network-based approach for Clustering and Classification that can identify highly effective and robust network-based disease markers. Based on a geometric deep learning framework, GCNCC learns deep network representations by integrating gene expression data with protein interaction data to identify highly reproducible markers with consistently accurate prediction performance across independent datasets, possibly from different platforms. GCNCC identifies these markers by clustering the nodes in the protein interaction network based on latent similarity measures learned by the deep architecture of a graph convolutional network, followed by a supervised feature selection procedure that extracts clusters that are highly predictive of the disease state. By benchmarking GCNCC on independent datasets from different diseases (psychiatric disorder and cancer) and different platforms (microarray and RNA-seq), we show that GCNCC outperforms other state-of-the-art methods in terms of accuracy and reproducibility.
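The pipeline described in the abstract (embed network nodes, cluster them, then select clusters that predict the disease state) can be illustrated with a minimal sketch. This is not the GCNCC implementation: it assumes a parameter-free graph convolution (symmetric-normalized adjacency with self-loops, no trained weights), a plain k-means with farthest-point initialization in place of the learned latent similarity, and a Welch t-statistic on mean cluster expression as the supervised selection step. All function names and the toy data are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in standard GCN layers."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate(A, X, n_layers=1):
    """Assumption: untrained graph convolution that only smooths features over edges."""
    S = normalized_adjacency(A)
    H = X.astype(float)
    for _ in range(n_layers):
        H = S @ H
    return H

def kmeans(H, k, n_iter=20):
    """Plain k-means on node embeddings with deterministic farthest-point init."""
    centers = [H[0]]
    for _ in range(1, k):
        d = np.min([((H - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(H[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = H[labels == j].mean(axis=0)
    return labels

def rank_clusters(X, labels, y):
    """Rank clusters by |Welch t| of their mean expression between the two classes."""
    scores = {}
    for c in np.unique(labels):
        activity = X[labels == c].mean(axis=0)  # one activity value per sample
        a, b = activity[y == 1], activity[y == 0]
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        scores[int(c)] = float(abs(a.mean() - b.mean()) / (se + 1e-12))
    return sorted(scores, key=scores.get, reverse=True)

# Toy input: 6 genes forming two disconnected triangles, 8 samples (4 per class).
tri = np.ones((3, 3)) - np.eye(3)
A = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = 0.1 * np.sin(np.arange(48.0)).reshape(6, 8)  # small deterministic variation
X[:3, y == 1] += 2.0  # genes 0-2 are up-regulated in class 1

labels = kmeans(propagate(A, X), k=2)
ranking = rank_clusters(X, labels, y)
print(labels, ranking)
```

On this toy input the two triangles come out as separate clusters, and the cluster containing the up-regulated genes ranks first. A real application would replace each stand-in with the trained model, a protein interaction network, and cross-dataset validation.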
- Authors:
- Maddouri, Omar; Qian, Xiaoning; Yoon, Byung-Jun
- Texas A & M Univ., College Station, TX (United States)
- Texas A & M Univ., College Station, TX (United States); Brookhaven National Lab. (BNL), Upton, NY (United States)
- Publication Date:
- 2021-11-11
- Research Org.:
- Brookhaven National Lab. (BNL), Upton, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1855099
- Report Number(s):
- BNL-222809-2022-JAAM; Journal ID: ISSN 1367-4803
- Grant/Contract Number:
- SC0012704
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Bioinformatics
- Additional Journal Information:
- Journal Volume: 38; Journal Issue: 4; Journal ID: ISSN 1367-4803
- Publisher:
- International Society for Computational Biology - Oxford University Press
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES
Citation Formats
Maddouri, Omar, Qian, Xiaoning, and Yoon, Byung-Jun. Deep graph representations embed network information for robust disease marker identification. United States: N. p., 2021. Web. doi:10.1093/bioinformatics/btab772.
Maddouri, Omar, Qian, Xiaoning, & Yoon, Byung-Jun. Deep graph representations embed network information for robust disease marker identification. United States. https://doi.org/10.1093/bioinformatics/btab772
Maddouri, Omar, Qian, Xiaoning, and Yoon, Byung-Jun. 2021. "Deep graph representations embed network information for robust disease marker identification". United States. https://doi.org/10.1093/bioinformatics/btab772. https://www.osti.gov/servlets/purl/1855099.
@article{osti_1855099,
title = {Deep graph representations embed network information for robust disease marker identification},
author = {Maddouri, Omar and Qian, Xiaoning and Yoon, Byung-Jun},
doi = {10.1093/bioinformatics/btab772},
journal = {Bioinformatics},
number = 4,
volume = 38,
place = {United States},
year = {2021},
month = {11}
}