DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Deep graph representations embed network information for robust disease marker identification

Abstract

Accurate disease diagnosis and prognosis based on omics data rely on the effective identification of robust prognostic and diagnostic markers that reflect the states of the biological processes underlying disease pathogenesis and progression. In this article, we present GCNCC, a Graph Convolutional Network-based approach for Clustering and Classification that can identify highly effective and robust network-based disease markers. Based on a geometric deep learning framework, GCNCC learns deep network representations by integrating gene expression data with protein interaction data to identify highly reproducible markers with consistently accurate prediction performance across independent datasets, possibly from different platforms. GCNCC identifies these markers by clustering the nodes in the protein interaction network based on latent similarity measures learned by the deep architecture of a graph convolutional network, followed by a supervised feature selection procedure that extracts clusters that are highly predictive of the disease state. By benchmarking GCNCC on independent datasets from different diseases (psychiatric disorder and cancer) and different platforms (microarray and RNA-seq), we show that GCNCC outperforms other state-of-the-art methods in terms of accuracy and reproducibility.
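
The pipeline outlined in the abstract (graph-convolutional embedding of the protein interaction network, clustering of nodes in the latent space, then supervised selection of predictive clusters) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes untrained random layer weights, uses each gene's expression profile across samples as its node feature, substitutes a basic k-means for whatever clustering procedure GCNCC actually uses, and scores clusters with a Welch t-statistic as a stand-in for the supervised feature selection step. The toy network, the data, and all function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_embed(adj, features, weights):
    """Stacked graph-convolution layers: H <- ReLU(A_norm @ H @ W)."""
    a_norm = normalized_adjacency(adj)
    h = features
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)
    return h

def kmeans(points, k, n_iter=50, seed=0):
    """Basic k-means (assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def rank_clusters(expr, labels, y):
    """Rank clusters by |Welch t| of their mean expression between the two classes."""
    scores = {}
    for c in np.unique(labels):
        activity = expr[:, labels == c].mean(axis=1)  # one value per sample
        a, b = activity[y == 0], activity[y == 1]
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        scores[int(c)] = abs(a.mean() - b.mean()) / (se + 1e-12)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data (hypothetical): 12 genes in two interaction modules, 40 samples.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 12
adj = np.zeros((n_genes, n_genes))
adj[:6, :6] = 1.0
adj[6:, 6:] = 1.0
np.fill_diagonal(adj, 0.0)
adj[5, 6] = adj[6, 5] = 1.0              # single edge bridging the modules

y = np.repeat([0, 1], n_samples // 2)    # 0 = control, 1 = disease
expr = rng.normal(size=(n_samples, n_genes))
expr[y == 1, :6] += 1.5                  # first module is up-regulated in disease

# Random (untrained) weights: 40-dim node features -> 8 -> 4 latent dimensions.
weights = [rng.normal(scale=0.3, size=(n_samples, 8)),
           rng.normal(scale=0.3, size=(8, 4))]

embeddings = gcn_embed(adj, expr.T, weights)   # one latent vector per gene
labels = kmeans(embeddings, k=2)               # cluster genes in latent space
ranked = rank_clusters(expr, labels, y)        # most discriminative cluster first

print("cluster labels per gene:", labels)
print("clusters ranked by score:", ranked)
```

In a realistic setting the layer weights would be learned rather than drawn at random, and the selected clusters would be validated on an independent dataset, which is the reproducibility criterion the abstract emphasizes.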

Authors:
Maddouri, Omar [1]; Qian, Xiaoning [2]; Yoon, Byung-Jun [2]
  1. Texas A & M Univ., College Station, TX (United States)
  2. Texas A & M Univ., College Station, TX (United States); Brookhaven National Lab. (BNL), Upton, NY (United States)
Publication Date:
2021-11-11
Research Org.:
Brookhaven National Lab. (BNL), Upton, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1855099
Report Number(s):
BNL-222809-2022-JAAM
Journal ID: ISSN 1367-4803
Grant/Contract Number:  
SC0012704
Resource Type:
Accepted Manuscript
Journal Name:
Bioinformatics
Additional Journal Information:
Journal Volume: 38; Journal Issue: 4; Journal ID: ISSN 1367-4803
Publisher:
International Society for Computational Biology - Oxford University Press
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Maddouri, Omar, Qian, Xiaoning, and Yoon, Byung-Jun. Deep graph representations embed network information for robust disease marker identification. United States: N. p., 2021. Web. doi:10.1093/bioinformatics/btab772.
Maddouri, Omar, Qian, Xiaoning, & Yoon, Byung-Jun. Deep graph representations embed network information for robust disease marker identification. United States. https://doi.org/10.1093/bioinformatics/btab772
Maddouri, Omar, Qian, Xiaoning, and Yoon, Byung-Jun. 2021. "Deep graph representations embed network information for robust disease marker identification". United States. https://doi.org/10.1093/bioinformatics/btab772. https://www.osti.gov/servlets/purl/1855099.
@article{osti_1855099,
title = {Deep graph representations embed network information for robust disease marker identification},
author = {Maddouri, Omar and Qian, Xiaoning and Yoon, Byung-Jun},
abstractNote = {Accurate disease diagnosis and prognosis based on omics data rely on the effective identification of robust prognostic and diagnostic markers that reflect the states of the biological processes underlying disease pathogenesis and progression. In this article, we present GCNCC, a Graph Convolutional Network-based approach for Clustering and Classification that can identify highly effective and robust network-based disease markers. Based on a geometric deep learning framework, GCNCC learns deep network representations by integrating gene expression data with protein interaction data to identify highly reproducible markers with consistently accurate prediction performance across independent datasets, possibly from different platforms. GCNCC identifies these markers by clustering the nodes in the protein interaction network based on latent similarity measures learned by the deep architecture of a graph convolutional network, followed by a supervised feature selection procedure that extracts clusters that are highly predictive of the disease state. By benchmarking GCNCC on independent datasets from different diseases (psychiatric disorder and cancer) and different platforms (microarray and RNA-seq), we show that GCNCC outperforms other state-of-the-art methods in terms of accuracy and reproducibility.},
doi = {10.1093/bioinformatics/btab772},
journal = {Bioinformatics},
number = 4,
volume = 38,
place = {United States},
year = {2021},
month = {11}
}
