OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Walking the interactome for candidate prioritization in exome sequencing studies of Mendelian diseases

Abstract

Whole-exome sequencing (WES) has opened up previously unheard-of possibilities for identifying novel disease genes in Mendelian disorders, only about half of which have been elucidated to date. However, interpretation of WES data remains challenging. Here, we analyze protein–protein association (PPA) networks to identify candidate genes in the vicinity of genes previously implicated in a disease. The analysis, using a random-walk with restart (RWR) method, is adapted to the setting of WES by developing a composite variant-gene relevance score based on the rarity, location and predicted pathogenicity of variants and the RWR evaluation of genes harboring the variants. Benchmarking using known disease variants from 88 disease-gene families reveals that the correct gene is ranked among the top 10 candidates in ≥50% of cases, a figure which we confirmed using a prospective study of disease genes identified in 2012 and PPA data produced before that date. We implement our method in a freely available Web server, ExomeWalker, that displays a ranked list of candidates together with information on PPAs, frequency and predicted pathogenicity of the variants to allow quick and effective searches for candidates that are likely to reward closer investigation.
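
The random-walk with restart idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the ExomeWalker implementation: the toy network, the gene labels A to D, the restart probability of 0.7, and the function name `random_walk_with_restart` are all assumptions made for illustration, and the composite variant-gene score described in the abstract is not reproduced here.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-12, max_iter=10_000):
    """Return steady-state visiting probabilities for a walker that starts on
    the seed genes and, at every step, jumps back to them with probability
    `restart` (an assumed value) or moves to a random neighbour otherwise.

    adjacency: dict mapping each gene to the set of its neighbours
               (assumed undirected, so it should be symmetric).
    seeds:     set of genes already implicated in the disease; every seed
               is assumed to be a key of `adjacency`.
    """
    nodes = sorted(adjacency)
    # Start vector: probability mass spread evenly over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fraction of the mass always returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if neighbours:
                # Walk term: remaining mass is split evenly among neighbours.
                share = (1.0 - restart) * p[n] / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
            else:
                # Isolated gene: its mass stays in place so the total stays 1.
                nxt[n] += (1.0 - restart) * p[n]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical four-gene association network; "A" is the known disease gene.
toy_network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
seeds = {"A"}
scores = random_walk_with_restart(toy_network, seeds)

# Rank the non-seed genes by their proximity to the seed.
ranked = sorted((g for g in toy_network if g not in seeds),
                key=scores.get, reverse=True)
print(ranked)
```

In this toy setting, genes that sit closer to the seed in the network receive higher scores, which is the property a prioritization method of this kind relies on; a real analysis would combine such a network score with variant-level evidence.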

Authors:
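Smedley, Damian [1]; Kohler, Sebastian [2]; Czeschik, Johanna Christina [3]; Amberger, Joanna [4]; Bocchini, Carol [4]; Hamosh, Ada [4]; Veldboer, Julian [5]; Zemojtel, Tomasz [6]; Robinson, Peter N. [7]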
  1. The Wellcome Trust Sanger Institute, Cambridgeshire (United Kingdom)
  2. Charite-Univ. Berlin, Berlin (Germany)
  3. Univ. of Duisburg-Essen, Essen (Germany)
  4. Johns Hopkins Univ. School of Medicine, Baltimore, MD (United States)
  5. Charite-Univ. Berlin, Berlin (Germany); Freie Univ. Berlin, Berlin (Germany)
  6. Charite-Univ. Berlin, Berlin (Germany); Polish Academy of Sciences, Poznan (Poland)
  7. Charite-Univ. Berlin, Berlin (Germany); Freie Univ. Berlin, Berlin (Germany); Max Planck Institute for Molecular Genetics, Berlin (Germany)
Publication Date:
2014-07-30
Research Org.:
Johns Hopkins Univ., Baltimore, MD (United States). School of Medicine
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
OSTI Identifier:
1342955
Grant/Contract Number:
AC02-05CH11231
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Bioinformatics
Additional Journal Information:
Journal Volume: 30; Journal Issue: 22; Journal ID: ISSN 1367-4803
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES

Citation Formats

Smedley, Damian, Kohler, Sebastian, Czeschik, Johanna Christina, Amberger, Joanna, Bocchini, Carol, Hamosh, Ada, Veldboer, Julian, Zemojtel, Tomasz, and Robinson, Peter N. Walking the interactome for candidate prioritization in exome sequencing studies of Mendelian diseases. United States: N. p., 2014. Web. doi:10.1093/bioinformatics/btu508.
Smedley, Damian, Kohler, Sebastian, Czeschik, Johanna Christina, Amberger, Joanna, Bocchini, Carol, Hamosh, Ada, Veldboer, Julian, Zemojtel, Tomasz, & Robinson, Peter N. Walking the interactome for candidate prioritization in exome sequencing studies of Mendelian diseases. United States. doi:10.1093/bioinformatics/btu508.
Smedley, Damian, Kohler, Sebastian, Czeschik, Johanna Christina, Amberger, Joanna, Bocchini, Carol, Hamosh, Ada, Veldboer, Julian, Zemojtel, Tomasz, and Robinson, Peter N. 2014. "Walking the interactome for candidate prioritization in exome sequencing studies of Mendelian diseases". United States. doi:10.1093/bioinformatics/btu508. https://www.osti.gov/servlets/purl/1342955.
@article{osti_1342955,
title = {Walking the interactome for candidate prioritization in exome sequencing studies of Mendelian diseases},
author = {Smedley, Damian and Kohler, Sebastian and Czeschik, Johanna Christina and Amberger, Joanna and Bocchini, Carol and Hamosh, Ada and Veldboer, Julian and Zemojtel, Tomasz and Robinson, Peter N.},
abstractNote = {Whole-exome sequencing (WES) has opened up previously unheard-of possibilities for identifying novel disease genes in Mendelian disorders, only about half of which have been elucidated to date. However, interpretation of WES data remains challenging. Here, we analyze protein–protein association (PPA) networks to identify candidate genes in the vicinity of genes previously implicated in a disease. The analysis, using a random-walk with restart (RWR) method, is adapted to the setting of WES by developing a composite variant-gene relevance score based on the rarity, location and predicted pathogenicity of variants and the RWR evaluation of genes harboring the variants. Benchmarking using known disease variants from 88 disease-gene families reveals that the correct gene is ranked among the top 10 candidates in ≥50% of cases, a figure which we confirmed using a prospective study of disease genes identified in 2012 and PPA data produced before that date. We implement our method in a freely available Web server, ExomeWalker, that displays a ranked list of candidates together with information on PPAs, frequency and predicted pathogenicity of the variants to allow quick and effective searches for candidates that are likely to reward closer investigation.},
doi = {10.1093/bioinformatics/btu508},
journal = {Bioinformatics},
number = 22,
volume = 30,
place = {United States},
year = {2014},
month = jul
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 23 works
Citation information provided by
Web of Science

  • We characterized the mutational landscape of melanoma, the form of skin cancer with the highest mortality rate, by sequencing the exomes of 147 melanomas. Sun-exposed melanomas had markedly more ultraviolet (UV)-like C>T somatic mutations compared to sun-shielded acral, mucosal and uveal melanomas. Among the newly identified cancer genes was PPP6C, encoding a serine/threonine phosphatase, which harbored mutations that clustered in the active site in 12% of sun-exposed melanomas, exclusively in tumors with mutations in BRAF or NRAS. Notably, we identified a recurrent UV-signature, an activating mutation in RAC1 in 9.2% of sun-exposed melanomas. This activating mutation, the third most frequent in our cohort of sun-exposed melanoma after those of BRAF and NRAS, changes Pro29 to serine (RAC1^P29S) in the highly conserved switch I domain. Crystal structures, and biochemical and functional studies of RAC1^P29S showed that the alteration releases the conformational restraint conferred by the conserved proline, causes an increased binding of the protein to downstream effectors, and promotes melanocyte proliferation and migration. These findings raise the possibility that pharmacological inhibition of downstream effectors of RAC1 signaling could be of therapeutic benefit.
  • Our genomic analyses promise to improve tumor characterization to optimize personalized treatment for patients with hepatocellular carcinoma (HCC). Exome sequencing analysis of 243 liver tumors identified mutational signatures associated with specific risk factors, mainly combined alcohol and tobacco consumption and exposure to aflatoxin B1. We identified 161 putative driver genes associated with 11 recurrently altered pathways. Associations of mutations defined 3 groups of genes related to risk factors and centered on CTNNB1 (alcohol), TP53 (hepatitis B virus, HBV) and AXIN1. These analyses according to tumor stage progression identified TERT promoter mutation as an early event, whereas FGF3, FGF4, FGF19 or CCND1 amplification and TP53 and CDKN2A alterations appeared at more advanced stages in aggressive tumors. In 28% of the tumors, we identified genetic alterations potentially targetable by US Food and Drug Administration (FDA)–approved drugs. Finally, we identified risk factor–specific mutational signatures and defined the extensive landscape of altered genes and pathways in HCC, which will be useful to design clinical trials for targeted therapy.
  • Panicum virgatum L. (switchgrass) is a polyploid, perennial grass species that is native to North America, and is being developed as a future biofuel feedstock crop. Switchgrass is present primarily in two ecotypes: a northern upland ecotype, composed of tetraploid and octoploid accessions, and a southern lowland ecotype, composed of primarily tetraploid accessions. We employed high-coverage exome capture sequencing (~2.4 Tb) to genotype 537 individuals from 45 upland and 21 lowland populations. From these data, we identified ~27 million single-nucleotide polymorphisms (SNPs), of which 1 590 653 high-confidence SNPs were used in downstream analyses of diversity within and between the populations. From the 66 populations, we identified five primary population groups within the upland and lowland ecotypes, a result that was further supported through genetic distance analysis. We identified conserved, ecotype-restricted, non-synonymous SNPs that are predicted to affect the protein function of CONSTANS (CO) and EARLY HEADING DATE 1 (EHD1), key genes involved in flowering, which may contribute to the phenotypic differences between the two ecotypes. We also identified, relative to the near-reference Kanlow population, 17 228 genes present in more copies than in the reference genome (up-CNVs), 112 630 genes present in fewer copies than in the reference genome (down-CNVs) and 14 430 presence/absence variants (PAVs), affecting a total of 9979 genes, including two upland-specific CNV clusters. In total, 45 719 genes were affected by an SNP, CNV, or PAV across the panel, providing a firm foundation to identify functional variation associated with phenotypic traits of interest for biofuel feedstock production.
  • Isolation of DNA segments adjacent to known sequences is a tedious task in genome-related research. We have developed an efficient PCR strategy that overcomes the shortcomings of existing methods and can be automated. This strategy, thermal asymmetric interlaced (TAIL)-PCR, utilizes nested sequence-specific primers together with a shorter arbitrary degenerate primer so that the relative amplification efficiencies of specific and nonspecific products can be thermally controlled. One low-stringency PCR cycle is carried out to create annealing site(s) adapted for the arbitrary primer within the unknown target sequence bordering the known segment. This sequence is then preferentially and geometrically amplified over nontarget ones by interspersion of high-stringency PCR cycles with reduced-stringency PCR cycles. We have exploited the efficiency of this method to expedite amplification and sequencing of insert end segments from P1 and YAC clones for chromosome walking. In this study we present protocols that are amenable to automation of amplification and sequencing of insert end sequences directly from cells of P1 and YAC clones. 19 refs., 7 figs., 1 tab.