DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants

Abstract

The complex system of gene expression is regulated by the cell type-specific binding of transcription factors (TFs) to regulatory elements. Identifying variants that disrupt TF binding and lead to human diseases remains a great challenge. To address this, we implement sequence-based deep learning models that accurately predict the TF binding intensities to given DNA sequences. In addition to accurately classifying TF-DNA binding or unbinding, our models are capable of accurately predicting real-valued TF binding intensities by leveraging large-scale TF ChIP-seq data. The changes in the TF binding intensities between the altered sequence and the reference sequence reflect the degree of functional impact for the variant. This enables us to develop the tool DeFine (Deep learning based Functional impact of non-coding variants evaluator, http://define.cbi.pku.edu.cn) with improved performance for assessing the functional impact of non-coding variants including SNPs and indels. DeFine accurately identifies the causal functional non-coding variants from disease-associated variants in GWAS. DeFine is an effective and easy-to-use tool that facilitates systematic prioritization of functional non-coding variants.
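
The scoring idea in the abstract (compare predicted TF binding intensity for the altered sequence against the reference sequence) can be illustrated with a minimal Python sketch. This is not the DeFine implementation: `toy_intensity` is a hypothetical stand-in for a trained per-TF model, the motif it counts is an arbitrary illustrative string, and the names `apply_variant` and `variant_scores` are assumptions introduced only for this example.

```python
from typing import Callable, Dict


def apply_variant(ref_seq: str, pos: int, ref: str, alt: str) -> str:
    """Return the altered sequence for a SNP or indel at 0-based position pos."""
    if ref_seq[pos:pos + len(ref)].upper() != ref.upper():
        raise ValueError("reference allele does not match the sequence")
    return ref_seq[:pos] + alt + ref_seq[pos + len(ref):]


def toy_intensity(seq: str) -> float:
    """Hypothetical placeholder for a trained model: counts an illustrative motif.

    A real model would map the sequence to a real-valued binding intensity.
    """
    motif = "TGACTCA"  # arbitrary motif chosen for illustration only
    seq = seq.upper()
    hits = sum(
        1
        for i in range(len(seq) - len(motif) + 1)
        if seq[i:i + len(motif)] == motif
    )
    return float(hits)


def variant_scores(
    ref_seq: str,
    pos: int,
    ref: str,
    alt: str,
    models: Dict[str, Callable[[str], float]],
) -> Dict[str, float]:
    """Per-TF change in predicted intensity: altered minus reference."""
    alt_seq = apply_variant(ref_seq, pos, ref, alt)
    return {tf: model(alt_seq) - model(ref_seq) for tf, model in models.items()}


if __name__ == "__main__":
    models = {"TOY_TF": toy_intensity}  # hypothetical single-TF model set
    # SNP that disrupts the toy motif
    print(variant_scores("AAATGACTCAGGG", 5, "A", "G", models))
    # Single-base deletion inside the toy motif
    print(variant_scores("AAATGACTCAGGG", 6, "CT", "C", models))
```

In this sketch a negative score means the variant lowers the predicted intensity for that TF; how per-TF differences are combined into a single functional-impact score is left open, since the abstract does not specify it.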

Authors:
Wang, Meng [1]; Tai, Cheng [2]; E, Weinan [3]; Wei, Liping [1]
  1. Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, 100871, P.R. China
  2. Center for Data Science, Peking University, Beijing, 100871, P.R. China, Beijing Institute of Big Data Research, Beijing, 100871, P.R. China
  3. Center for Data Science, Peking University, Beijing, 100871, P.R. China, Beijing Institute of Big Data Research, Beijing, 100871, P.R. China, Department of Mathematics and PACM, Princeton University, Princeton, NJ, 08544, USA
Publication Date:
Research Org.:
Princeton Univ., NJ (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
OSTI Identifier:
1430754
Alternate Identifier(s):
OSTI ID: 1502444
Grant/Contract Number:  
SC0009248
Resource Type:
Published Article
Journal Name:
Nucleic Acids Research
Additional Journal Information:
Journal Name: Nucleic Acids Research; Journal Volume: 46; Journal Issue: 11; Journal ID: ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Wang, Meng, Tai, Cheng, E, Weinan, and Wei, Liping. DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants. United Kingdom: N. p., 2018. Web. doi:10.1093/nar/gky215.
Wang, Meng, Tai, Cheng, E, Weinan, & Wei, Liping. DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants. United Kingdom. doi:10.1093/nar/gky215.
Wang, Meng, Tai, Cheng, E, Weinan, and Wei, Liping. Mon . "DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants". United Kingdom. doi:10.1093/nar/gky215.
@article{osti_1430754,
title = {DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants},
author = {Wang, Meng and Tai, Cheng and E, Weinan and Wei, Liping},
abstractNote = {The complex system of gene expression is regulated by the cell type-specific binding of transcription factors (TFs) to regulatory elements. Identifying variants that disrupt TF binding and lead to human diseases remains a great challenge. To address this, we implement sequence-based deep learning models that accurately predict the TF binding intensities to given DNA sequences. In addition to accurately classifying TF-DNA binding or unbinding, our models are capable of accurately predicting real-valued TF binding intensities by leveraging large-scale TF ChIP-seq data. The changes in the TF binding intensities between the altered sequence and the reference sequence reflect the degree of functional impact for the variant. This enables us to develop the tool DeFine (Deep learning based Functional impact of non-coding variants evaluator, http://define.cbi.pku.edu.cn) with improved performance for assessing the functional impact of non-coding variants including SNPs and indels. DeFine accurately identifies the causal functional non-coding variants from disease-associated variants in GWAS. DeFine is an effective and easy-to-use tool that facilitates systematic prioritization of functional non-coding variants.},
doi = {10.1093/nar/gky215},
journal = {Nucleic Acids Research},
number = 11,
volume = 46,
place = {United Kingdom},
year = {2018},
month = {4}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
DOI: 10.1093/nar/gky215

Citation Metrics:
Cited by: 7 works
Citation information provided by
Web of Science


    Works referencing / citing this record:

    Predicting functional variants in enhancer and promoter elements using RegulomeDB
    journal, June 2019

    • Dong, Shengcheng; Boyle, Alan P.
    • Human Mutation, Vol. 40, Issue 9
    • DOI: 10.1002/humu.23791

    Tap water fingerprinting using a convolutional neural network built from images of the coffee-ring effect
    journal, January 2020

    • Li, Xiaoyan; Sanderson, Alyssa R.; Allen, Selett S.
    • The Analyst, Vol. 145, Issue 4
    • DOI: 10.1039/c9an01624d

    Mechanistic interpretation of non-coding variants for discovering transcriptional regulators of drug response
    journal, July 2019


    Representation learning of genomic sequence motifs with convolutional neural networks
    journal, December 2019


    Improved Prediction of Regulatory Element Using Hybrid Abelian Complexity Features with DNA Sequences
    journal, April 2019

    • Wu, Chengchao; Chen, Jin; Liu, Yunxia
    • International Journal of Molecular Sciences, Vol. 20, Issue 7
    • DOI: 10.3390/ijms20071704