OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants

Abstract

The complex system of gene expression is regulated by the cell type-specific binding of transcription factors (TFs) to regulatory elements. Identifying variants that disrupt TF binding and lead to human diseases remains a great challenge. To address this, we implement sequence-based deep learning models that accurately predict TF binding intensities for given DNA sequences. In addition to accurately classifying TF-DNA binding versus non-binding, our models are capable of accurately predicting real-valued TF binding intensities by leveraging large-scale TF ChIP-seq data. The change in TF binding intensity between the altered sequence and the reference sequence reflects the degree of functional impact of the variant. This enables us to develop the tool DeFine (Deep learning based Functional impact of non-coding variants evaluator, http://define.cbi.pku.edu.cn) with improved performance for assessing the functional impact of non-coding variants, including SNPs and indels. DeFine accurately identifies the causal functional non-coding variants from disease-associated variants in GWAS. DeFine is an effective and easy-to-use tool that facilitates systematic prioritization of functional non-coding variants.
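
The scoring idea in the abstract (compare the predicted binding intensity of the altered sequence with that of the reference sequence) can be illustrated with a minimal sketch. This is not DeFine's code or model: `toy_intensity` is a hypothetical stand-in for a trained network, the motif string is an arbitrary illustrative choice, and the function names `apply_variant` and `variant_score` are assumptions made for this example. A real convolutional model would typically also need fixed-length input windows, which this sketch ignores.

```python
from typing import Callable


def apply_variant(ref_seq: str, pos: int, ref: str, alt: str) -> str:
    """Return ref_seq with the allele `ref` at 0-based `pos` replaced by `alt`.

    Handles SNPs (len(ref) == len(alt) == 1) and indels (lengths differ).
    """
    if ref_seq[pos:pos + len(ref)].upper() != ref.upper():
        raise ValueError("reference allele does not match the sequence")
    return ref_seq[:pos] + alt + ref_seq[pos + len(ref):]


def toy_intensity(seq: str) -> float:
    """Hypothetical stand-in for a trained model (NOT the DeFine network).

    Returns the best fraction of positions matching an arbitrary motif
    over all windows of the sequence, as a value in [0, 1].
    """
    motif = "TGACTCA"  # illustrative motif only, an assumption
    seq = seq.upper()
    best = 0
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        matches = sum(1 for a, b in zip(window, motif) if a == b)
        best = max(best, matches)
    return best / len(motif)


def variant_score(
    ref_seq: str,
    pos: int,
    ref: str,
    alt: str,
    predict: Callable[[str], float] = toy_intensity,
) -> float:
    """Predicted intensity of the altered sequence minus that of the reference.

    A negative score suggests the variant weakens predicted binding;
    a positive score suggests it strengthens it.
    """
    alt_seq = apply_variant(ref_seq, pos, ref, alt)
    return predict(alt_seq) - predict(ref_seq)


if __name__ == "__main__":
    reference = "AAATGACTCAAAA"
    # SNP inside the motif (A>G at 0-based position 5)
    print(variant_score(reference, 5, "A", "G"))
    # One-base deletion inside the motif (GA>G at 0-based position 4)
    print(variant_score(reference, 4, "GA", "G"))
```

Passing the predictor in as a parameter keeps the scoring logic independent of the model, so the toy function could be swapped for any sequence-to-intensity predictor.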

Authors:
Wang, Meng [1]; Tai, Cheng [2]; E, Weinan [3]; Wei, Liping [1]
  1. Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, 100871, P.R. China
  2. Center for Data Science, Peking University, Beijing, 100871, P.R. China, Beijing Institute of Big Data Research, Beijing, 100871, P.R. China
  3. Center for Data Science, Peking University, Beijing, 100871, P.R. China, Beijing Institute of Big Data Research, Beijing, 100871, P.R. China, Department of Mathematics and PACM, Princeton University, Princeton, NJ, 08544, USA
Publication Date:
2018-04-02
Research Org.:
Princeton Univ., NJ (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
OSTI Identifier:
1430754
Alternate Identifier(s):
OSTI ID: 1502444
Grant/Contract Number:  
SC0009248
Resource Type:
Journal Article: Published Article
Journal Name:
Nucleic Acids Research
Additional Journal Information:
Journal Name: Nucleic Acids Research; Journal Volume: 46; Journal Issue: 11; Journal ID: ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Wang, Meng, Tai, Cheng, E, Weinan, and Wei, Liping. DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants. United Kingdom: N. p., 2018. Web. doi:10.1093/nar/gky215.
Wang, Meng, Tai, Cheng, E, Weinan, & Wei, Liping. DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants. United Kingdom. https://doi.org/10.1093/nar/gky215
Wang, Meng, Tai, Cheng, E, Weinan, and Wei, Liping. 2018. "DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants". United Kingdom. https://doi.org/10.1093/nar/gky215.
@article{osti_1430754,
title = {DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants},
author = {Wang, Meng and Tai, Cheng and E, Weinan and Wei, Liping},
abstractNote = {The complex system of gene expression is regulated by the cell type-specific binding of transcription factors (TFs) to regulatory elements. Identifying variants that disrupt TF binding and lead to human diseases remains a great challenge. To address this, we implement sequence-based deep learning models that accurately predict TF binding intensities for given DNA sequences. In addition to accurately classifying TF-DNA binding versus non-binding, our models are capable of accurately predicting real-valued TF binding intensities by leveraging large-scale TF ChIP-seq data. The change in TF binding intensity between the altered sequence and the reference sequence reflects the degree of functional impact of the variant. This enables us to develop the tool DeFine (Deep learning based Functional impact of non-coding variants evaluator, http://define.cbi.pku.edu.cn) with improved performance for assessing the functional impact of non-coding variants, including SNPs and indels. DeFine accurately identifies the causal functional non-coding variants from disease-associated variants in GWAS. DeFine is an effective and easy-to-use tool that facilitates systematic prioritization of functional non-coding variants.},
doi = {10.1093/nar/gky215},
url = {https://www.osti.gov/biblio/1430754},
journal = {Nucleic Acids Research},
issn = {0305-1048},
number = 11,
volume = 46,
place = {United Kingdom},
year = {2018},
month = {4}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record at https://doi.org/10.1093/nar/gky215

Citation Metrics:
Cited by: 52 works
Citation information provided by Web of Science

Works referencing / citing this record:

Predicting functional variants in enhancer and promoter elements using RegulomeDB
journal, June 2019


Tap water fingerprinting using a convolutional neural network built from images of the coffee-ring effect
journal, January 2020


Representation learning of genomic sequence motifs with convolutional neural networks
journal, December 2019


Improved Prediction of Regulatory Element Using Hybrid Abelian Complexity Features with DNA Sequences
journal, April 2019