DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants
Abstract
The complex system of gene expression is regulated by the cell type-specific binding of transcription factors (TFs) to regulatory elements. Identifying variants that disrupt TF binding and lead to human diseases remains a great challenge. To address this, we implement sequence-based deep learning models that accurately predict the TF binding intensities to given DNA sequences. In addition to accurately classifying TF-DNA binding versus non-binding, our models are capable of accurately predicting real-valued TF binding intensities by leveraging large-scale TF ChIP-seq data. The changes in the TF binding intensities between the altered sequence and the reference sequence reflect the degree of functional impact for the variant. This enables us to develop the tool DeFine (Deep learning based Functional impact of non-coding variants evaluator, http://define.cbi.pku.edu.cn) with improved performance for assessing the functional impact of non-coding variants including SNPs and indels. DeFine accurately identifies the causal functional non-coding variants from disease-associated variants in GWAS. DeFine is an effective and easy-to-use tool that facilitates systematic prioritization of functional non-coding variants.
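The comparison described in the abstract, scoring a reference sequence and an altered sequence and taking the difference, can be sketched as follows. This is a minimal illustration and not DeFine's implementation: the trained convolutional network is replaced by a toy position weight matrix scorer, and `TOY_PWM`, `binding_score`, `apply_snp`, and `variant_impact` are hypothetical names introduced here for the example.

```python
import math

# ASSUMPTION: a toy position weight matrix (PWM) stands in for a trained model.
# DeFine uses deep convolutional networks trained on TF ChIP-seq data; this
# scorer only illustrates the reference-versus-altered comparison.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # uniform background base frequency


def binding_score(seq):
    """Return the best log-odds PWM match over all windows of an ACGT string."""
    width = len(TOY_PWM)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    best = -math.inf
    for start in range(len(seq) - width + 1):
        score = sum(
            math.log2(TOY_PWM[i][seq[start + i]] / BACKGROUND)
            for i in range(width)
        )
        best = max(best, score)
    return best


def apply_snp(seq, pos, alt_base):
    """Return a copy of seq with the base at 0-based position pos replaced."""
    return seq[:pos] + alt_base + seq[pos + 1:]


def variant_impact(ref_seq, alt_seq, score_fn=binding_score):
    """Predicted binding change: score of altered minus score of reference.

    The two sequences may differ in length, so indels are handled the same
    way as SNPs.
    """
    return score_fn(alt_seq) - score_fn(ref_seq)


ref = "TTACGTTT"              # contains the toy motif ACGT
alt = apply_snp(ref, 3, "T")  # C>T inside the motif gives TTATGTTT
impact = variant_impact(ref, alt)
print(f"predicted change in binding score: {impact:.3f}")
```

A negative value indicates that the variant is predicted to weaken binding. Any function that maps a sequence to a real-valued intensity, such as a trained network, could be passed as `score_fn` in place of the toy scorer.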
- Authors:
- Wang, Meng; Tai, Cheng; E, Weinan; Wei, Liping
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, 100871, P.R. China
- Center for Data Science, Peking University, Beijing, 100871, P.R. China; Beijing Institute of Big Data Research, Beijing, 100871, P.R. China
- Center for Data Science, Peking University, Beijing, 100871, P.R. China; Beijing Institute of Big Data Research, Beijing, 100871, P.R. China; Department of Mathematics and PACM, Princeton University, Princeton, NJ, 08544, USA
- Publication Date:
- 2018-04-02
- Research Org.:
- Princeton Univ., NJ (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Basic Energy Sciences (BES)
- OSTI Identifier:
- 1430754
- Alternate Identifier(s):
- OSTI ID: 1502444
- Grant/Contract Number:
- SC0009248
- Resource Type:
- Journal Article: Published Article
- Journal Name:
- Nucleic Acids Research
- Additional Journal Information:
- Journal Name: Nucleic Acids Research; Journal Volume: 46; Journal Issue: 11; Journal ID: ISSN 0305-1048
- Publisher:
- Oxford University Press
- Country of Publication:
- United Kingdom
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES
Citation Formats
Wang, Meng, Tai, Cheng, E, Weinan, and Wei, Liping. DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants. United Kingdom: N. p., 2018. Web. doi:10.1093/nar/gky215.
Wang, Meng, Tai, Cheng, E, Weinan, & Wei, Liping. DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants. United Kingdom. https://doi.org/10.1093/nar/gky215
Wang, Meng, Tai, Cheng, E, Weinan, and Wei, Liping. 2018. "DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants". United Kingdom. https://doi.org/10.1093/nar/gky215.
@article{osti_1430754,
title = {DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants},
author = {Wang, Meng and Tai, Cheng and E, Weinan and Wei, Liping},
abstractNote = {The complex system of gene expression is regulated by the cell type-specific binding of transcription factors (TFs) to regulatory elements. Identifying variants that disrupt TF binding and lead to human diseases remains a great challenge. To address this, we implement sequence-based deep learning models that accurately predict the TF binding intensities to given DNA sequences. In addition to accurately classifying TF-DNA binding versus non-binding, our models are capable of accurately predicting real-valued TF binding intensities by leveraging large-scale TF ChIP-seq data. The changes in the TF binding intensities between the altered sequence and the reference sequence reflect the degree of functional impact for the variant. This enables us to develop the tool DeFine (Deep learning based Functional impact of non-coding variants evaluator, http://define.cbi.pku.edu.cn) with improved performance for assessing the functional impact of non-coding variants including SNPs and indels. DeFine accurately identifies the causal functional non-coding variants from disease-associated variants in GWAS. DeFine is an effective and easy-to-use tool that facilitates systematic prioritization of functional non-coding variants.},
doi = {10.1093/nar/gky215},
url = {https://www.osti.gov/biblio/1430754},
journal = {Nucleic Acids Research},
issn = {0305-1048},
number = 11,
volume = 46,
place = {United Kingdom},
year = {2018},
month = apr
}
Works referenced in this record:
Chromatin connectivity maps reveal dynamic promoter–enhancer long-range associations
journal, November 2013
- Zhang, Yubo; Wong, Chee-Hong; Birnbaum, Ramon Y.
- Nature, Vol. 504, Issue 7479
Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes
journal, August 2005
- Siepel, A.
- Genome Research, Vol. 15, Issue 8
Deep learning
journal, May 2015
- LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey
- Nature, Vol. 521, Issue 7553
A census of human transcription factors: function, expression and evolution
journal, April 2009
- Vaquerizas, Juan M.; Kummerfeld, Sarah K.; Teichmann, Sarah A.
- Nature Reviews Genetics, Vol. 10, Issue 4
ChIP–seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions
journal, October 2012
- Furey, Terrence S.
- Nature Reviews Genetics, Vol. 13, Issue 12
Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors
journal, September 2012
- Wang, J.; Zhuang, J.; Iyer, S.
- Genome Research, Vol. 22, Issue 9
Fast and accurate short read alignment with Burrows-Wheeler transform
journal, May 2009
- Li, H.; Durbin, R.
- Bioinformatics, Vol. 25, Issue 14
The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine
journal, September 2013
- Stenson, Peter D.; Mort, Matthew; Ball, Edward V.
- Human Genetics, Vol. 133, Issue 1
Design and analysis of ChIP-seq experiments for DNA-binding proteins
journal, November 2008
- Kharchenko, Peter V.; Tolstorukov, Michael Y.; Park, Peter J.
- Nature Biotechnology, Vol. 26, Issue 12
Trimmomatic: a flexible trimmer for Illumina sequence data
journal, April 2014
- Bolger, Anthony M.; Lohse, Marc; Usadel, Bjoern
- Bioinformatics, Vol. 30, Issue 15
Improving analysis of transcription factor binding sites within ChIP-Seq data based on topological motif enrichment
journal, January 2014
- Worsley Hunt, Rebecca; Mathelier, Anthony; del Peso, Luis
- BMC Genomics, Vol. 15, Issue 1
Differential Contributions of Rare and Common, Coding and Noncoding Ret Mutations to Multifactorial Hirschsprung Disease Liability
journal, July 2010
- Emison, Eileen Sproat; Garcia-Barcelo, Merce; Grice, Elizabeth A.
- The American Journal of Human Genetics, Vol. 87, Issue 1
Functional annotation of noncoding sequence variants
journal, February 2014
- Ritchie, Graham R. S.; Dunham, Ian; Zeggini, Eleftheria
- Nature Methods, Vol. 11, Issue 3
Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks
journal, May 2016
- Kelley, David R.; Snoek, Jasper; Rinn, John L.
- Genome Research, Vol. 26, Issue 7
ChIP-seq accurately predicts tissue-specific activity of enhancers
journal, February 2009
- Visel, Axel; Blow, Matthew J.; Li, Zirong
- Nature, Vol. 457, Issue 7231
Distribution and intensity of constraint in mammalian genomic sequence
journal, June 2005
- Cooper, G. M.
- Genome Research, Vol. 15, Issue 7
The long-range interaction landscape of gene promoters
journal, September 2012
- Sanyal, Amartya; Lajoie, Bryan R.; Jain, Gaurav
- Nature, Vol. 489, Issue 7414
Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning
journal, July 2015
- Alipanahi, Babak; Delong, Andrew; Weirauch, Matthew T.
- Nature Biotechnology, Vol. 33, Issue 8
DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences
journal, April 2016
- Quang, Daniel; Xie, Xiaohui
- Nucleic Acids Research, Vol. 44, Issue 11
The 14q22.2 colorectal cancer variant rs4444235 shows cis-acting regulation of BMP4
journal, December 2011
- Lubbe, S. J.; Pittman, A. M.; Olver, B.
- Oncogene, Vol. 31, Issue 33
In pursuit of design principles of regulatory sequences
journal, June 2014
- Levo, Michal; Segal, Eran
- Nature Reviews Genetics, Vol. 15, Issue 7
Genetic association studies: Design, analysis and interpretation
journal, January 2002
- Lewis, C. M.
- Briefings in Bioinformatics, Vol. 3, Issue 2
Five-Vertebrate ChIP-seq Reveals the Evolutionary Dynamics of Transcription Factor Binding
journal, April 2010
- Schmidt, D.; Wilson, M. D.; Ballester, B.
- Science, Vol. 328, Issue 5981
A common sex-dependent mutation in a RET enhancer underlies Hirschsprung disease risk
journal, April 2005
- Emison, Eileen Sproat; McCallion, Andrew S.; Kashuk, Carl S.
- Nature, Vol. 434, Issue 7035
A method and server for predicting damaging missense mutations
journal, April 2010
- Adzhubei, Ivan A.; Schmidt, Steffen; Peshkin, Leonid
- Nature Methods, Vol. 7, Issue 4
Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++
journal, December 2010
- Davydov, Eugene V.; Goode, David L.; Sirota, Marina
- PLoS Computational Biology, Vol. 6, Issue 12
Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data
journal, November 2013
- Bailey, Timothy; Krajewski, Pawel; Ladunga, Istvan
- PLoS Computational Biology, Vol. 9, Issue 11
An integrated encyclopedia of DNA elements in the human genome
journal, September 2012
- The ENCODE Project Consortium
- Nature, Vol. 489, Issue 7414, p. 57-74
Enhancer function: new insights into the regulation of tissue-specific gene expression
journal, March 2011
- Ong, Chin-Tong; Corces, Victor G.
- Nature Reviews Genetics, Vol. 12, Issue 4
Genome-wide analysis of noncoding regulatory mutations in cancer
journal, September 2014
- Weinhold, Nils; Jacobsen, Anders; Schultz, Nikolaus
- Nature Genetics, Vol. 46, Issue 11
A general framework for estimating the relative pathogenicity of human genetic variants
journal, February 2014
- Kircher, Martin; Witten, Daniela M.; Jain, Preti
- Nature Genetics, Vol. 46, Issue 3
Transcriptome and genome sequencing uncovers functional variation in humans
journal, September 2013
- Lappalainen, Tuuli; Sammeth, Michael; Friedländer, Marc R.
- Nature, Vol. 501, Issue 7468
JASPAR: an open-access database for eukaryotic transcription factor binding profiles
journal, January 2004
- Sandelin, A.
- Nucleic Acids Research, Vol. 32, Issue 90001
Diversity and Complexity in DNA Recognition by Transcription Factors
journal, May 2009
- Badis, G.; Berger, M. F.; Philippakis, A. A.
- Science, Vol. 324, Issue 5935
Cis-regulatory mutations in human disease
journal, July 2009
- Epstein, D. J.
- Briefings in Functional Genomics and Proteomics, Vol. 8, Issue 4
Deep learning for computational biology
journal, July 2016
- Angermueller, Christof; Pärnamaa, Tanel; Parts, Leopold
- Molecular Systems Biology, Vol. 12, Issue 7
The role of regulatory variation in complex traits and disease
journal, February 2015
- Albert, Frank W.; Kruglyak, Leonid
- Nature Reviews Genetics, Vol. 16, Issue 4
Role of non-coding sequence variants in cancer
journal, January 2016
- Khurana, Ekta; Fu, Yao; Chakravarty, Dimple
- Nature Reviews Genetics, Vol. 17, Issue 2
Predicting effects of noncoding variants with deep learning–based sequence model
journal, August 2015
- Zhou, Jian; Troyanskaya, Olga G.
- Nature Methods, Vol. 12, Issue 10
Quantifying similarity between motifs
journal, January 2007
- Gupta, Shobhit; Stamatoyannopoulos, John A.; Bailey, Timothy L.
- Genome Biology, Vol. 8, Issue 2
The NHGRI GWAS Catalog, a curated resource of SNP-trait associations
journal, December 2013
- Welter, Danielle; MacArthur, Jacqueline; Morales, Joannella
- Nucleic Acids Research, Vol. 42, Issue D1
Integrative analysis of 111 reference human epigenomes
journal, February 2015
- Kundaje, Anshul; Meuleman, Wouter; Ernst, Jason
- Nature, Vol. 518, Issue 7539
iFish: predicting the pathogenicity of human nonsynonymous variants using gene-specific/family-specific attributes and classifiers
journal, August 2016
- Wang, Meng; Wei, Liping
- Scientific Reports, Vol. 6, Issue 1
A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping
journal, December 2014
- Rao, Suhas S. P.; Huntley, Miriam H.; Durand, Neva C.
- Cell, Vol. 159, Issue 7
Revealing the architecture of gene regulation: the promise of eQTL studies
journal, August 2008
- Gilad, Yoav; Rifkin, Scott A.; Pritchard, Jonathan K.
- Trends in Genetics, Vol. 24, Issue 8
Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement
journal, November 2014
- Walker, Bruce J.; Abeel, Thomas; Shea, Terrance
- PLoS ONE, Vol. 9, Issue 11
Predicting Deleterious Amino Acid Substitutions
journal, May 2001
- Ng, P. C.; Henikoff, S.
- Genome Research, Vol. 11, Issue 5
FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer
journal, October 2014
- Fu, Yao; Liu, Zhu; Lou, Shaoke
- Genome Biology, Vol. 15, Issue 10
JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles
journal, November 2015
- Mathelier, Anthony; Fornes, Oriol; Arenillas, David J.
- Nucleic Acids Research, Vol. 44, Issue D1
Deciphering the transcriptional cis-regulatory code
journal, January 2013
- Yáñez-Cuna, J. Omar; Kvon, Evgeny Z.; Stark, Alexander
- Trends in Genetics, Vol. 29, Issue 1
ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia
journal, September 2012
- Landt, S. G.; Marinov, G. K.; Kundaje, A.
- Genome Research, Vol. 22, Issue 9
Systematic Localization of Common Disease-Associated Variation in Regulatory DNA
journal, September 2012
- Maurano, M. T.; Humbert, R.; Rynes, E.
- Science, Vol. 337, Issue 6099
Works referencing / citing this record:
Predicting functional variants in enhancer and promoter elements using RegulomeDB
journal, June 2019
- Dong, Shengcheng; Boyle, Alan P.
- Human Mutation, Vol. 40, Issue 9
Tap water fingerprinting using a convolutional neural network built from images of the coffee-ring effect
journal, January 2020
- Li, Xiaoyan; Sanderson, Alyssa R.; Allen, Selett S.
- The Analyst, Vol. 145, Issue 4
DeepT3: deep convolutional neural networks accurately identify Gram-negative bacterial type III secreted effectors using the N-terminal sequence
journal, November 2018
- Xue, Li; Tang, Bin; Chen, Wei
- Bioinformatics, Vol. 35, Issue 12
Mechanistic interpretation of non-coding variants for discovering transcriptional regulators of drug response
journal, July 2019
- Xie, Xiaoman; Hanson, Casey; Sinha, Saurabh
- BMC Biology, Vol. 17, Issue 1
Representation learning of genomic sequence motifs with convolutional neural networks
journal, December 2019
- Koo, Peter K.; Eddy, Sean R.
- PLOS Computational Biology, Vol. 15, Issue 12
Improved Prediction of Regulatory Element Using Hybrid Abelian Complexity Features with DNA Sequences
journal, April 2019
- Wu, Chengchao; Chen, Jin; Liu, Yunxia
- International Journal of Molecular Sciences, Vol. 20, Issue 7