DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants

Abstract

The complex system of gene expression is regulated by the cell type-specific binding of transcription factors (TFs) to regulatory elements. Identifying variants that disrupt TF binding and lead to human diseases remains a great challenge. To address this, we implement sequence-based deep learning models that accurately predict the TF binding intensities to given DNA sequences. In addition to accurately classifying TF-DNA binding or unbinding, our models are capable of accurately predicting real-valued TF binding intensities by leveraging large-scale TF ChIP-seq data. The changes in the TF binding intensities between the altered sequence and the reference sequence reflect the degree of functional impact for the variant. This enables us to develop the tool DeFine (Deep learning based Functional impact of non-coding variants evaluator, http://define.cbi.pku.edu.cn) with improved performance for assessing the functional impact of non-coding variants including SNPs and indels. DeFine accurately identifies the causal functional non-coding variants from disease-associated variants in GWAS. DeFine is an effective and easy-to-use tool that facilitates systematic prioritization of functional non-coding variants.
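
The scoring idea in the abstract (the change in predicted TF binding intensity between the altered and the reference sequence) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `one_hot`, `apply_variant`, `variant_effect`, and in particular `toy_predict`, which is a hypothetical stand-in that counts G/C bases and is not the DeFine model or its API.

```python
from typing import Callable, List

BASES = "ACGT"
Matrix = List[List[int]]


def one_hot(seq: str) -> Matrix:
    """Encode a DNA string as a len(seq) x 4 one-hot matrix.

    Any symbol outside A/C/G/T (for example N) becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def apply_variant(ref_seq: str, pos: int, ref: str, alt: str) -> str:
    """Replace allele `ref` at 0-based `pos` with `alt`.

    Works for SNPs and for indels, since `ref` and `alt` may differ in length.
    """
    if ref_seq[pos:pos + len(ref)].upper() != ref.upper():
        raise ValueError("reference allele does not match the sequence")
    return ref_seq[:pos] + alt + ref_seq[pos + len(ref):]


def variant_effect(predict: Callable[[Matrix], float],
                   ref_seq: str, pos: int, ref: str, alt: str) -> float:
    """Score a variant as predicted intensity(alt) minus intensity(ref)."""
    alt_seq = apply_variant(ref_seq, pos, ref, alt)
    return predict(one_hot(alt_seq)) - predict(one_hot(ref_seq))


def toy_predict(x: Matrix) -> float:
    """Hypothetical stand-in predictor: counts C and G positions.

    A trained sequence model would take this function's place.
    """
    return float(sum(row[1] + row[2] for row in x))


# A SNP (T>G) and a one-base deletion (CG>C) scored with the toy predictor.
snp_score = variant_effect(toy_predict, "ACGTACGT", 3, "T", "G")
del_score = variant_effect(toy_predict, "ACGTACGT", 1, "CG", "C")
print(snp_score, del_score)
```

A real convolutional model would typically expect a fixed-length input window, so indels would need re-centering or padding that this sketch does not attempt.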

Authors:
Wang, Meng [1]; Tai, Cheng [2]; E., Weinan [3]; Wei, Liping [1]
  1. Peking Univ., Beijing (China). Center for Bioinformatics, State Key Lab. of Protein and Plant Gene Research, School of Life Sciences
  2. Peking Univ., Beijing (China). Center for Data Science; Beijing Inst. of Big Data Research, Beijing (China)
  3. Peking Univ., Beijing (China). Center for Data Science; Beijing Inst. of Big Data Research, Beijing (China); Princeton Univ., NJ (United States). Dept. of Mathematics and PACM
Publication Date:
Research Org.:
Princeton Univ., NJ (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
OSTI Identifier:
1430754
Alternate Identifier(s):
OSTI ID: 1502444
Grant/Contract Number:  
SC0009248
Resource Type:
Published Article
Journal Name:
Nucleic Acids Research
Additional Journal Information:
Journal Volume: 46; Journal Issue: 11; Journal ID: ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Wang, Meng, Tai, Cheng, E., Weinan, and Wei, Liping. DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants. United States: N. p., 2018. Web. doi:10.1093/nar/gky215.
Wang, Meng, Tai, Cheng, E., Weinan, & Wei, Liping. DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants. United States. doi:10.1093/nar/gky215.
Wang, Meng, Tai, Cheng, E., Weinan, and Wei, Liping. 2018. "DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants". United States. doi:10.1093/nar/gky215.
@article{osti_1430754,
title = {DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants},
author = {Wang, Meng and Tai, Cheng and E., Weinan and Wei, Liping},
abstractNote = {The complex system of gene expression is regulated by the cell type-specific binding of transcription factors (TFs) to regulatory elements. Identifying variants that disrupt TF binding and lead to human diseases remains a great challenge. To address this, we implement sequence-based deep learning models that accurately predict the TF binding intensities to given DNA sequences. In addition to accurately classifying TF-DNA binding or unbinding, our models are capable of accurately predicting real-valued TF binding intensities by leveraging large-scale TF ChIP-seq data. The changes in the TF binding intensities between the altered sequence and the reference sequence reflect the degree of functional impact for the variant. This enables us to develop the tool DeFine (Deep learning based Functional impact of non-coding variants evaluator, http://define.cbi.pku.edu.cn) with improved performance for assessing the functional impact of non-coding variants including SNPs and indels. DeFine accurately identifies the causal functional non-coding variants from disease-associated variants in GWAS. DeFine is an effective and easy-to-use tool that facilitates systematic prioritization of functional non-coding variants.},
doi = {10.1093/nar/gky215},
journal = {Nucleic Acids Research},
number = 11,
volume = 46,
place = {United States},
year = {2018},
month = {6}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
DOI: 10.1093/nar/gky215
