Title: Improved predictions of transcription factor binding sites using physicochemical features of DNA

Journal Article · Nucleic Acids Research
DOI: https://doi.org/10.1093/nar/gks771 · OSTI ID: 1625507
 [1];  [1];  [2];  [2]
  1. Univ. of Chicago, IL (United States). Dept. of Chemistry
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Theoretical Biology and Biophysics Group; Univ. of New Mexico, Albuquerque, NM (United States). Dept. of Biology

Typical approaches for predicting transcription factor binding sites (TFBSs) involve use of a position-specific weight matrix (PWM) to statistically characterize the sequences of the known sites. Recently, an alternative physicochemical approach, called SiteSleuth, was proposed. In this approach, a linear support vector machine (SVM) classifier is trained to distinguish TFBSs from background sequences based on local chemical and structural features of DNA. SiteSleuth appears to generally perform better than PWM-based methods. Here, we improve the SiteSleuth approach by considering both new physicochemical features and algorithmic modifications. New features are derived from Gibbs energies of amino acid–DNA interactions and hydroxyl radical cleavage profiles of DNA. Algorithmic modifications consist of inclusion of a feature selection step, use of a nonlinear kernel in the SVM classifier, and use of a consensus-based post-processing step for predictions. We also considered SVM classification based on letter features alone to distinguish performance gains from use of SVM-based models versus use of physicochemical features. The accuracy of each of the variant methods considered was assessed by cross validation using data available in the RegulonDB database for 54 Escherichia coli TFs, as well as by experimental validation using published ChIP-chip data available for Fis and Lrp.
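For orientation, the PWM baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of generic log-odds PWM scoring, not the SiteSleuth method and not the authors' code. The function names (`build_pwm`, `score_window`, `scan`, `best_hit`), the pseudocount of 1, and the uniform background frequency of 0.25 are all assumptions made for this example.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length known sites.

    Returns one dict per position mapping each base to its score.
    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        column = {}
        for base in BASES:
            freq = (counts[base] + pseudocount) / total
            column[base] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def score_window(pwm, window):
    """Sum the per-position scores of one window (positions treated as independent)."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence):
    """Score every window of PWM width along the sequence as (offset, score) pairs."""
    width = len(pwm)
    return [
        (i, score_window(pwm, sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    ]


def best_hit(pwm, sequence):
    """Return the (offset, score) pair with the highest score."""
    return max(scan(pwm, sequence), key=lambda hit: hit[1])
```

Given a handful of aligned example sites (hypothetical strings, not RegulonDB data), `best_hit(build_pwm(sites), sequence)` returns the offset and score of the best-matching window. The additive, position-independent score is the simplification that feature-based classifiers such as the SVM approach described above aim to improve on.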

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division; National Institutes of Health (NIH)
Grant/Contract Number:
AC52-06NA25396; RR018754; GM085273; GM081892
OSTI ID:
1625507
Journal Information:
Nucleic Acids Research, Vol. 40, Issue 22; ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
