OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Using Sequence-Specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites

Journal Article · PLoS Computational Biology (Online)
 [1];  [1];  [2];  [1]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Theoretical Division. Theoretical Biology and Biophysics Group
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Bioscience Division. National Stable Isotope Resource

An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.
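The conventional baseline named in the abstract, a position-specific weight matrix built from known binding sites, can be illustrated with a minimal sketch. This is not SiteSleuth and not the implementation of Match, MATRIX SEARCH, or any other cited tool; the function names, the pseudocount, the uniform background, and the toy sequences are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log-odds weight matrix from equal-length aligned sites.

    pseudocount and the uniform background are illustrative assumptions.
    """
    background = background or {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        # Count each base at position i, starting from the pseudocount.
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against the background.
        pwm.append({b: math.log2((counts[b] / total) / background[b])
                    for b in BASES})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds for one candidate window."""
    return sum(column[base] for column, base in zip(pwm, window))

def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score(pwm, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits

# Toy aligned sites, invented for this sketch (not taken from RegulonDB).
toy_sites = ["TTGACA", "TTGACA", "TTGATA", "TTTACA"]
toy_pwm = build_pwm(toy_sites)
best = score(toy_pwm, "TTGACA")
toy_hits = scan(toy_pwm, "AAATTGACAAAA", threshold=best - 1e-9)
```

A matrix of this kind scores each position independently from sequence statistics alone. The abstract contrasts that with classifiers trained on chemical and structural features of the DNA.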

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
AC52-06NA25396
OSTI ID:
1627207
Journal Information:
PLoS Computational Biology (Online), Vol. 6, Issue 11; ISSN 1553-7358
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English

References (43)

Selection of DNA binding sites by regulatory proteins journal February 1987
MatInspector and beyond: promoter analysis based on transcription factor binding sites journal April 2005
A Biophysical Approach to Transcription Factor Binding Site Discovery journal November 2003
MATCH™: a tool for searching transcription factor binding sites in DNA sequences journal July 2003
Comparative analysis of methods for representing and searching for transcription factor binding sites journal August 2004
MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data journal January 1995
DNA binding sites: representation and discovery journal January 2000
Ab Initio Prediction of Transcription Factor Targets Using Structural Knowledge journal June 2005
Toward an atomistic model for predicting transcription-factor binding sites journal June 2004
Protein-DNA binding specificity predictions with structural models journal October 2005
Connecting protein structure with predictions of regulatory sites journal April 2007
Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors journal March 2009
Genome-wide analysis of Fis binding in Escherichia coli indicates a causative role for A-/AT-tracts journal May 2008
A Systems Approach to Measuring the Binding Energy Landscapes of Transcription Factors journal January 2007
Protein-DNA Recognition Patterns and Predictions journal June 2005
Indirect readout: detection of optimized subsequences and calculation of relative binding affinities using different DNA elastic potentials journal October 2006
Predicting indirect readout effects in protein-DNA interactions journal May 2004
ReadOut: structure-based calculation of direct and indirect readout energies and specificities for protein-DNA recognition journal July 2006
Intermolecular and Intramolecular Readout Mechanisms in Protein–DNA Recognition journal March 2004
RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation journal December 2007
Chromatin Immunoprecipitation for Determining the Association of Proteins with Specific Genomic Sequences In Vivo journal June 2004
Simulation and modeling of nucleic acid structure, dynamics and interactions journal June 2004
Recent advances in the study of nucleic acid flexibility by molecular dynamics journal April 2008
Molecular Dynamics Simulations of the 136 Unique Tetranucleotide Sequences of DNA Oligonucleotides. I. Research Design and Results on d(CpG) Steps journal December 2004
Molecular Dynamics Simulations of the 136 Unique Tetranucleotide Sequences of DNA Oligonucleotides. II: Sequence Context Effects on the Dynamical Structures of the 10 Unique Dinucleotide Steps journal December 2005
A systematic molecular dynamics study of nearest-neighbor effects on base pair and base pair step conformations and fluctuations in B-DNA journal November 2009
The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids journal September 1992
3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures journal September 2003
Scalable molecular dynamics with NAMD journal January 2005
All-atom empirical force field for nucleic acids: I. Parameter optimization based on small molecule and condensed phase macromolecular target data journal January 2000
A standard reference frame for the description of nucleic acid base-pair geometry journal October 2001
A computational procedure for determining energetically favorable binding sites on biologically important macromolecules journal July 1985
From genomics to chemical genomics: new developments in KEGG journal January 2006
A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome journal November 1998
An introduction to ROC analysis journal June 2006
Glycine binds the transcriptional accessory protein GcvR to disrupt a GcvA/GcvR interaction and allow GcvA-mediated activation of the Escherichia coli gcvTHP operon journal July 2002
Crystal structure of the Escherichia coli Rob transcription factor in complex with DNA journal May 2000
Additivity in protein-DNA interactions: how good an approximation is it? journal October 2002
The shape of the DNA minor groove directs binding by the DNA-bending protein Fis journal April 2010
Probabilistic Code for DNA Recognition by Proteins of the EGR Family journal November 2002
Design Principles for Regulator Gene Expression in a Repressible Gene Circuit journal September 2003
Simulation and modeling of nucleic acid structure, dynamics and interactions journal May 2004
MATRIX SEARCH 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices journal January 1995

Cited By (15)

A flexible integrative approach based on random forest improves prediction of transcription factor binding sites journal April 2012
Improved predictions of transcription factor binding sites using physicochemical features of DNA journal August 2012
The floral homeotic protein SEPALLATA3 recognizes target DNA sequences by shape readout involving a conserved arginine residue in the MADS-domain journal June 2018
Bacterial promoter prediction: Selection of dynamic and static physical properties of DNA for reliable sequence classification journal February 2018
Transversions have larger regulatory effects than transitions journal May 2017
Genotype to Phenotype Mapping and the Fitness Landscape of the E. coli lac Promoter journal May 2013
The floral homeotic protein SEPALLATA3 recognizes target DNA sequences by shape readout involving a conserved arginine residue in the MADS-domain posted_content May 2017
Differences in local genomic context of bound and unbound motifs journal September 2012
PreCisIon: PREdiction of CIS-regulatory elements improved by gene’s positION journal December 2012
Modelling the transcription factor DNA-binding affinity using genome-wide ChIP-based data posted_content July 2016
Genome-wide analysis of transcription factor binding sites and their characteristic DNA structures journal January 2015
Binding of Nucleoid-Associated Protein Fis to DNA Is Regulated by DNA Breathing Dynamics journal January 2013
Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast journal August 2015
Guide to Genome-Wide Bacterial Transcription Factor Binding Site Prediction Using OmpR as Model book October 2011
Statistical investigation of position-specific deformation pattern of nucleosome DNA based on multiple conformational properties journal September 2011