DOE PAGES
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Using Sequence-Specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites

Abstract

An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.
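
For orientation, the conventional baseline named in the abstract, a position-specific weight matrix built from known binding sites, can be sketched as follows. This is a minimal illustration and not SiteSleuth or any of the compared tools: the function names (`build_pwm`, `score`, `scan`), the pseudocount, the uniform background frequency, the threshold, and the toy sequences are all assumptions made for the example.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build per-position log-odds scores from aligned, equal-length sites.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pwm.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score(pwm, window):
    """Sum the log-odds contribution of each base in the window."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (offset, score) for each window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        s = score(pwm, sequence[i:i + width])
        if s >= threshold:
            hits.append((i, s))
    return hits


# Hypothetical toy sites, not taken from RegulonDB.
known_sites = ["TTGACA", "TTGACT", "TTTACA", "TAGACA", "TTGATA"]
pwm = build_pwm(known_sites)
hits = scan(pwm, "GGCGTTGACAGGCC", threshold=4.0)
print(hits)
```

A matrix of this kind scores each position independently from sequence statistics alone. The approach described in the abstract instead maps candidate sites to chemical and structural feature vectors and trains a binary classifier on them.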

Authors:
Bauer, Amy L. [1]; Hlavacek, William S. [1]; Unkefer, Pat J. [2]; Mu, Fangping [1]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Theoretical Division. Theoretical Biology and Biophysics Group
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Bioscience Division. National Stable Isotope Resource
Publication Date:
2010-11-18
Research Org.:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
OSTI Identifier:
1627207
Grant/Contract Number:  
AC52-06NA25396
Resource Type:
Accepted Manuscript
Journal Name:
PLoS Computational Biology (Online)
Additional Journal Information:
Journal Name: PLoS Computational Biology (Online); Journal Volume: 6; Journal Issue: 11; Journal ID: ISSN 1553-7358
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; Biochemistry & Molecular Biology; Mathematical & Computational Biology

Citation Formats

Bauer, Amy L., Hlavacek, William S., Unkefer, Pat J., and Mu, Fangping. Using Sequence-Specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites. United States: N. p., 2010. Web. doi:10.1371/journal.pcbi.1001007.
Bauer, Amy L., Hlavacek, William S., Unkefer, Pat J., & Mu, Fangping. Using Sequence-Specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites. United States. https://doi.org/10.1371/journal.pcbi.1001007
Bauer, Amy L., Hlavacek, William S., Unkefer, Pat J., and Mu, Fangping. 2010. "Using Sequence-Specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites". United States. https://doi.org/10.1371/journal.pcbi.1001007. https://www.osti.gov/servlets/purl/1627207.
@article{osti_1627207,
title = {Using Sequence-Specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites},
author = {Bauer, Amy L. and Hlavacek, William S. and Unkefer, Pat J. and Mu, Fangping},
abstractNote = {An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.},
doi = {10.1371/journal.pcbi.1001007},
journal = {PLoS Computational Biology (Online)},
number = 11,
volume = 6,
place = {United States},
year = {2010},
month = {11}
}

Works referenced in this record:

Selection of DNA binding sites by regulatory proteins
journal, February 1987


MatInspector and beyond: promoter analysis based on transcription factor binding sites
journal, April 2005


A Biophysical Approach to Transcription Factor Binding Site Discovery
journal, November 2003


MATCH™: a tool for searching transcription factor binding sites in DNA sequences
journal, July 2003


Comparative analysis of methods for representing and searching for transcription factor binding sites
journal, August 2004


MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data
journal, January 1995

  • Quandt, Kerstin; Frech, Kornelie; Karas, Holger
  • Nucleic Acids Research, Vol. 23, Issue 23
  • DOI: 10.1093/nar/23.23.4878

DNA binding sites: representation and discovery
journal, January 2000


Ab Initio Prediction of Transcription Factor Targets Using Structural Knowledge
journal, June 2005


Toward an atomistic model for predicting transcription-factor binding sites
journal, June 2004

  • Endres, Robert G.; Schulthess, Thomas C.; Wingreen, Ned S.
  • Proteins: Structure, Function, and Bioinformatics, Vol. 57, Issue 2
  • DOI: 10.1002/prot.20199

Protein-DNA binding specificity predictions with structural models
journal, October 2005


Connecting protein structure with predictions of regulatory sites
journal, April 2007

  • Morozov, A. V.; Siggia, E. D.
  • Proceedings of the National Academy of Sciences, Vol. 104, Issue 17, p. 7068-7073
  • DOI: 10.1073/pnas.0701356104

Genome-wide analysis of Fis binding in Escherichia coli indicates a causative role for A-/AT-tracts
journal, May 2008

  • Cho, B. -K.; Knight, E. M.; Barrett, C. L.
  • Genome Research, Vol. 18, Issue 6
  • DOI: 10.1101/gr.070276.107

A Systems Approach to Measuring the Binding Energy Landscapes of Transcription Factors
journal, January 2007


Protein-DNA Recognition Patterns and Predictions
journal, June 2005


Indirect readout: detection of optimized subsequences and calculation of relative binding affinities using different DNA elastic potentials
journal, October 2006

  • Becker, Nils B.; Wolff, Lars; Everaers, Ralf
  • Nucleic Acids Research, Vol. 34, Issue 19
  • DOI: 10.1093/nar/gkl683

Predicting indirect readout effects in protein-DNA interactions
journal, May 2004

  • Zhang, Y.; Xi, Z.; Hegde, R. S.
  • Proceedings of the National Academy of Sciences, Vol. 101, Issue 22
  • DOI: 10.1073/pnas.0402319101

ReadOut: structure-based calculation of direct and indirect readout energies and specificities for protein-DNA recognition
journal, July 2006

  • Ahmad, S.; Kono, H.; Arauzo-Bravo, M. J.
  • Nucleic Acids Research, Vol. 34, Issue Web Server
  • DOI: 10.1093/nar/gkl104

Intermolecular and Intramolecular Readout Mechanisms in Protein–DNA Recognition
journal, March 2004

  • Michael Gromiha, M.; Siebers, Jörg G.; Selvaraj, Samuel
  • Journal of Molecular Biology, Vol. 337, Issue 2
  • DOI: 10.1016/j.jmb.2004.01.033

RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation
journal, December 2007

  • Gama-Castro, S.; Jimenez-Jacinto, V.; Peralta-Gil, M.
  • Nucleic Acids Research, Vol. 36, Issue Database
  • DOI: 10.1093/nar/gkm994

Chromatin Immunoprecipitation for Determining the Association of Proteins with Specific Genomic Sequences In Vivo
journal, June 2004


Simulation and modeling of nucleic acid structure, dynamics and interactions
journal, June 2004


Recent advances in the study of nucleic acid flexibility by molecular dynamics
journal, April 2008

  • Orozco, Modesto; Noy, Agnes; Pérez, Alberto
  • Current Opinion in Structural Biology, Vol. 18, Issue 2
  • DOI: 10.1016/j.sbi.2008.01.005

Molecular Dynamics Simulations of the 136 Unique Tetranucleotide Sequences of DNA Oligonucleotides. I. Research Design and Results on d(CpG) Steps
journal, December 2004


A systematic molecular dynamics study of nearest-neighbor effects on base pair and base pair step conformations and fluctuations in B-DNA
journal, November 2009

  • Lavery, Richard; Zakrzewska, Krystyna; Beveridge, David
  • Nucleic Acids Research, Vol. 38, Issue 1
  • DOI: 10.1093/nar/gkp834

The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids
journal, September 1992


Scalable molecular dynamics with NAMD
journal, January 2005

  • Phillips, James C.; Braun, Rosemary; Wang, Wei
  • Journal of Computational Chemistry, Vol. 26, Issue 16, p. 1781-1802
  • DOI: 10.1002/jcc.20289

A computational procedure for determining energetically favorable binding sites on biologically important macromolecules
journal, July 1985


From genomics to chemical genomics: new developments in KEGG
journal, January 2006


A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome
journal, November 1998

  • Robison, Keith; McGuire, Abigail Manson; Church, George M.
  • Journal of Molecular Biology, Vol. 284, Issue 2
  • DOI: 10.1006/jmbi.1998.2160

An introduction to ROC analysis
journal, June 2006


Crystal structure of the Escherichia coli Rob transcription factor in complex with DNA
journal, May 2000

  • Kwon, Hyock Joo; Bennik, Marjon H. J.; Demple, Bruce
  • Nature Structural Biology, Vol. 7, Issue 5, p. 424-430
  • DOI: 10.1038/75213

Additivity in protein-DNA interactions: how good an approximation is it?
journal, October 2002


The shape of the DNA minor groove directs binding by the DNA-bending protein Fis
journal, April 2010

  • Stella, S.; Cascio, D.; Johnson, R. C.
  • Genes & Development, Vol. 24, Issue 8
  • DOI: 10.1101/gad.1900610

Probabilistic Code for DNA Recognition by Proteins of the EGR Family
journal, November 2002


Design Principles for Regulator Gene Expression in a Repressible Gene Circuit
journal, September 2003

  • Wall, Michael E.; Hlavacek, William S.; Savageau, Michael A.
  • Journal of Molecular Biology, Vol. 332, Issue 4
  • DOI: 10.1016/s0022-2836(03)00948-3

Simulation and modeling of nucleic acid structure, dynamics and interactions
journal, May 2004


MATRIX SEARCH 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices
journal, January 1995


Works referencing / citing this record:

A flexible integrative approach based on random forest improves prediction of transcription factor binding sites
journal, April 2012

  • Hooghe, Bart; Broos, Stefan; van Roy, Frans
  • Nucleic Acids Research, Vol. 40, Issue 14
  • DOI: 10.1093/nar/gks283

Improved predictions of transcription factor binding sites using physicochemical features of DNA
journal, August 2012

  • Maienschein-Cline, Mark; Dinner, Aaron R.; Hlavacek, William S.
  • Nucleic Acids Research, Vol. 40, Issue 22
  • DOI: 10.1093/nar/gks771

The floral homeotic protein SEPALLATA3 recognizes target DNA sequences by shape readout involving a conserved arginine residue in the MADS-domain
journal, June 2018

  • Käppel, Sandra; Melzer, Rainer; Rümpler, Florian
  • The Plant Journal, Vol. 95, Issue 2
  • DOI: 10.1111/tpj.13954

Bacterial promoter prediction: Selection of dynamic and static physical properties of DNA for reliable sequence classification
journal, February 2018

  • Ryasik, Artem; Orlov, Mikhail; Zykova, Evgenia
  • Journal of Bioinformatics and Computational Biology, Vol. 16, Issue 01
  • DOI: 10.1142/s0219720018400036

Transversions have larger regulatory effects than transitions
journal, May 2017


Genotype to Phenotype Mapping and the Fitness Landscape of the E. coli lac Promoter
journal, May 2013


Differences in local genomic context of bound and unbound motifs
journal, September 2012


PreCisIon: PREdiction of CIS-regulatory elements improved by gene’s positION
journal, December 2012

  • Elati, Mohamed; Nicolle, Rémy; Junier, Ivan
  • Nucleic Acids Research, Vol. 41, Issue 3
  • DOI: 10.1093/nar/gks1286

Modelling the transcription factor DNA-binding affinity using genome-wide ChIP-based data
posted_content, July 2016

  • Alhamdoosh, Monther; Wang, Dianhui
  • Research Gate
  • DOI: 10.1101/061978

Genome-wide analysis of transcription factor binding sites and their characteristic DNA structures
journal, January 2015


Binding of Nucleoid-Associated Protein Fis to DNA Is Regulated by DNA Breathing Dynamics
journal, January 2013

  • Nowak-Lovato, Kristy; Alexandrov, Ludmil B.; Banisadr, Afsheen
  • PLoS Computational Biology, Vol. 9, Issue 1
  • DOI: 10.1371/journal.pcbi.1002881

Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast
journal, August 2015


Guide to Genome-Wide Bacterial Transcription Factor Binding Site Prediction Using OmpR as Model
book, October 2011

  • Vuong, Phu; Misra, Rajeev
  • Selected Works in Bioinformatics
  • DOI: 10.5772/24321