U.S. Department of Energy
Office of Scientific and Technical Information

PhyloScan: identification of transcription factor binding sites using cross-species evidence

Journal Article · Algorithms for Molecular Biology
 [1];  [2];  [3];  [4]
  1. New York State Department of Health, Albany, NY (United States). The Wadsworth Center; DOE/OSTI
  2. New York State Department of Health, Albany, NY (United States). The Wadsworth Center; Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
  3. New York State Department of Health, Albany, NY (United States). The Wadsworth Center; Rensselaer Polytechnic Inst., Troy, NY (United States). Dept. of Computer Science
  4. New York State Department of Health, Albany, NY (United States). The Wadsworth Center; Brown Univ., Providence, RI (United States). Division of Applied Mathematics
Background: When transcription factor binding sites are known for a particular transcription factor, it is possible to construct a motif model that can be used to scan sequences for additional sites. However, few statistically significant sites are revealed when a transcription factor binding site motif model is used to scan a genome-scale database.
Methods: We have developed a scanning algorithm, PhyloScan, which combines evidence from matching sites found in orthologous data from several related species with evidence from multiple sites within an intergenic region, to better detect regulons. The orthologous sequence data may be multiply aligned, unaligned, or a combination of aligned and unaligned. In aligned data, PhyloScan statistically accounts for the phylogenetic dependence of the species contributing data to the alignment and, in unaligned data, the evidence for sites is combined assuming phylogenetic independence of the species. The statistical significance of the gene predictions is calculated directly, without employing training sets.
Results: In a test of our methodology on synthetic data modeled on seven Enterobacteriales, four Vibrionales, and three Pasteurellales species, PhyloScan produces better sensitivity and specificity than MONKEY, an advanced scanning approach that also searches a genome for transcription factor binding sites using phylogenetic information. The application of the algorithm to real sequence data from seven Enterobacteriales species identifies novel Crp and PurR transcription factor binding sites, thus providing several new potential sites for these transcription factors. These sites enable targeted experimental validation and thus further delineation of the Crp and PurR regulons in E. coli.
Conclusion: Better sensitivity and specificity can be achieved through a combination of (1) using mixed alignable and non-alignable sequence data and (2) combining evidence from multiple sites within an intergenic region.
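The two ideas in the abstract, scanning with a motif model and pooling evidence across species treated as independent, can be illustrated with a minimal sketch. This is not PhyloScan's implementation: the function names (`build_log_odds`, `best_site`, `combine_pvalues`), the pseudocount, the uniform background, and the toy sites are all assumptions made for illustration. The combination step uses the standard closed form for the distribution of a product of independent uniform p-values, which is one common way to pool independent evidence and is swapped in here as a stand-in.

```python
import math

BASES = "ACGT"

def build_log_odds(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds motif matrix from equal-length known sites.

    Assumes a uniform background and a fixed pseudocount (illustrative choices).
    """
    width = len(sites[0])
    n = len(sites)
    matrix = []
    for col in range(width):
        column = {}
        for base in BASES:
            count = sum(1 for s in sites if s[col] == base)
            freq = (count + pseudocount) / (n + 4 * pseudocount)
            column[base] = math.log2(freq / background)
        matrix.append(column)
    return matrix

def best_site(matrix, seq):
    """Return (position, score) of the highest-scoring window in seq.

    seq must contain only the characters A, C, G, T.
    """
    width = len(matrix)
    scored = [
        (sum(matrix[i][seq[p + i]] for i in range(width)), p)
        for p in range(len(seq) - width + 1)
    ]
    score, pos = max(scored)
    return pos, score

def combine_pvalues(pvalues):
    """Combine independent p-values via the distribution of their product.

    For n independent uniform p-values with product x,
    P(product <= x) = x * sum_{i=0}^{n-1} (-ln x)^i / i!
    """
    product = math.prod(pvalues)
    if product == 0.0:
        return 0.0
    n = len(pvalues)
    log_term = -math.log(product)
    return product * sum(log_term ** i / math.factorial(i) for i in range(n))

# Hypothetical toy data: four known sites and one sequence to scan.
pwm = build_log_odds(["TGTGA", "TGTGA", "TGCGA", "TGTGA"])
print(best_site(pwm, "AAATGTGACC"))
# Hypothetical per-species p-values for the same orthologous region.
print(combine_pvalues([0.05, 0.05]))
```

The combined value is larger than the raw product of the p-values because the product of several small numbers is itself expected to be small; the correction accounts for that. Evidence from aligned species would need a model of phylogenetic dependence, which this sketch does not attempt.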
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
1626632
Journal Information:
Algorithms for Molecular Biology, Vol. 2, Issue 1; ISSN 1748-7188
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
