PhyloScan: identification of transcription factor binding sites using cross-species evidence
Journal Article · Algorithms for Molecular Biology
- New York State Department of Health, Albany, NY (United States). The Wadsworth Center; DOE/OSTI
- New York State Department of Health, Albany, NY (United States). The Wadsworth Center; Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- New York State Department of Health, Albany, NY (United States). The Wadsworth Center; Rensselaer Polytechnic Inst., Troy, NY (United States). Dept. of Computer Science
- New York State Department of Health, Albany, NY (United States). The Wadsworth Center; Brown Univ., Providence, RI (United States). Division of Applied Mathematics
Background: When transcription factor binding sites are known for a particular transcription factor, it is possible to construct a motif model that can be used to scan sequences for additional sites. However, few statistically significant sites are revealed when a transcription factor binding site motif model is used to scan a genome-scale database.
Methods: We have developed a scanning algorithm, PhyloScan, which combines evidence from matching sites found in orthologous data from several related species with evidence from multiple sites within an intergenic region, to better detect regulons. The orthologous sequence data may be multiply aligned, unaligned, or a combination of aligned and unaligned. In aligned data, PhyloScan statistically accounts for the phylogenetic dependence of the species contributing data to the alignment and, in unaligned data, the evidence for sites is combined assuming phylogenetic independence of the species. The statistical significance of the gene predictions is calculated directly, without employing training sets.
Results: In a test of our methodology on synthetic data modeled on seven Enterobacteriales, four Vibrionales, and three Pasteurellales species, PhyloScan produces better sensitivity and specificity than MONKEY, an advanced scanning approach that also searches a genome for transcription factor binding sites using phylogenetic information. The application of the algorithm to real sequence data from seven Enterobacteriales species identifies novel Crp and PurR transcription factor binding sites, thus providing several new potential sites for these transcription factors. These sites enable targeted experimental validation and thus further delineation of the Crp and PurR regulons in E. coli.
Conclusion: Better sensitivity and specificity can be achieved through a combination of (1) using mixed alignable and non-alignable sequence data and (2) combining evidence from multiple sites within an intergenic region.
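The two general ideas named in the abstract, scanning with a motif model and pooling evidence from species treated as independent, can be illustrated with a minimal sketch. This is not PhyloScan's implementation. The function names (`build_log_odds`, `scan`, `fisher_combine`), the uniform background, and the pseudocount are assumptions made for illustration. Fisher's method is swapped in here as a generic way to combine independent p-values; the abstract does not say which statistic PhyloScan uses, and this sketch does not model the phylogenetic dependence that PhyloScan accounts for in aligned data.

```python
import math

ALPHABET = "ACGT"  # assumption: uppercase DNA with no ambiguity codes


def build_log_odds(sites, background=None, pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length known sites."""
    background = background or {base: 0.25 for base in ALPHABET}
    matrix = []
    for pos in range(len(sites[0])):
        # Start every count at the pseudocount so unseen bases avoid log(0).
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({base: math.log2(counts[base] / total / background[base])
                       for base in ALPHABET})
    return matrix


def scan(sequence, matrix):
    """Return (offset, score) for every motif-width window of the sequence."""
    width = len(matrix)
    return [(i, sum(matrix[j][sequence[i + j]] for j in range(width)))
            for i in range(len(sequence) - width + 1)]


def fisher_combine(p_values):
    """Combine independent p-values (each > 0) with Fisher's method.

    The statistic -2 * sum(ln p) is chi-square with 2k degrees of freedom,
    whose upper tail has a closed form when the degrees of freedom are even.
    """
    half_stat = -sum(math.log(p) for p in p_values)
    k = len(p_values)
    return math.exp(-half_stat) * sum(half_stat ** i / math.factorial(i)
                                      for i in range(k))


# Hypothetical toy data: three known sites and one short sequence to scan.
pwm = build_log_odds(["TGTGA", "TGTGA", "TGCGA"])
best_offset, best_score = max(scan("AATGTGACC", pwm), key=lambda hit: hit[1])

# Hypothetical per-species p-values for one candidate site.
combined = fisher_combine([0.05, 0.05])
```

Combining two modest p-values in this way yields a smaller pooled p-value than either alone, which is the intuition behind drawing on several species when a single-genome scan finds few significant sites.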
- Sponsoring Organization:
- USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
- Grant/Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1626632
- Journal Information:
- Algorithms for Molecular Biology, Vol. 2, Issue 1; ISSN 1748-7188
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
Similar Records
Software to perform automated comparisons of pairwise percent identities for microbial species
Journal Article · Mon May 01 00:00:00 EDT 2006 · BioTechniques, 40(5):578-582 · OSTI ID:918854
De Novo Identification of Regulatory Regions in Intergenic Spaces of Prokaryotic Genomes
Technical Report · Mon Feb 19 23:00:00 EST 2007 · OSTI ID:902275
Comparative genomic reconstruction of transcriptional networks controlling central metabolism in the Shewanella genus
Journal Article · Wed Jun 15 00:00:00 EDT 2011 · BMC Genomics, 12(Suppl 1):Article No. S3 · OSTI ID:1018137