U.S. Department of Energy
Office of Scientific and Technical Information

Software to perform automated comparisons of pairwise percent identities for microbial species

Journal Article · BioTechniques, 40(5):578-582
DOI: https://doi.org/10.2144/000112170 · OSTI ID: 918854
The field of comparative genomics, which makes inferences about the properties of an organism by comparing its genome to the genomes of related organisms, has grown rapidly as a result of high-throughput sequencing initiatives. At the time of this study, there were almost 300 completely sequenced microbial genomes and over 500 partially sequenced microbial genomes. This wealth of genomic data means that, for a given species, there will likely be many related species with sequenced genomes available for comparison. This is useful, for example, when identifying transcription factor binding sites using phylogenetic footprinting. It has been shown that including closely related species is valuable in phylogenetic footprinting studies, because species within a close phylogenetic group are likely to share common regulatory mechanisms (1,2). Regulatory motif detection is complicated, however, by the sequence correlation present between related species. Here we define sequence correlation as similarity between orthologous sequences that is due to recent speciation rather than functional constraints. It is therefore useful to understand the level of correlation in the sequence data before initiating a focused comparative genomics study such as phylogenetic footprinting. We present two programs (collect.identity.pl and analyze.identity.pl) that automate the tasks of performing pairwise sequence alignments between sets of homologous sequences and generating summary statistics, as well as the data required for additional statistical analyses. This provides a way to compare a focused group of related genomes across a large number of homologous loci.
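To make the idea of pairwise percent identity across many loci concrete, here is a minimal Python sketch. It is not the authors' Perl implementation and does not reproduce collect.identity.pl or analyze.identity.pl. It assumes the sequences at each locus have already been aligned (gaps written as `-`), and it assumes one particular definition of percent identity: matching columns divided by columns in which neither sequence has a gap. The function names `percent_identity` and `summarize`, and the species and locus labels, are hypothetical.

```python
from itertools import combinations
from statistics import mean


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both inputs come from the same alignment, so they have
    equal length and use '-' as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def summarize(loci: dict) -> dict:
    """Mean percent identity per species pair across all loci.

    `loci` maps a locus name to a dict of {species: aligned sequence}.
    A species pair contributes only at loci where both species are present.
    """
    per_pair = {}
    for alignment in loci.values():
        for sp_a, sp_b in combinations(sorted(alignment), 2):
            identity = percent_identity(alignment[sp_a], alignment[sp_b])
            per_pair.setdefault((sp_a, sp_b), []).append(identity)
    return {pair: mean(values) for pair, values in per_pair.items()}


if __name__ == "__main__":
    # Toy data with hypothetical species and locus names.
    example = {
        "locusA": {"sp1": "ACGTACGT", "sp2": "ACGTACGA", "sp3": "AC-TACGT"},
        "locusB": {"sp1": "GGCC", "sp2": "GGCA"},
    }
    for pair, value in sorted(summarize(example).items()):
        print(pair, round(value, 2))
```

Other definitions of percent identity (for example, dividing by the full alignment length or by the shorter sequence length) give different values, so the denominator should be stated whenever such statistics are reported.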
Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
918854
Report Number(s):
PNWD-SA-7338
Journal Information:
Journal Name: BioTechniques; Journal Volume: 40; Journal Issue: 5; Pages: 578-582; ISSN BTNQDO; ISSN 0736-6205
Country of Publication:
United States
Language:
English

Similar Records

POGO-DB—a database of pairwise-comparisons of genomes and conserved orthologous genes
Journal Article · Mon Nov 04 19:00:00 EST 2013 · Nucleic Acids Research · OSTI ID:1904534

Ortholog identification in the presence of domain architecture rearrangement
Journal Article · Mon Jun 27 20:00:00 EDT 2011 · Briefings in Bioinformatics · OSTI ID:1904524

A phylogenetic Gibbs sampler that yields centroid solutions of cis-regulatory sites
Journal Article · Sun Jul 15 00:00:00 EDT 2007 · Bioinformatics, 23(14):1718-1727 · OSTI ID:919290