Title: Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments

Journal Article · BMC Bioinformatics
 [1];  [1];  [2];  [3]
  1. Univ. of California, Berkeley, CA (United States). Graduate Group in Biophysics
  2. Univ. of California, Berkeley, CA (United States). Dept. of Molecular and Cell Biology
  3. Univ. of California, Berkeley, CA (United States). Graduate Group in Biophysics; Univ. of California, Berkeley, CA (United States). Dept. of Molecular and Cell Biology; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Genomics Division. Dept. of Genome Sciences; Univ. of California, Berkeley, CA (United States). Center for Integrative Genomics

Background: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear.

Results: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool-specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments.

Conclusion: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors.
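
The two quantities discussed in the abstract, alignment accuracy against a known (simulated) alignment and a divergence estimate taken from an alignment, can be illustrated with a minimal Python sketch. This is not CisEvolver and not necessarily the scoring used in the article: the function names are hypothetical, accuracy is assumed here to be the fraction of truly aligned residue pairs that the tested alignment recovers, and the Jukes-Cantor correction is an assumed stand-in for whatever divergence estimator the authors applied.

```python
import math

def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs that share a column."""
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs

def pairwise_accuracy(true_aln, test_aln):
    """Fraction of truly aligned residue pairs recovered by the test alignment
    (an assumed accuracy measure, chosen for illustration)."""
    true_pairs = aligned_pairs(*true_aln)
    test_pairs = aligned_pairs(*test_aln)
    if not true_pairs:
        return 0.0
    return len(true_pairs & test_pairs) / len(true_pairs)

def jukes_cantor_distance(row_a, row_b, gap="-"):
    """Jukes-Cantor divergence (substitutions per site) over ungapped columns.
    Returns infinity when the mismatch fraction reaches saturation (>= 0.75)."""
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return float("nan")
    p = sum(a != b for a, b in columns) / len(columns)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return float("inf")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy example: the "true" alignment contains two indels; the tested
# alignment is the same pair of sequences aligned without gaps.
true_aln = ("ACGTAC-GT", "ACG-ACTGT")
test_aln = ("ACGTACGT", "ACGACTGT")

accuracy = pairwise_accuracy(true_aln, test_aln)
d_true = jukes_cantor_distance(*true_aln)
d_test = jukes_cantor_distance(*test_aln)
print(f"accuracy={accuracy:.3f} d_true={d_true:.3f} d_test={d_test:.3f}")
```

In this toy case the misaligned columns appear as substitutions, so the divergence computed from the tested alignment is larger than the one computed from the true alignment, which is the direction of error the abstract reports at short divergence distances.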

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1626325
Journal Information:
BMC Bioinformatics, Vol. 7, Issue 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English

Cited By (18)

Drosophila Biology in the Genomic Age journal November 2007
Pattern recognition and probabilistic measures in alignment-free sequence analysis journal October 2013
Blueprint for a High-Performance Biomaterial: Full-Length Spider Dragline Silk Genes journal June 2007
Determination of novel members in the Drosophila melanogaster anterior-posterior patterning system using natural variation posted_content January 2018
Is Transcription Factor Binding Site Turnover a Sufficient Explanation for Cis-Regulatory Sequence Divergence? journal January 2010
Confirming the Phylogeny of Mammals by Use of Large Comparative Sequence Data Sets journal May 2008
Identification of Lineage-Specific Cis-Regulatory Modules Associated with Variation in Transcription Factor Binding and Chromatin Activity Using Ornstein–Uhlenbeck Models journal May 2015
REDfly 2.0: an integrated database of cis-regulatory modules and transcription factor binding sites in Drosophila journal December 2007
How accurately is ncRNA aligned within whole-genome multiple alignments? journal October 2007
BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC journal January 2009
Statistical tests for natural selection on regulatory regions based on the strength of transcription factor binding sites journal January 2009
Phylogenetic Simulation of Promoter Evolution: Estimation and Modeling of Binding Site Turnover Events and Assessing Their Impact on Alignment Tools journal January 2007
Motif composition, conservation and condition-specificity of single and alternative transcription start sites in the Drosophila genome journal January 2009
Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm journal February 2008
Alignment and Prediction of cis-Regulatory Modules Based on a Probabilistic Model of Evolution journal March 2009
Modeling the Evolution of Regulatory Elements by Simultaneous Detection and Alignment with Phylogenetic Pair HMMs journal December 2010
Whole-Genome Cartography of Estrogen Receptor α Binding Sites journal June 2007
Pervasive Divergence of Transcriptional Gene Regulation in Caenorhabditis Nematodes journal June 2014