OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments

Abstract

Background: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. Results: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool-specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. Conclusions: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors.

Authors:
Pollard, Daniel A; Moses, Alan M; Iyer, Venky N; Eisen, Michael B
Publication Date:
2006-08-14
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Director, Office of Science; National Institutes of Health
OSTI Identifier:
923464
Report Number(s):
LBNL-62553
R&D Project: GHEIS2; BnR: 400412000; TRN: US200804%%1173
DOE Contract Number:  
DE-AC02-05CH11231; NIHR01-HG002779 02
Resource Type:
Journal Article
Journal Name:
BMC Bioinformatics
Additional Journal Information:
Journal Volume: 7; Journal Issue: 376; Related Information: Journal Publication Date: 08/14/2006
Country of Publication:
United States
Language:
English
Subject:
60; ACCURACY; ALIGNMENT; DNA; SIMULATION; TRANSCRIPTION FACTORS; CisEvolver

Citation Formats

Pollard, Daniel A, Moses, Alan M, Iyer, Venky N, and Eisen, Michael B. Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments. United States: N. p., 2006. Web.
Pollard, Daniel A, Moses, Alan M, Iyer, Venky N, & Eisen, Michael B. Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments. United States.
Pollard, Daniel A, Moses, Alan M, Iyer, Venky N, and Eisen, Michael B. 2006. "Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments". United States. https://www.osti.gov/servlets/purl/923464.
@article{osti_923464,
title = {Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments},
author = {Pollard, Daniel A and Moses, Alan M and Iyer, Venky N and Eisen, Michael B},
abstractNote = {Background: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. Results: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool-specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. Conclusions: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors.},
doi = {},
url = {https://www.osti.gov/biblio/923464},
journal = {BMC Bioinformatics},
number = 376,
volume = 7,
place = {United States},
year = {2006},
month = {8}
}