JOURNAL OF COMPUTATIONAL BIOLOGY
Volume 14, Number 5, 2007
© Mary Ann Liebert, Inc.
Pp. 655–668
DOI: 10.1089/cmb.2007.R008
Effects of Long-Range Correlations in DNA on
Sequence Alignment Score Statistics
PHILIPP W. MESSER,1
RALF BUNDSCHUH,2
MARTIN VINGRON,1
and PETER F. ARNDT1
ABSTRACT
Long-range correlations in genomic base composition are a ubiquitous statistical feature
among many eukaryotic genomes. In this article, these correlations are shown to
substantially influence the statistics of sequence alignment scores. Using a Gaussian approximation
to model the correlated score landscape, we calculate the corrections to the scale parameter
of the extreme value distribution of alignment scores. Our approximate analytic results are
supported by a detailed numerical study based on a simple algorithm to efficiently generate
long-range correlated random sequences. We find both the mean and the exponential tail of the
score distribution for long-range correlated sequences to be substantially shifted compared