Alignment Statistics for Long-Range Correlated Genomic Sequences

Philipp W. Messer¹, Ralf Bundschuh², Martin Vingron¹, and Peter F. Arndt¹
¹ Max Planck Institute for Molecular Genetics, Ihnestr. 73, 14195 Berlin, Germany
² Department of Physics, Ohio State University, 191 W Woodruff Av., Columbus OH 43210-1117, USA
Abstract. It is well known that the base composition along eukaryotic
genomes is long-range correlated. Here, we investigate the effect of such
long-range correlations on alignment score statistics. We model the
correlated score-landscape by means of a Gaussian approximation. In this
framework, we can calculate the corrections to the scale parameter of
the extreme value distribution of alignment scores. To evaluate our
approximate analytic results, we perform a detailed numerical study based
on a simple algorithm to efficiently generate long-range correlated
random sequences. We find that the mean and the exponential tail of the


Source: Arndt, Peter - Max-Planck-Institut für molekulare Genetik
Spang, Rainer - Computational Molecular Biology Group, Max-Planck-Institut für molekulare Genetik

