Biological Sequence Simulation for Testing Complex Evolutionary Hypotheses: indel-Seq-Gen Version 2.0

Cory L. Strope,* Kevin Abel,* Stephen D. Scott,* and Etsuko N. Moriyama†‡
*Department of Computer Science and Engineering, University of Nebraska; †School of Biological Sciences, University of Nebraska; and ‡Center for Plant Science Innovation, University of Nebraska
Sequence simulation is an important tool for validating biological hypotheses and for testing bioinformatics and molecular evolutionary methods. Hypothesis testing relies on the representational ability of the sequence simulation method. Simple hypotheses can be tested by simulating random, homogeneously evolving sequence sets. However, testing complex hypotheses, for example, hypotheses about local similarities, requires simulating sequence evolution under heterogeneous models. To this end, we previously introduced indel-Seq-Gen version 1.0 (iSGv1.0; indel, insertion/deletion). iSGv1.0 allowed heterogeneous protein evolution and motif conservation as well as insertion and deletion constraints in subsequences. Despite these advances, neither iSGv1.0 nor other currently available sequence simulation methods are sufficient for complex hypothesis testing. indel-Seq-Gen version 2.0 (iSGv2.0) aims to simulate the evolution of highly divergent DNA sequences and protein superfamilies. iSGv2.0 improves upon iSGv1.0 by adding lineage-specific evolution, motif conservation using PROSITE-like regular expressions, indel tracking, subsequence-length constraints, and coding and noncoding DNA evolution. Furthermore, we formalize the sequence representation used in iSGv2.0 and uncover a flaw in how current state-of-the-art methods model indels; this flaw biases simulation results for hypotheses involving indels. We fix this flaw in iSGv2.0 with a novel discrete stepping procedure. Finally, we present an example simulation of calycin-superfamily sequences and compare the performance of iSGv2.0 with that of iSGv1.0 and of a random model of sequence evolution.
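Lineage-specific evolution means that different branches or subtrees of the guide tree can evolve under different settings. The sketch below illustrates the idea under stated assumptions: it uses the Jukes-Cantor (JC69) model, in which a site differs from its ancestor after an expected d substitutions per site with probability 3/4 (1 - e^(-4d/3)), and it stands in for a lineage-specific model with a hypothetical per-branch rate multiplier. The tree layout (`TREE`) and the functions `jc_evolve` and `simulate` are illustrative only and are not iSGv2.0's interface.

```python
import math
import random

ALPHABET = "ACGT"

# Hypothetical tree: parent -> [(child, branch_length, lineage_rate)].
# The lineage_rate multiplier is an assumed stand-in for a lineage-specific model.
TREE = {
    "root": [("A", 0.1, 1.0), ("inner", 0.05, 1.0)],
    "inner": [("B", 0.1, 1.0), ("C", 0.1, 4.0)],  # C's lineage evolves 4x faster
}


def jc_evolve(seq, distance, rng):
    """Apply JC69 substitutions for `distance` expected substitutions per site."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * distance / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            # Under JC69 a changed site moves to one of the other three bases uniformly.
            base = rng.choice([b for b in ALPHABET if b != base])
        out.append(base)
    return "".join(out)


def simulate(tree, node, seq, rng, leaves=None):
    """Recursively evolve `seq` down the tree; return a dict of leaf sequences."""
    if leaves is None:
        leaves = {}
    children = tree.get(node, [])
    if not children:
        leaves[node] = seq
    for child, length, rate in children:
        # Each branch uses its own lineage-specific rate.
        simulate(tree, child, jc_evolve(seq, length * rate, rng), rng, leaves)
    return leaves


rng = random.Random(0)
root_seq = "".join(rng.choice(ALPHABET) for _ in range(60))
for name, leaf_seq in simulate(TREE, "root", root_seq, rng).items():
    print(name, leaf_seq)
```

Because JC69 transition probabilities compose additively in distance, the leaf C here is expected to differ from the root as if it had evolved 0.05 + 0.4 = 0.45 substitutions per site, noticeably more than its sister B.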
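Motif conservation with PROSITE-like regular expressions can be pictured as two steps: translate the pattern into an ordinary regular expression, then reject any proposed change that destroys the motif. This minimal Python sketch handles only a simplified subset of PROSITE syntax (`x`, single residues, `[...]`, `{...}`, repeat counts such as `(3)` or `(2,4)`, and `<`/`>` anchors at the ends of the pattern). The helpers `prosite_to_regex` and `preserves_motif` are hypothetical and do not reproduce iSGv2.0's actual motif handling.

```python
import re

# One simplified PROSITE element: x, a single residue, [allowed], or {forbidden},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):  # anchored at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):  # anchored at the C-terminus
        suffix, pattern = "$", pattern[:-1]
    pieces = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            piece = "."
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"  # {..} lists forbidden residues
        else:
            piece = core  # a single residue or a [..] class is already valid regex
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(piece)
    return prefix + "".join(pieces) + suffix


def preserves_motif(sequence: str, regex: str) -> bool:
    """Return True if the motif still occurs somewhere in the sequence."""
    return re.search(regex, sequence) is not None


motif = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]")
print(motif)  # C.{2,4}C.{3}[LIVMFYWC]
parent = "AACGGCAAAL"
child = parent[:-1] + "G"  # proposed substitution at the last site
print(preserves_motif(parent, motif), preserves_motif(child, motif))  # True False
```

In a simulation loop, a substitution or indel proposed inside a conserved region would be kept only when `preserves_motif` still returns True for the mutated sequence.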
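The abstract does not describe the discrete stepping procedure itself, so the following is only a generic illustration of stepping through a branch in small time increments, combined with a simple form of indel tracking. Assumptions: a DNA alphabet, arbitrarily chosen per-site substitution, deletion, and insertion rates, event probabilities approximated as rate times dt, and a hypothetical `Site` record whose `origin` label identifies the ancestral column or insertion event each residue descends from. It is not the iSGv2.0 algorithm and does not attempt to reproduce the flaw or the fix described above.

```python
import random
from dataclasses import dataclass, replace

ALPHABET = "ACGT"


@dataclass(frozen=True)
class Site:
    residue: str
    origin: str  # hypothetical label: ancestral column or insertion event of origin


def evolve_branch(sites, branch_length, rng, sub_rate=1.0, ins_rate=0.05,
                  del_rate=0.05, step=0.01, label_prefix="ins"):
    """Evolve a list of Site objects along one branch in small discrete steps.

    Per-step event probabilities are rate * dt, a first-order approximation, so
    `step` should be small enough that every rate * dt is much less than 1.
    Each step works on the sequence produced by the previous step, so residues
    inserted early on the branch can be substituted or deleted later, and the
    number of sites exposed to indels follows the current length.
    """
    n_steps = max(1, round(branch_length / step))
    dt = branch_length / n_steps
    sites = list(sites)
    inserted = 0
    for _ in range(n_steps):
        next_sites = []
        for site in sites:
            if rng.random() < del_rate * dt:
                continue  # deletion: this lineage loses the site and its label
            if rng.random() < sub_rate * dt:
                new_base = rng.choice([b for b in ALPHABET if b != site.residue])
                site = replace(site, residue=new_base)  # substitution keeps the origin
            next_sites.append(site)
            if rng.random() < ins_rate * dt:
                inserted += 1  # insertion after this site gets a fresh label
                next_sites.append(Site(rng.choice(ALPHABET), f"{label_prefix}{inserted}"))
        sites = next_sites
    return sites


rng = random.Random(42)
root = [Site(rng.choice(ALPHABET), f"r{i}") for i in range(40)]
child = evolve_branch(root, branch_length=0.5, rng=rng, label_prefix="a")
print("".join(s.residue for s in child))
print([s.origin for s in child])  # "r<n>" labels show homology; "a<n>" marks insertions
```

Because every site keeps its `origin` label through substitutions, sites in different descendant sequences that share a label are homologous, which is the information needed to write out a true alignment; sites labelled by an insertion event have no counterpart in the ancestor.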


Source: Anisimova, Maria - Institute of Scientific Computing, Eidgenössische Technische Hochschule Zürich (ETHZ)


Collections: Biology and Medicine