A jumping profile Hidden Markov Model and applications to recombination sites in HIV and HCV genomes
- Univ. of Göttingen, Göttingen (Germany). Inst. of Microbiology and Genetics
- Univ. of Göttingen, Göttingen (Germany). Inst. of Microbiology and Genetics; Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Theoretical Division
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Theoretical Division
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Theoretical Division; Santa Fe Inst. (SFI), Santa Fe, NM (United States)
Background: Jumping alignments have recently been proposed as a strategy to search a given multiple sequence alignment A against a database. Instead of comparing a database sequence S to the multiple alignment or profile as a whole, S is compared and aligned to individual sequences from A. Within this alignment, S can jump between different sequences from A, so different parts of S can be aligned to different sequences from the input multiple alignment. This approach is particularly useful for dealing with recombination events.
Results: We developed a jumping profile Hidden Markov Model (jpHMM), a probabilistic generalization of the jumping-alignment approach. Given a partition of the aligned input sequence family into known sequence subtypes, our model can jump between states corresponding to these different subtypes, depending on which subtype is locally most similar to a database sequence. Jumps between different subtypes are indicative of intersubtype recombinations. We applied our method to a large set of genome sequences from human immunodeficiency virus (HIV) and hepatitis C virus (HCV) as well as to simulated recombined genome sequences.
Conclusion: Our results demonstrate that jumps in our jumping profile HMM often correspond to recombination breakpoints; our approach can therefore be used to detect recombinations in genomic sequences. The recombination breakpoints identified by jpHMM were found to be significantly more accurate than breakpoints defined by traditional methods based on comparing single representative sequences.
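The jumping idea described in the abstract can be illustrated with a deliberately simplified sketch. It is not the published jpHMM, which the abstract describes only at a high level. The sketch assumes a gap-free query that is already aligned column by column to the input alignment, uses one hidden state per subtype and column, and applies a single hypothetical `jump_prob` parameter for switching subtypes. The function names (`subtype_profiles`, `viterbi_subtypes`, `breakpoints`) and the toy data are illustrative assumptions, not part of the original work.

```python
import math


def subtype_profiles(alignment, alphabet="ACGT", pseudocount=1.0):
    """Estimate per-column log emission probabilities for each subtype.

    alignment: dict mapping subtype name -> list of equal-length,
    gap-free sequences (a toy stand-in for a real multiple alignment).
    """
    profiles = {}
    for subtype, seqs in alignment.items():
        columns = []
        for i in range(len(seqs[0])):
            counts = {a: pseudocount for a in alphabet}
            for seq in seqs:
                counts[seq[i]] += 1.0
            total = sum(counts.values())
            columns.append({a: math.log(c / total) for a, c in counts.items()})
        profiles[subtype] = columns
    return profiles


def viterbi_subtypes(query, profiles, jump_prob=0.01):
    """Return the most probable subtype label for every query position.

    Assumes at least two subtypes and a query with the same length as
    the alignment. jump_prob is a hypothetical switching probability.
    """
    subtypes = sorted(profiles)
    k = len(subtypes)
    log_stay = math.log(1.0 - jump_prob)
    log_jump = math.log(jump_prob / (k - 1))

    def trans(prev, cur):
        return log_stay if prev == cur else log_jump

    # Uniform start distribution over subtypes.
    score = {s: math.log(1.0 / k) + profiles[s][0][query[0]] for s in subtypes}
    backpointers = []
    for i in range(1, len(query)):
        new_score, pointers = {}, {}
        for s in subtypes:
            best_prev = max(subtypes, key=lambda p: score[p] + trans(p, s))
            new_score[s] = (score[best_prev] + trans(best_prev, s)
                            + profiles[s][i][query[i]])
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(subtypes, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def breakpoints(path):
    """Positions where the predicted subtype changes (candidate breakpoints)."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


# Toy data: two artificial "subtypes" and a query recombined at position 5.
toy_alignment = {"A": ["AAAAAAAAAA"] * 3, "B": ["CCCCCCCCCC"] * 3}
toy_profiles = subtype_profiles(toy_alignment)
toy_path = viterbi_subtypes("AAAAACCCCC", toy_profiles)
print("".join(toy_path), breakpoints(toy_path))
```

In this toy setting a smaller `jump_prob` makes the model more reluctant to switch subtypes, so short stretches of similarity to another subtype are less likely to be reported as recombination.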
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
- Grant/Contract Number:
- AC52-06NA25396
- OSTI ID:
- 1626321
- Journal Information:
- BMC Bioinformatics, Vol. 7, Issue 1; ISSN 1471-2105
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
Similar Records
jpHMM: Improving the reliability of recombination prediction in HIV-1
HIV classification using coalescent theory