On the Impact of Widening Vector Registers on Sequence Alignment
Abstract
Vector extensions, such as SSE, have been part of the x86 architecture since the 1990s, with applications in graphics, signal processing, and scientific computing. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. In this paper, we demonstrate that the trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. We present a practically efficient SIMD implementation of a parallel-scan-based sequence alignment algorithm that can better exploit wider SIMD units. We conduct comprehensive workload and use case analyses to characterize the relative behavior of the striped and scan approaches and identify the best choice of algorithm based on input length and SIMD width.
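The data dependencies mentioned in the abstract can be seen in a plain scalar version of local alignment scoring. The following is a minimal sketch, not the paper's striped or scan implementation: it assumes a linear gap penalty and illustrative scoring values (`match`, `mismatch`, `gap`), and the function name `smith_waterman_score` is hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (linear gap penalty, illustrative values)."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Diagonal and upper neighbours come from the previous row.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            # The left neighbour was computed one step earlier in this same row;
            # this is the dependency that makes naive vectorization along a row hard.
            left = curr[j - 1] + gap
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "TTACGTTT"))
```

Each cell needs its left, upper, and diagonal neighbours, so cells in one row cannot all be computed independently; vectorized approaches such as the striped and scan methods named in the abstract reorganize the computation to work around that dependency.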
 Authors:
 Daily, Jeffrey A.; Kalyanaraman, Anantharaman; Krishnamoorthy, Sriram; Ren, Bin
 Publication Date:
 Research Org.:
 Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
 Sponsoring Org.:
 USDOE
 OSTI Identifier:
 1340891
 Report Number(s):
 PNNL-SA-118504
KJ0402000
 DOE Contract Number:
 AC05-76RL01830
 Resource Type:
 Conference
 Resource Relation:
 Conference: 45th International Conference on Parallel Processing (ICPP 2016), August 15-19, 2016, Philadelphia, Pennsylvania, 506-515
 Country of Publication:
 United States
 Language:
 English
 Subject:
 Smith-Waterman; Needleman-Wunsch; sequence alignment; SIMD; parasail
Citation Formats
Daily, Jeffrey A., Kalyanaraman, Anantharaman, Krishnamoorthy, Sriram, and Ren, Bin. On the Impact of Widening Vector Registers on Sequence Alignment. United States: N. p., 2016. Web. doi:10.1109/ICPP.2016.65.
Daily, Jeffrey A., Kalyanaraman, Anantharaman, Krishnamoorthy, Sriram, & Ren, Bin. On the Impact of Widening Vector Registers on Sequence Alignment. United States. doi:10.1109/ICPP.2016.65.
Daily, Jeffrey A., Kalyanaraman, Anantharaman, Krishnamoorthy, Sriram, and Ren, Bin. 2016. "On the Impact of Widening Vector Registers on Sequence Alignment". United States. doi:10.1109/ICPP.2016.65.
@article{osti_1340891,
title = {On the Impact of Widening Vector Registers on Sequence Alignment},
author = {Daily, Jeffrey A. and Kalyanaraman, Anantharaman and Krishnamoorthy, Sriram and Ren, Bin},
abstractNote = {Vector extensions, such as SSE, have been part of the x86 architecture since the 1990s, with applications in graphics, signal processing, and scientific computing. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. In this paper, we demonstrate that the trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. We present a practically efficient SIMD implementation of a parallel-scan-based sequence alignment algorithm that can better exploit wider SIMD units. We conduct comprehensive workload and use case analyses to characterize the relative behavior of the striped and scan approaches and identify the best choice of algorithm based on input length and SIMD width.},
doi = {10.1109/ICPP.2016.65},
journal = {},
number = {},
volume = {},
place = {United States},
year = 2016,
month = 9
}

This patent describes a vector processor. It comprises: a plurality of vector register means, each being divided into a plurality of smaller register means which each have a plurality of outputs; a plurality of element processor means each connected to the plurality of outlets; and instruction processor means.

Multiple sequence alignment based on new approaches of tree construction and sequence comparison
A fast algorithm for multiple sequence alignment based on new approaches of tree construction and sequence comparison is suggested. The authors developed a version of the pairwise sequence alignment algorithm, which was based on analysis of DOT matrix Diagonal fragments (Df) followed by joining of significant Dfs in the final alignment. The algorithm maintains some methodological features of Needleman-Wunsch (NW) type algorithms and uses statistical estimations of similarity of various Dfs. The estimations were entered into a compact "competition matrix" (CM). Homology of sequence positions for multiple alignment was changed to homology of the corresponding rows in the aligned subsets.
Suboptimal solutions to problems of RNA secondary structure prediction, molecular sequence alignment and 3D comparisons of protein structure
The problems of finding an RNA conformation of minimum free energy, an optimal alignment between two molecular sequences, or regions of two protein sequences which are similar in 3D structure have one thing in common. They can be tackled with similar mathematical methods. A new approach to finding all solutions that are somehow close to optimal will be discussed. When applied to global sequence alignments or global 3D protein structure comparisons, this method can indicate the robustness of the best solution. Some areas may be well defined and others not. With a change in parameter settings, the suboptimal sequence alignment...
Foundations of statistical methods for multiple sequence alignment and structure prediction
Statistical algorithms have proven to be useful in computational molecular biology. Many statistical problems are most easily addressed by pretending that critical missing data are available. For some problems, statistical inference is facilitated by creating a set of latent variables, none of whose variables are observed. A key observation is that conditional probabilities for the values of the missing data can be inferred by application of Bayes' theorem to the observed data. The statistical framework described in this paper employs Boltzmann-like models, permutated data likelihood, EM, and Gibbs sampler algorithms. This tutorial reviews the common statistical framework behind all...
Sequence alignment with tandem duplication
Algorithm development for comparing and aligning biological sequences has, until recently, been based on the SI model of mutational events, which assumes that modification of sequences proceeds through any of the operations of substitution, insertion or deletion (the latter two collectively termed indels). While this model has worked fairly well, it has long been apparent that other mutational events occur. In this paper, we introduce a new model, the DSI model, which includes another common mutational event, tandem duplication. Tandem duplication produces tandem repeats which are common in DNA, making up perhaps 10% of the human genome. They are responsible...
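The SI model described in this abstract can be illustrated with the classic unit-cost edit distance, in which every substitution, insertion, or deletion costs 1. This is a minimal sketch under that assumption; it is not the DSI algorithm from the paper, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(s, t):
    """Minimum number of substitutions, insertions, and deletions turning s into t."""
    prev = list(range(len(t) + 1))  # cost of building each prefix of t from ""
    for i, cs in enumerate(s, start=1):
        curr = [i]  # cost of deleting the first i characters of s
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j - 1] + (cs != ct),  # substitution (free when characters match)
                prev[j] + 1,               # deletion of cs from s
                curr[j - 1] + 1,           # insertion of ct into s
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # "ACGACGT" is "ACGT" with its first three letters duplicated in tandem.
    print(edit_distance("ACGT", "ACGACGT"))
```

Under this unit-cost scheme, the single duplication of "ACG" in the usage example is charged as three separate insertions, which suggests why a model with an explicit duplication operation may describe such sequences more naturally.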