OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: On the Impact of Widening Vector Registers on Sequence Alignment

Abstract

Vector extensions, such as SSE, have been part of the x86 since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. In this paper, we demonstrate that the trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. We present a practically efficient SIMD implementation of a parallel scan based sequence alignment algorithm that can better exploit wider SIMD units. We conduct comprehensive workload and use case analyses to characterize the relative behavior of the striped and scan approaches and identify the best choice of algorithm based on input length and SIMD width.
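
The data dependencies mentioned in the abstract can be illustrated with a plain scalar version of the local alignment recurrence. The sketch below is not the paper's striped or scan SIMD implementation; it is a minimal reference, assuming a linear gap penalty and hypothetical scoring parameters (`match`, `mismatch`, `gap`) chosen only for illustration. It shows that each cell depends on its diagonal, upper, and left neighbors, and the left (same-row) dependency is the one that complicates vectorizing along a row.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (scalar reference sketch).

    Assumptions for illustration only: linear gap penalty and the default
    scoring values above; neither is taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # local alignment may restart anywhere
                prev[j - 1] + sub,  # diagonal: align a[i-1] with b[j-1]
                prev[j] + gap,      # up: gap in b
                curr[j - 1] + gap,  # left: gap in a (same-row dependency)
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

Only two rows are kept in memory because row i reads nothing older than row i-1. Roughly speaking, the striped and scan approaches named in the abstract are two different ways of handling the `curr[j - 1]` term when many cells are computed per vector instruction.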

Authors:
Daily, Jeffrey A.; Kalyanaraman, Anantharaman; Krishnamoorthy, Sriram; Ren, Bin
Publication Date:
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1340891
Report Number(s):
PNNL-SA-118504
KJ0402000
DOE Contract Number:
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: 45th International Conference on Parallel Processing (ICPP 2016), August 15-19, 2016, Philadelphia, Pennsylvania, 506 - 515
Country of Publication:
United States
Language:
English
Subject:
Smith-Waterman; Needleman-Wunsch; sequence alignment; SIMD; parasail

Citation Formats

Daily, Jeffrey A., Kalyanaraman, Anantharaman, Krishnamoorthy, Sriram, and Ren, Bin. On the Impact of Widening Vector Registers on Sequence Alignment. United States: N. p., 2016. Web. doi:10.1109/ICPP.2016.65.
Daily, Jeffrey A., Kalyanaraman, Anantharaman, Krishnamoorthy, Sriram, & Ren, Bin. On the Impact of Widening Vector Registers on Sequence Alignment. United States. doi:10.1109/ICPP.2016.65.
Daily, Jeffrey A., Kalyanaraman, Anantharaman, Krishnamoorthy, Sriram, and Ren, Bin. 2016. "On the Impact of Widening Vector Registers on Sequence Alignment". United States. doi:10.1109/ICPP.2016.65.
@article{osti_1340891,
title = {On the Impact of Widening Vector Registers on Sequence Alignment},
author = {Daily, Jeffrey A. and Kalyanaraman, Anantharaman and Krishnamoorthy, Sriram and Ren, Bin},
abstractNote = {Vector extensions, such as SSE, have been part of the x86 since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. In this paper, we demonstrate that the trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. We present a practically efficient SIMD implementation of a parallel scan based sequence alignment algorithm that can better exploit wider SIMD units. We conduct comprehensive workload and use case analyses to characterize the relative behavior of the striped and scan approaches and identify the best choice of algorithm based on input length and SIMD width.},
doi = {10.1109/ICPP.2016.65},
journal = {},
number = {},
volume = {},
place = {United States},
year = 2016,
month = 9
}

  • This patent describes a vector processor. It comprises: a plurality of vector register means, each being divided into a plurality of smaller register means which each have a plurality of outputs; a plurality of element processor means each connected to the plurality of outputs; and instruction processor means.
  • A fast algorithm for multiple sequence alignment based on new approaches of tree construction and sequence comparison is suggested. The authors developed a version of the pairwise sequence alignment algorithm, which was based on analysis of DOT matrix Diagonal fragments (Df) followed by joining of significant Dfs in the final alignment. The algorithm maintains some methodological features of Needleman-Wunsch (NW) type algorithms and uses statistical estimations of similarity of various Dfs. The estimations were entered into a compact "competition matrix" (CM). Homology of sequence positions for multiple alignment was changed to homology of the corresponding rows in the aligned subsets. In addition, instead of one-iteration filling of the CM by Df information, a multi-iteration method was suggested. The authors assumed that the minimal length of Df used for each iteration must be selected so that the probability of occurrence of homologous subsequences of a given length, by chance, would be low. On the basis of these significant Dfs they reconstructed an initial rough alignment. The method of multiple alignment presented here also uses a new approach for tree reconstruction based on the analysis of the relatively conserved oligonucleotides in a given set. This approach has some advantages compared to traditional methods of phylogenetic tree reconstructions. The method was tested on 5S RNA sequences and its application for contigs joining was discussed.
  • The problems of finding an RNA conformation of minimum free energy, an optimal alignment between two molecular sequences, or regions of two protein sequences which are similar in 3-D structure have one thing in common. They can be tackled with similar mathematical methods. A new approach to finding all solutions that are somehow close to optimal will be discussed. When applied to global sequence alignments or global 3-D protein structure comparisons, this method can indicate the robustness of the best solution. Some areas may be well defined and others not. With a change in parameter settings, the suboptimal sequence alignment algorithm becomes an algorithm for finding all local alignments with a preassigned score or better. In the RNA folding problem, all foldings within a prescribed percent of the minimum energy can be found. This often yields a large number of totally different structures. Usually, the total number of alternate solutions is very large. A computer printout of all of them might run to several hundred pages. Instead, a two dimensional dot plot is introduced which allows all suboptimal solutions to be displayed on a single page.
  • Statistical algorithms have proven to be useful in computational molecular biology. Many statistical problems are most easily addressed by pretending that critical missing data are available. For some problems statistical inference is facilitated by creating a set of latent variables, none of whose variables are observed. A key observation is that conditional probabilities for the values of the missing data can be inferred by application of Bayes' theorem to the observed data. The statistical framework described in this paper employs Boltzmann-like models, permutated data likelihood, EM, and Gibbs sampler algorithms. This tutorial reviews the common statistical framework behind all of these algorithms largely in tabular or graphical terms, illustrates its application, and describes the biological underpinnings of the models used.
  • Algorithm development for comparing and aligning biological sequences has, until recently, been based on the SI model of mutational events which assumes that modification of sequences proceeds through any of the operations of substitution, insertion or deletion (the latter two collectively termed indels). While this model has worked fairly well, it has long been apparent that other mutational events occur. In this paper, we introduce a new model, the DSI model, which includes another common mutational event, tandem duplication. Tandem duplication produces tandem repeats which are common in DNA, making up perhaps 10% of the human genome. They are responsible for some human diseases and may serve a multitude of functions in DNA regulation and evolution. Using the DSI model, we develop new exact and heuristic algorithms for comparing and aligning DNA sequences when they contain tandem repeats. 30 refs., 3 figs.