DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Parallel accelerated vector similarity calculations for genomics applications

Abstract

The surge in availability of genomic data holds promise for enabling determination of genetic causes of observed individual traits, with applications to problems such as discovery of the genetic roots of phenotypes, be they molecular phenotypes such as gene expression or metabolite concentrations, or complex phenotypes such as diseases. However, the growing sizes of these datasets and the quadratic, cubic or higher scaling characteristics of the relevant algorithms pose a serious computational challenge necessitating the use of leadership-scale computing. In this study we describe a new approach to performing vector similarity metric calculations, suitable for parallel systems equipped with graphics processing units (GPUs) or Intel Xeon Phi processors. Our primary focus is the Proportional Similarity metric applied to Genome-Wide Association Studies (GWAS) and Phenome-Wide Association Studies (PheWAS). We describe the implementation of the algorithms on accelerated processors, methods used for eliminating redundant calculations due to symmetries, and techniques for efficient mapping of the calculations to many-node parallel systems. Finally, results are presented demonstrating high per-node performance and parallel scalability, with rates of more than five quadrillion (5 × 10¹⁵) elementwise comparisons per second achieved on the ORNL Titan system. In a companion paper we describe corresponding techniques applied to calculations of the Custom Correlation Coefficient for comparative genomics applications.
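The abstract does not define the metric itself. As a hedged illustration only, the 2-way Proportional Similarity (Czekanowski) metric is commonly defined for nonnegative vectors u, v of length n_f as c2(u, v) = 2 Σ_q min(u_q, v_q) / Σ_q (u_q + v_q); the sketch below assumes that definition. It shows the two ideas the abstract names at small scale: every pair of vectors is compared element by element (the quadratic cost), and the symmetry c2(u, v) = c2(v, u) means only pairs i < j need to be evaluated. The function name `proportional_similarity_2way` is hypothetical, and this plain-Python loop is not the authors' accelerated GPU / Xeon Phi implementation.

```python
def proportional_similarity_2way(vectors):
    """Sketch of the 2-way Proportional Similarity metric over all vector pairs.

    Assumed definition (not taken from the record above):
        c2(v_i, v_j) = 2 * sum_q min(v_iq, v_jq) / sum_q (v_iq + v_jq)

    vectors: list of equal-length lists of nonnegative numbers.
    Returns (metrics, comparisons), where metrics maps (i, j) with i < j to c2,
    and comparisons counts elementwise min operations performed.
    """
    n_v = len(vectors)
    n_f = len(vectors[0]) if n_v else 0
    # Per-vector sums are computed once and reused in every denominator.
    sums = [sum(v) for v in vectors]
    metrics = {}
    comparisons = 0
    for i in range(n_v):
        # Symmetry: c2(v_i, v_j) == c2(v_j, v_i), so only j > i is evaluated,
        # roughly halving the work of filling the full n_v x n_v matrix.
        for j in range(i + 1, n_v):
            numer = sum(min(a, b) for a, b in zip(vectors[i], vectors[j]))
            denom = sums[i] + sums[j]
            metrics[(i, j)] = 2.0 * numer / denom if denom > 0 else 0.0
            comparisons += n_f  # one elementwise min per vector entry
    return metrics, comparisons


if __name__ == "__main__":
    m, c = proportional_similarity_2way([[1, 2, 0], [2, 1, 1], [0, 0, 3]])
    print(m, c)
```

For n_v vectors of length n_f, the loop visits n_v(n_v − 1)/2 pairs and, counting one elementwise min as one comparison (an assumption about how such rates are tallied), performs n_v(n_v − 1)/2 · n_f comparisons. In a high-performance code the inner sums would typically be organized as a blocked, GEMM-like kernel with multiplication replaced by min, rather than as Python loops.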

Authors:
Joubert, Wayne [1]; Nance, James [1]; Weighill, Deborah [2]; Jacobson, Daniel [2]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Tennessee, Knoxville, TN (United States). The Bredesen Center for Interdisciplinary Research and Graduate Education
Publication Date:
2018-03-27
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1474712
Alternate Identifier(s):
OSTI ID: 1548509
Grant/Contract Number:  
AC05-00OR22725; PS02-06ER64304
Resource Type:
Accepted Manuscript
Journal Name:
Parallel Computing
Additional Journal Information:
Journal Volume: 75; Journal ID: ISSN 0167-8191
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 59 BASIC BIOLOGICAL SCIENCES; high performance computing; parallel algorithms; NVIDIA® GPU; Intel® Xeon Phi; comparative genomics; vector similarity metrics; Proportional Similarity metric

Citation Formats

Joubert, Wayne, Nance, James, Weighill, Deborah, and Jacobson, Daniel. Parallel accelerated vector similarity calculations for genomics applications. United States: N. p., 2018. Web. doi:10.1016/j.parco.2018.03.009.
Joubert, Wayne, Nance, James, Weighill, Deborah, & Jacobson, Daniel. Parallel accelerated vector similarity calculations for genomics applications. United States. https://doi.org/10.1016/j.parco.2018.03.009
Joubert, Wayne, Nance, James, Weighill, Deborah, and Jacobson, Daniel. 2018. "Parallel accelerated vector similarity calculations for genomics applications". United States. https://doi.org/10.1016/j.parco.2018.03.009. https://www.osti.gov/servlets/purl/1474712.
@article{osti_1474712,
title = {Parallel accelerated vector similarity calculations for genomics applications},
author = {Joubert, Wayne and Nance, James and Weighill, Deborah and Jacobson, Daniel},
abstractNote = {The surge in availability of genomic data holds promise for enabling determination of genetic causes of observed individual traits, with applications to problems such as discovery of the genetic roots of phenotypes, be they molecular phenotypes such as gene expression or metabolite concentrations, or complex phenotypes such as diseases. However, the growing sizes of these datasets and the quadratic, cubic or higher scaling characteristics of the relevant algorithms pose a serious computational challenge necessitating use of leadership scale computing. In this study we describe a new approach to performing vector similarity metrics calculations, suitable for parallel systems equipped with graphics processing units (GPUs) or Intel Xeon Phi processors. Our primary focus is the Proportional Similarity metric applied to Genome Wide Association Studies (GWAS) and Phenome Wide Association Studies (PheWAS). We describe the implementation of the algorithms on accelerated processors, methods used for eliminating redundant calculations due to symmetries, and techniques for efficient mapping of the calculations to many-node parallel systems. Finally, results are presented demonstrating high per-node performance and parallel scalability with rates of more than five quadrillion ($5 \times 10^{15}$) elementwise comparisons achieved per second on the ORNL Titan system. In a companion paper we describe corresponding techniques applied to calculations of the Custom Correlation Coefficient for comparative genomics applications.},
doi = {10.1016/j.parco.2018.03.009},
journal = {Parallel Computing},
volume = 75,
place = {United States},
year = {2018},
month = mar
}

Citation Metrics:
Cited by: 8 works (citation information provided by Web of Science)

Figures / Tables:

Figure 1: (a) Computational pattern of 2-way metric calculation; (b) data decomposition for 2-way metric calculation
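Figure 1(b) itself is not reproduced in this record. As a hedged sketch of the general idea only (not necessarily the decomposition the authors use), suppose the n_v vectors are split into n_b blocks, with block p stored on process p. The result matrix then splits into n_b × n_b blocks, and because the metric is symmetric only the blocks (I, J) with I ≤ J are needed. The hypothetical function `assign_block_pairs` below distributes those blocks in a circulant pattern: process p takes the block pairs (p, p + k mod n_b) for k = 0, …, ⌊n_b/2⌋, with a tie-break for even n_b, so every unique block pair is computed exactly once and per-process loads differ by at most one.

```python
def assign_block_pairs(n_blocks):
    """Hypothetical circulant assignment of symmetric result blocks to processes.

    Process p is assumed to own vector block p. Returns a dict mapping each
    process to a list of (I, J) block pairs with I <= J that it computes.
    """
    assignment = {p: [] for p in range(n_blocks)}
    half = n_blocks // 2
    for p in range(n_blocks):
        for k in range(half + 1):
            q = (p + k) % n_blocks
            # For even n_blocks, offset k == half reaches the same unordered
            # pair from both ends (p and p + half); only the lower half of the
            # processes keeps it, so no block pair is computed twice.
            if n_blocks % 2 == 0 and k == half and p >= half:
                continue
            assignment[p].append((min(p, q), max(p, q)))
    return assignment


if __name__ == "__main__":
    for proc, pairs in assign_block_pairs(4).items():
        print(proc, pairs)
```

In this scheme each process computes its diagonal block (p, p) from local data and must receive vector block q from another process for every off-diagonal pair it owns; on an accelerated node, each block pair would then map naturally onto one GEMM-like kernel call.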

Works referencing / citing this record:

Hardwood Tree Genomics: Unlocking Woody Plant Biology
journal, December 2018

  • Tuskan, Gerald A.; Groover, Andrew T.; Schmutz, Jeremy
  • Frontiers in Plant Science, Vol. 9
  • DOI: 10.3389/fpls.2018.01799

High Throughput Screening Technologies in Biomass Characterization
journal, November 2018

  • Decker, Stephen R.; Harman-Ware, Anne E.; Happs, Renee M.
  • Frontiers in Energy Research, Vol. 6
  • DOI: 10.3389/fenrg.2018.00120