Parallel accelerated vector similarity calculations for genomics applications
Abstract
The surge in availability of genomic data holds promise for enabling determination of genetic causes of observed individual traits, with applications to problems such as discovery of the genetic roots of phenotypes, be they molecular phenotypes such as gene expression or metabolite concentrations, or complex phenotypes such as diseases. However, the growing sizes of these datasets and the quadratic, cubic or higher scaling characteristics of the relevant algorithms pose a serious computational challenge necessitating use of leadership scale computing. In this study we describe a new approach to performing vector similarity metrics calculations, suitable for parallel systems equipped with graphics processing units (GPUs) or Intel Xeon Phi processors. Our primary focus is the Proportional Similarity metric applied to Genome Wide Association Studies (GWAS) and Phenome Wide Association Studies (PheWAS). We describe the implementation of the algorithms on accelerated processors, methods used for eliminating redundant calculations due to symmetries, and techniques for efficient mapping of the calculations to many-node parallel systems. Finally, results are presented demonstrating high per-node performance and parallel scalability with rates of more than five quadrillion (5 × 10^15) elementwise comparisons achieved per second on the ORNL Titan system. In a companion paper we describe corresponding techniques applied to calculations of the Custom Correlation Coefficient for comparative genomics applications.
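As a rough illustration of the kind of calculation the abstract refers to, the sketch below computes a Proportional Similarity value for every unordered pair of vectors, visiting each pair once so that the symmetric duplicate (j, i) is never evaluated. It assumes the commonly used Czekanowski form of the metric, 2 Σ min(u_i, v_i) / Σ (u_i + v_i), for non-negative vectors. The function names are hypothetical, and this plain-Python version is not the authors' GPU or Xeon Phi implementation.

```python
from itertools import combinations


def proportional_similarity(u, v):
    """Proportional Similarity of two equal-length, non-negative vectors.

    Assumed form: 2 * sum(min(u_i, v_i)) / sum(u_i + v_i).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    numerator = 2.0 * sum(min(a, b) for a, b in zip(u, v))
    denominator = sum(u) + sum(v)
    # Two all-zero vectors have no defined ratio; return 0.0 by convention here.
    return numerator / denominator if denominator else 0.0


def all_pairs(vectors):
    """Evaluate the metric once per unordered pair (i < j).

    The metric is symmetric, so the (j, i) entries would be redundant.
    """
    return {
        (i, j): proportional_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }
```

For n vectors this evaluates n(n-1)/2 pairs rather than n^2, which is the simplest form of the symmetry saving; distributing those pairs across many nodes is a separate problem that this sketch does not address.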
- Authors:
- Joubert, Wayne; Nance, James; Weighill, Deborah; Jacobson, Daniel
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Tennessee, Knoxville, TN (United States). The Bredesen Center for Interdisciplinary Research and Graduate Education
- Publication Date:
- 2018-03-27
- Research Org.:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- OSTI Identifier:
- 1474712
- Alternate Identifier(s):
- OSTI ID: 1548509
- Grant/Contract Number:
- AC05-00OR22725; PS02-06ER64304
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Parallel Computing
- Additional Journal Information:
- Journal Volume: 75; Journal ID: ISSN 0167-8191
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; 59 BASIC BIOLOGICAL SCIENCES; high performance computing; parallel algorithms; NVIDIA ® GPU; Intel ® Xeon Phi; comparative genomics; vector similarity metrics; Proportional Similarity metric
Citation Formats
Joubert, Wayne, Nance, James, Weighill, Deborah, and Jacobson, Daniel. Parallel accelerated vector similarity calculations for genomics applications. United States: N. p., 2018. Web. doi:10.1016/j.parco.2018.03.009.
Joubert, Wayne, Nance, James, Weighill, Deborah, & Jacobson, Daniel. Parallel accelerated vector similarity calculations for genomics applications. United States. https://doi.org/10.1016/j.parco.2018.03.009
Joubert, Wayne, Nance, James, Weighill, Deborah, and Jacobson, Daniel. 2018. "Parallel accelerated vector similarity calculations for genomics applications". United States. https://doi.org/10.1016/j.parco.2018.03.009. https://www.osti.gov/servlets/purl/1474712.
@article{osti_1474712,
title = {Parallel accelerated vector similarity calculations for genomics applications},
author = {Joubert, Wayne and Nance, James and Weighill, Deborah and Jacobson, Daniel},
abstractNote = {The surge in availability of genomic data holds promise for enabling determination of genetic causes of observed individual traits, with applications to problems such as discovery of the genetic roots of phenotypes, be they molecular phenotypes such as gene expression or metabolite concentrations, or complex phenotypes such as diseases. However, the growing sizes of these datasets and the quadratic, cubic or higher scaling characteristics of the relevant algorithms pose a serious computational challenge necessitating use of leadership scale computing. In this study we describe a new approach to performing vector similarity metrics calculations, suitable for parallel systems equipped with graphics processing units (GPUs) or Intel Xeon Phi processors. Our primary focus is the Proportional Similarity metric applied to Genome Wide Association Studies (GWAS) and Phenome Wide Association Studies (PheWAS). We describe the implementation of the algorithms on accelerated processors, methods used for eliminating redundant calculations due to symmetries, and techniques for efficient mapping of the calculations to many-node parallel systems. Finally, results are presented demonstrating high per-node performance and parallel scalability with rates of more than five quadrillion (5 × 10^15) elementwise comparisons achieved per second on the ORNL Titan system. In a companion paper we describe corresponding techniques applied to calculations of the Custom Correlation Coefficient for comparative genomics applications.},
doi = {10.1016/j.parco.2018.03.009},
journal = {Parallel Computing},
volume = 75,
place = {United States},
year = {2018},
month = {3}
}
Works referenced in this record:
Anatomy of High-Performance 2D Similarity Calculations
journal, August 2011
- Haque, Imran S.; Pande, Vijay S.; Walters, W. Patrick
- Journal of Chemical Information and Modeling, Vol. 51, Issue 9
Measures of similarity between distributions
journal, January 1986
- Vegelius, Jan; Janson, Svante; Johansson, Folke
- Quality and Quantity, Vol. 20, Issue 4
A Multi-Trait, Meta-analysis for Detecting Pleiotropic Polymorphisms for Stature, Fatness and Reproduction in Beef Cattle
journal, March 2014
- Bolormaa, Sunduimijid; Pryce, Jennie E.; Reverter, Antonio
- PLoS Genetics, Vol. 10, Issue 3
3-way Networks: Application of Hypergraphs for Modelling Increased Complexity in Comparative Genomics
journal, March 2015
- Weighill, Deborah A.; Jacobson, Daniel A.
- PLOS Computational Biology, Vol. 11, Issue 3
GBOOST: a GPU-based tool for detecting gene–gene interactions in genome–wide case control studies
journal, March 2011
- Yung, Ling Sing; Yang, Can; Wan, Xiang
- Bioinformatics, Vol. 27, Issue 9
Detecting epistasis in human complex traits
journal, September 2014
- Wei, Wen-Hua; Hemani, Gibran; Haley, Chris S.
- Nature Reviews Genetics, Vol. 15, Issue 11
Variance component model to account for sample structure in genome-wide association studies
journal, March 2010
- Kang, Hyun Min; Sul, Jae Hoon; Service, Susan K.
- Nature Genetics, Vol. 42, Issue 4
The use of phenome-wide association studies (PheWAS) for exploration of novel genotype-phenotype relationships and pleiotropy discovery
journal, May 2011
- Pendergrass, S. A.; Brown-Gentry, K.; Dudek, S. M.
- Genetic Epidemiology, Vol. 35, Issue 5
An Ordination of the Upland Forest Communities of Southern Wisconsin
journal, February 1957
- Bray, J. Roger; Curtis, J. T.
- Ecological Monographs, Vol. 27, Issue 4
A Custom Correlation Coefficient (CCC) Approach for Fast Identification of Multi-SNP Association Patterns in Genome-Wide SNPs Data
journal, August 2014
- Climer, Sharlee; Yang, Wei; de las Fuentes, Lisa
- Genetic Epidemiology, Vol. 38, Issue 7
Parallel Pairwise Epistasis Detection on Heterogeneous Computing Architectures
journal, August 2016
- González-Domínguez, Jorge; Ramos, Sabela; Touriño, Juan
- IEEE Transactions on Parallel and Distributed Systems, Vol. 27, Issue 8
GPU-accelerated exhaustive search for third-order epistatic interactions in case–control studies
journal, May 2015
- González-Domínguez, Jorge; Schmidt, Bertil
- Journal of Computational Science, Vol. 8
Pleiotropy in complex traits: challenges and strategies
journal, June 2013
- Solovieff, Nadia; Cotsapas, Chris; Lee, Phil H.
- Nature Reviews Genetics, Vol. 14, Issue 7
A set of level 3 basic linear algebra subprograms
journal, March 1990
- Dongarra, J. J.; Du Croz, Jeremy; Hammarling, Sven
- ACM Transactions on Mathematical Software, Vol. 16, Issue 1
eCEO: an efficient Cloud Epistasis cOmputing model in genome-wide association study
journal, March 2011
- Wang, Z.; Wang, Y.; Tan, K. -L.
- Bioinformatics, Vol. 27, Issue 8
Works referencing / citing this record:
Hardwood Tree Genomics: Unlocking Woody Plant Biology
journal, December 2018
- Tuskan, Gerald A.; Groover, Andrew T.; Schmutz, Jeremy
- Frontiers in Plant Science, Vol. 9
High Throughput Screening Technologies in Biomass Characterization
journal, November 2018
- Decker, Stephen R.; Harman-Ware, Anne E.; Happs, Renee M.
- Frontiers in Energy Research, Vol. 6