OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: High-performance epistasis detection in quantitative trait GWAS

Abstract

epiSNP is a program for identifying pairwise single nucleotide polymorphism (SNP) interactions (epistasis) in quantitative-trait genome-wide association studies (GWAS). A parallel MPI version (EPISNPmpi) was created in 2008 to address this computationally expensive analysis on large data sets with many quantitative traits and SNP markers. However, the falling cost of genotyping has led to an explosion of large-scale GWAS data sets that challenge EPISNPmpi’s ability to compute results in a reasonable amount of time. Therefore, we optimized epiSNP for modern multi-core and highly parallel many-core processors to efficiently handle these large data sets. This paper describes the serial optimizations, dynamic load balancing using MPI-3 RMA operations, and shared-memory parallelization with OpenMP to further enhance load balancing and allow execution on the Intel Xeon Phi coprocessor (MIC). For a large GWAS data set, our optimizations provided a 38.43× speedup over EPISNPmpi on 126 nodes using 2 MICs on TACC’s Stampede Supercomputer. We also describe a Coarray Fortran (CAF) version that demonstrates the suitability of PGAS languages for problems with this computational pattern. We show that the Coarray version performs competitively with the MPI version on the NERSC Edison Cray XC30 supercomputer. Finally, the performance benefits of hyper-threading for this application on Edison (average 1.35× speedup) are demonstrated.
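The dynamic load balancing mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a shared work counter that workers advance with an atomic fetch-and-add (the role an operation such as MPI_Fetch_and_op on an RMA window could play), and it uses Python threads in place of MPI ranks. The names `SharedCounter`, `pair_score`, and `scan_pairs` are hypothetical, and `pair_score` is a placeholder statistic rather than the epiSNP test.

```python
import itertools
import threading


class SharedCounter:
    """Atomic fetch-and-add counter (assumed stand-in for an RMA window counter)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_add(self, increment):
        # Return the old value and advance the counter in one atomic step.
        with self._lock:
            old = self._value
            self._value += increment
            return old


def pair_score(snp_a, snp_b, trait):
    """Hypothetical placeholder: covariance of the genotype product with the trait."""
    n = len(trait)
    product = [a * b for a, b in zip(snp_a, snp_b)]
    mean_p = sum(product) / n
    mean_t = sum(trait) / n
    return sum((p - mean_p) * (t - mean_t) for p, t in zip(product, trait)) / n


def worker(pairs, genotypes, trait, counter, chunk, results):
    # Each worker repeatedly claims the next unclaimed chunk of SNP pairs,
    # so faster workers automatically take on more chunks.
    while True:
        start = counter.fetch_and_add(chunk)
        if start >= len(pairs):
            return
        for i, j in pairs[start:start + chunk]:
            results[(i, j)] = pair_score(genotypes[i], genotypes[j], trait)


def scan_pairs(genotypes, trait, n_workers=4, chunk=2):
    """Score every unordered SNP pair, handing out work chunk by chunk."""
    pairs = list(itertools.combinations(range(len(genotypes)), 2))
    counter = SharedCounter()
    results = {}
    threads = [
        threading.Thread(
            target=worker,
            args=(pairs, genotypes, trait, counter, chunk, results),
        )
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because chunks are claimed on demand rather than assigned up front, no worker sits idle while unclaimed pairs remain. The chunk size is an assumed tuning parameter that trades counter contention against balance.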

Authors:
 [1];  [1];  [1];  [1];  [1];  [2];  [3];  [2]
  1. Iowa State Univ., Ames, IA (United States)
  2. Univ. of Maryland, College Park, MD (United States)
  3. Univ. of Arkansas, Fayetteville, AR (United States)
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC); Univ. of California, Oakland, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1544015
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal Volume: 32; Journal Issue: 3; Journal ID: ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Computer Science

Citation Formats

Weeks, Nathan T., Luecke, Glenn R., Groth, Brandon M., Kraeva, Marina, Ma, Li, Kramer, Luke M., Koltes, James E., and Reecy, James M. High-performance epistasis detection in quantitative trait GWAS. United States: N. p., 2016. Web. doi:10.1177/1094342016658110.
Weeks, Nathan T., Luecke, Glenn R., Groth, Brandon M., Kraeva, Marina, Ma, Li, Kramer, Luke M., Koltes, James E., & Reecy, James M. High-performance epistasis detection in quantitative trait GWAS. United States. https://doi.org/10.1177/1094342016658110
Weeks, Nathan T., Luecke, Glenn R., Groth, Brandon M., Kraeva, Marina, Ma, Li, Kramer, Luke M., Koltes, James E., and Reecy, James M. 2016. "High-performance epistasis detection in quantitative trait GWAS". United States. https://doi.org/10.1177/1094342016658110. https://www.osti.gov/servlets/purl/1544015.
@article{osti_1544015,
title = {High-performance epistasis detection in quantitative trait GWAS},
author = {Weeks, Nathan T. and Luecke, Glenn R. and Groth, Brandon M. and Kraeva, Marina and Ma, Li and Kramer, Luke M. and Koltes, James E. and Reecy, James M.},
abstractNote = {epiSNP is a program for identifying pairwise single nucleotide polymorphism (SNP) interactions (epistasis) in quantitative-trait genome-wide association studies (GWAS). A parallel MPI version (EPISNPmpi) was created in 2008 to address this computationally expensive analysis on large data sets with many quantitative traits and SNP markers. However, the falling cost of genotyping has led to an explosion of large-scale GWAS data sets that challenge EPISNPmpi’s ability to compute results in a reasonable amount of time. Therefore, we optimized epiSNP for modern multi-core and highly parallel many-core processors to efficiently handle these large data sets. This paper describes the serial optimizations, dynamic load balancing using MPI-3 RMA operations, and shared-memory parallelization with OpenMP to further enhance load balancing and allow execution on the Intel Xeon Phi coprocessor (MIC). For a large GWAS data set, our optimizations provided a 38.43× speedup over EPISNPmpi on 126 nodes using 2 MICs on TACC’s Stampede Supercomputer. We also describe a Coarray Fortran (CAF) version that demonstrates the suitability of PGAS languages for problems with this computational pattern. We show that the Coarray version performs competitively with the MPI version on the NERSC Edison Cray XC30 supercomputer. Finally, the performance benefits of hyper-threading for this application on Edison (average 1.35× speedup) are demonstrated.},
doi = {10.1177/1094342016658110},
url = {https://www.osti.gov/biblio/1544015},
journal = {International Journal of High Performance Computing Applications},
issn = {1094-3420},
number = 3,
volume = 32,
place = {United States},
year = {2016},
month = {7}
}


Citation Metrics:
Cited by: 6 works (citation information provided by Web of Science)
