U.S. Department of Energy
Office of Scientific and Technical Information

High-performance epistasis detection in quantitative trait GWAS

Journal Article · International Journal of High Performance Computing Applications
 [1];  [1];  [1];  [1];  [1];  [2];  [3];  [2]
  1. Iowa State Univ., Ames, IA (United States)
  2. Univ. of Maryland, College Park, MD (United States)
  3. Univ. of Arkansas, Fayetteville, AR (United States)
epiSNP is a program for identifying pairwise single nucleotide polymorphism (SNP) interactions (epistasis) in quantitative-trait genome-wide association studies (GWAS). A parallel MPI version (EPISNPmpi) was created in 2008 to address this computationally expensive analysis on large data sets with many quantitative traits and SNP markers. However, the falling cost of genotyping has led to an explosion of large-scale GWAS data sets that challenge EPISNPmpi’s ability to compute results in a reasonable amount of time. Therefore, we optimized epiSNP for modern multi-core and highly parallel many-core processors to efficiently handle these large data sets. This paper describes the serial optimizations, dynamic load balancing using MPI-3 RMA operations, and shared-memory parallelization with OpenMP to further enhance load balancing and allow execution on the Intel Xeon Phi coprocessor (MIC). For a large GWAS data set, our optimizations provided a 38.43× speedup over EPISNPmpi on 126 nodes using 2 MICs on TACC’s Stampede Supercomputer. We also describe a Coarray Fortran (CAF) version that demonstrates the suitability of PGAS languages for problems with this computational pattern. We show that the Coarray version performs competitively with the MPI version on the NERSC Edison Cray XC30 supercomputer. Finally, the performance benefits of hyper-threading for this application on Edison (average 1.35× speedup) are demonstrated.
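The dynamic load balancing mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses Python threads and a lock-protected counter as a stand-in for a remotely accessible counter updated with an atomic fetch-and-add, and the names `SharedCounter`, `count_pairs_dynamic`, `n_workers`, and `chunk` are hypothetical. Each worker repeatedly claims the next chunk of SNP rows, and row i contributes the pairs (i, j) with j > i, so rows carry unequal amounts of work and a fixed up-front split would be unbalanced.

```python
import threading


class SharedCounter:
    """Stand-in for a shared work counter updated with fetch-and-add (assumption)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_add(self, amount):
        # Atomically return the current value and advance it by `amount`.
        with self._lock:
            old = self._value
            self._value += amount
            return old


def count_pairs_dynamic(n_snps, n_workers=4, chunk=8):
    """Hand out rows of the SNP-pair triangle in chunks; return pairs handled per worker."""
    counter = SharedCounter()
    pairs_done = [0] * n_workers  # each slot is written only by its own worker

    def worker(rank):
        while True:
            start = counter.fetch_and_add(chunk)  # claim the next chunk of rows
            if start >= n_snps:
                break  # no rows left to claim
            for i in range(start, min(start + chunk, n_snps)):
                # Row i pairs SNP i with every later SNP j > i.
                # A real code would run the interaction test here.
                pairs_done[rank] += n_snps - 1 - i

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pairs_done
```

Calling `count_pairs_dynamic(100)` returns one count per worker. The counts vary from run to run, but they always sum to the 4950 distinct pairs among 100 SNPs, since every row is claimed exactly once.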
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC); Univ. of California, Oakland, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1544015
Journal Information:
International Journal of High Performance Computing Applications, Vol. 32, Issue 3; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
