High-performance epistasis detection in quantitative trait GWAS
Abstract
epiSNP is a program for identifying pairwise single nucleotide polymorphism (SNP) interactions (epistasis) in quantitative-trait genome-wide association studies (GWAS). A parallel MPI version (EPISNPmpi) was created in 2008 to address this computationally expensive analysis on large data sets with many quantitative traits and SNP markers. However, the falling cost of genotyping has led to an explosion of large-scale GWAS data sets that challenge EPISNPmpi’s ability to compute results in a reasonable amount of time. Therefore, we optimized epiSNP for modern multi-core and highly parallel many-core processors to efficiently handle these large data sets. This paper describes the serial optimizations, dynamic load balancing using MPI-3 RMA operations, and shared-memory parallelization with OpenMP to further enhance load balancing and allow execution on the Intel Xeon Phi coprocessor (MIC). For a large GWAS data set, our optimizations provided a 38.43× speedup over EPISNPmpi on 126 nodes using 2 MICs on TACC’s Stampede Supercomputer. We also describe a Coarray Fortran (CAF) version that demonstrates the suitability of PGAS languages for problems with this computational pattern. We show that the Coarray version performs competitively with the MPI version on the NERSC Edison Cray XC30 supercomputer. Finally, the performance benefits of hyper-threading for this application on Edison (average 1.35× speedup) are demonstrated.
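As a rough illustration of the dynamic load balancing mentioned in the abstract, the sketch below lets workers claim chunks of SNP pairs from a shared counter, which is the role an atomic fetch-and-add on a remotely accessible counter commonly plays in one-sided (RMA) schemes. It is a toy under stated assumptions, not the epiSNP implementation: this record does not say which RMA calls the authors use, Python threads and a lock stand in for MPI processes and the atomic operation, and the names `pair_chunks`, `SharedCounter`, `run_dynamic`, and `toy_test` are hypothetical.

```python
import threading
from itertools import combinations


def pair_chunks(n_snps, chunk_size):
    """Split all unordered SNP pairs (i, j) with i < j into fixed-size chunks."""
    pairs = list(combinations(range(n_snps), 2))
    return [pairs[k:k + chunk_size] for k in range(0, len(pairs), chunk_size)]


class SharedCounter:
    """Stand-in for an atomic fetch-and-add on a counter visible to all workers."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_add(self, inc=1):
        # Return the old value and advance the counter in one atomic step.
        with self._lock:
            old = self._value
            self._value += inc
            return old


def run_dynamic(n_snps, chunk_size, n_workers, test_pair):
    """Evaluate test_pair on every SNP pair, handing out chunks on demand."""
    chunks = pair_chunks(n_snps, chunk_size)
    counter = SharedCounter()
    results = [None] * len(chunks)  # each slot is written by exactly one worker

    def worker():
        while True:
            k = counter.fetch_and_add()  # claim the next unclaimed chunk
            if k >= len(chunks):
                return  # no work left
            results[k] = [(i, j, test_pair(i, j)) for i, j in chunks[k]]

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Flatten per-chunk results back into pair order.
    return [r for chunk in results for r in chunk]


def toy_test(i, j):
    """Placeholder score for a pair; not a real epistasis statistic."""
    return (i * 31 + j) % 7
```

Calling `run_dynamic(10, 4, 3, toy_test)` evaluates all 10·9/2 = 45 pairs. Because a worker only asks for a new chunk when it finishes its current one, faster workers naturally take on more chunks, which is the point of assigning work dynamically rather than statically.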
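- Authors:
- Weeks, Nathan T.; Luecke, Glenn R.; Groth, Brandon M.; Kraeva, Marina; Ma, Li; Kramer, Luke M.; Koltes, James E.; Reecy, James M.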
- Iowa State Univ., Ames, IA (United States)
- Univ. of Maryland, College Park, MD (United States)
- Univ. of Arkansas, Fayetteville, AR (United States)
- Publication Date:
- July 2016
- Research Org.:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC); Univ. of California, Oakland, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1544015
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Journal Article: Accepted Manuscript
- Journal Name:
- International Journal of High Performance Computing Applications
- Additional Journal Information:
- Journal Volume: 32; Journal Issue: 3; Journal ID: ISSN 1094-3420
- Publisher:
- SAGE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Computer Science
Citation Formats
Weeks, Nathan T., Luecke, Glenn R., Groth, Brandon M., Kraeva, Marina, Ma, Li, Kramer, Luke M., Koltes, James E., and Reecy, James M. High-performance epistasis detection in quantitative trait GWAS. United States: N. p., 2016. Web. doi:10.1177/1094342016658110.
Weeks, Nathan T., Luecke, Glenn R., Groth, Brandon M., Kraeva, Marina, Ma, Li, Kramer, Luke M., Koltes, James E., & Reecy, James M. High-performance epistasis detection in quantitative trait GWAS. United States. https://doi.org/10.1177/1094342016658110
Weeks, Nathan T., Luecke, Glenn R., Groth, Brandon M., Kraeva, Marina, Ma, Li, Kramer, Luke M., Koltes, James E., and Reecy, James M. 2016. "High-performance epistasis detection in quantitative trait GWAS". United States. https://doi.org/10.1177/1094342016658110. https://www.osti.gov/servlets/purl/1544015.
@article{osti_1544015,
title = {High-performance epistasis detection in quantitative trait GWAS},
author = {Weeks, Nathan T. and Luecke, Glenn R. and Groth, Brandon M. and Kraeva, Marina and Ma, Li and Kramer, Luke M. and Koltes, James E. and Reecy, James M.},
abstractNote = {epiSNP is a program for identifying pairwise single nucleotide polymorphism (SNP) interactions (epistasis) in quantitative-trait genome-wide association studies (GWAS). A parallel MPI version (EPISNPmpi) was created in 2008 to address this computationally expensive analysis on large data sets with many quantitative traits and SNP markers. However, the falling cost of genotyping has led to an explosion of large-scale GWAS data sets that challenge EPISNPmpi’s ability to compute results in a reasonable amount of time. Therefore, we optimized epiSNP for modern multi-core and highly parallel many-core processors to efficiently handle these large data sets. This paper describes the serial optimizations, dynamic load balancing using MPI-3 RMA operations, and shared-memory parallelization with OpenMP to further enhance load balancing and allow execution on the Intel Xeon Phi coprocessor (MIC). For a large GWAS data set, our optimizations provided a 38.43× speedup over EPISNPmpi on 126 nodes using 2 MICs on TACC’s Stampede Supercomputer. We also describe a Coarray Fortran (CAF) version that demonstrates the suitability of PGAS languages for problems with this computational pattern. We show that the Coarray version performs competitively with the MPI version on the NERSC Edison Cray XC30 supercomputer. Finally, the performance benefits of hyper-threading for this application on Edison (average 1.35× speedup) are demonstrated.},
doi = {10.1177/1094342016658110},
url = {https://www.osti.gov/biblio/1544015},
journal = {International Journal of High Performance Computing Applications},
issn = {1094-3420},
number = 3,
volume = 32,
place = {United States},
year = {2016},
month = {7}
}
Works referenced in this record:
Large-scale genome-wide association studies on a GPU cluster using a CUDA-accelerated PGAS programming model
journal, February 2015
- González-Domínguez, Jorge; Kässens, Jan Christian; Wienbrandt, Lars
- The International Journal of High Performance Computing Applications, Vol. 29, Issue 4
Fast Epistasis Detection in Large-Scale GWAS for Intel Xeon Phi Clusters
conference, August 2015
- Luecke, Glenn R.; Weeks, Nathan T.; Groth, Brandon M.
- 2015 IEEE Trustcom/BigDataSE/ISPA
OpenCoarrays: Open-source Transport Layers Supporting Coarray Fortran Compilers
conference, January 2014
- Fanfarillo, Alessandro; Burnus, Tobias; Cardellini, Valeria
- Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models - PGAS '14
CChi: An efficient cloud epistasis test model in human genome wide association studies
conference, December 2013
- Zhou, Zhihui; Liu, Guixia; Su, Lingtao
- 2013 6th International Conference on Biomedical Engineering and Informatics (BMEI)
Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle
journal, July 2014
- Daetwyler, Hans D.; Capitan, Aurélien; Pausch, Hubert
- Nature Genetics, Vol. 46, Issue 8
Cost-effective GPU-Grid for Genome-wide Epistasis Calculations
journal, January 2013
- Pütz, B.; Kam-Thong, T.; Karbalai, N.
- Methods of Information in Medicine, Vol. 52, Issue 01
An Efficient and Scalable Implementation of SNP-Pair Interaction Testing for Genetic Association Studies
conference, May 2011
- Koesterke, Lars; Stanzione, Dan; Vaughn, Matt
- 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW)
Parallelizing Epistasis Detection in GWAS on FPGA and GPU-Accelerated Computing Systems
journal, September 2015
- Gonzalez-Dominguez, Jorge; Wienbrandt, Lars; Kassens, Jan Christian
- IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 12, Issue 5
Parallel and serial computing tools for testing single-locus and epistatic SNP effects of quantitative traits in genome-wide association studies
journal, January 2008
- Ma, Li; Runesha, H. Birali; Dvorkin, Daniel
- BMC Bioinformatics, Vol. 9, Issue 1
High performance computing enabling exhaustive analysis of higher order single nucleotide polymorphism interaction in Genome Wide Association Studies
journal, February 2015
- Goudey, Benjamin; Abedini, Mani; Hopper, John L.
- Health Information Science and Systems, Vol. 3, Issue S1