OSTI.GOV U.S. Department of Energy
Office of Scientific and Technical Information

Title: Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs

Abstract

In this paper we propose and analyze a set of batched linear solvers for small matrices on Graphics Processing Units (GPUs), evaluating the alternatives depending on the size of the systems to be solved. We discuss three solutions that operate at different levels of parallelism and exploit different GPU features. The first, built on the CUBLAS library, handles matrices of size up to 32x32 and employs warp-level parallelism (one matrix per warp) and shared memory. The second works at thread-block-level parallelism (one matrix per thread block), still exploiting shared memory, and handles matrices up to 76x76. The third is thread-level parallel (one matrix per thread) and can reach sizes up to 128x128, but it does not use shared memory and relies only on the high memory bandwidth of the GPU. The first and second solutions support only partial pivoting; the third easily supports both partial and full pivoting, making it attractive for problems that require greater numerical stability. We analyze the trade-offs in terms of performance and power consumption as a function of the size of the linear systems that are simultaneously solved. We execute the three implementations on a Tesla M2090 (Fermi) and on a Tesla K20 (Kepler).
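As a concrete illustration of the first (warp-level, CUBLAS-based) approach, the sketch below shows how a batched LU factorize-and-solve can be expressed with cuBLAS's batched getrf/getrs routines. The matrix order, batch size, and buffer layout are illustrative assumptions and are not taken from the paper; the authors' implementation details may differ.

/*
 * Minimal sketch of a batched LU factorize-and-solve with cuBLAS.
 * The order n, batch size, and memory layout below are illustrative
 * assumptions, not the configuration reported in the paper.
 */
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 32;        /* matrix order (the warp-level case handles up to 32x32) */
    const int batch = 1024;  /* number of independent systems solved simultaneously */

    float *dA, *dB;          /* contiguous slabs holding all matrices / right-hand sides */
    float **dAarr, **dBarr;  /* device arrays of per-matrix pointers, as cuBLAS expects */
    int *dPiv, *dInfoFactor;
    cudaMalloc((void **)&dA, sizeof(float) * n * n * batch);
    cudaMalloc((void **)&dB, sizeof(float) * n * batch);
    cudaMalloc((void **)&dAarr, sizeof(float *) * batch);
    cudaMalloc((void **)&dBarr, sizeof(float *) * batch);
    cudaMalloc((void **)&dPiv, sizeof(int) * n * batch);
    cudaMalloc((void **)&dInfoFactor, sizeof(int) * batch);

    /* Build the per-matrix pointer arrays on the host and copy them to the device. */
    float **hAarr = (float **)malloc(sizeof(float *) * batch);
    float **hBarr = (float **)malloc(sizeof(float *) * batch);
    for (int i = 0; i < batch; ++i) {
        hAarr[i] = dA + (size_t)i * n * n;
        hBarr[i] = dB + (size_t)i * n;
    }
    cudaMemcpy(dAarr, hAarr, sizeof(float *) * batch, cudaMemcpyHostToDevice);
    cudaMemcpy(dBarr, hBarr, sizeof(float *) * batch, cudaMemcpyHostToDevice);

    /* ... fill dA and dB with the batched systems here ... */

    cublasHandle_t handle;
    cublasCreate(&handle);

    /* LU factorization with partial pivoting for the whole batch at once. */
    cublasSgetrfBatched(handle, n, dAarr, n, dPiv, dInfoFactor, batch);

    /* Forward/backward substitution against the factored matrices (one RHS each). */
    int infoSolve = 0;
    cublasSgetrsBatched(handle, CUBLAS_OP_N, n, 1,
                        (const float *const *)dAarr, n, dPiv,
                        dBarr, n, &infoSolve, batch);
    printf("getrsBatched info = %d\n", infoSolve);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dAarr); cudaFree(dBarr);
    cudaFree(dPiv); cudaFree(dInfoFactor);
    free(hAarr); free(hBarr);
    return 0;
}

The other two variants described in the abstract replace the library call with custom kernels that assign one matrix per thread block or one matrix per thread.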

Authors:
Villa, Oreste; Fatica, Massimiliano; Gawande, Nitin A.; Tumeo, Antonino
Publication Date:
2013
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1123253
Report Number(s):
PNNL-SA-93959
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: Euro-Par 2013 Parallel Processing. 19th International Conference, August 26-30, 2013, Aachen, Germany. Lecture Notes in Computer Science, 8097:813-825
Country of Publication:
United States
Language:
English
Subject:
Linear solvers; LU decomposition; GPGPU

Citation Formats

Villa, Oreste, Fatica, Massimiliano, Gawande, Nitin A., and Tumeo, Antonino. Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs. United States: N. p., 2013. Web. doi:10.1007/978-3-642-40047-6_81.
Villa, Oreste, Fatica, Massimiliano, Gawande, Nitin A., & Tumeo, Antonino. Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs. United States. https://doi.org/10.1007/978-3-642-40047-6_81
Villa, Oreste, Fatica, Massimiliano, Gawande, Nitin A., and Tumeo, Antonino. 2013. "Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs". United States. https://doi.org/10.1007/978-3-642-40047-6_81.
@article{osti_1123253,
title = {Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs},
author = {Villa, Oreste and Fatica, Massimiliano and Gawande, Nitin A. and Tumeo, Antonino},
abstractNote = {In this paper we propose and analyze a set of batched linear solvers for small matrices on Graphics Processing Units (GPUs), evaluating the alternatives depending on the size of the systems to be solved. We discuss three solutions that operate at different levels of parallelism and exploit different GPU features. The first, built on the CUBLAS library, handles matrices of size up to 32x32 and employs warp-level parallelism (one matrix per warp) and shared memory. The second works at thread-block-level parallelism (one matrix per thread block), still exploiting shared memory, and handles matrices up to 76x76. The third is thread-level parallel (one matrix per thread) and can reach sizes up to 128x128, but it does not use shared memory and relies only on the high memory bandwidth of the GPU. The first and second solutions support only partial pivoting; the third easily supports both partial and full pivoting, making it attractive for problems that require greater numerical stability. We analyze the trade-offs in terms of performance and power consumption as a function of the size of the linear systems that are simultaneously solved. We execute the three implementations on a Tesla M2090 (Fermi) and on a Tesla K20 (Kepler).},
doi = {10.1007/978-3-642-40047-6_81},
url = {https://www.osti.gov/biblio/1123253},
place = {United States},
year = {2013},
month = {8}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
