Accelerating Subsurface Transport Simulation on Heterogeneous Clusters

Villa, Oreste; Gawande, Nitin A.; Tumeo, Antonino

doi:10.1109/CLUSTER.2013.6702656

Conference · September 23, 2013

DOI: https://doi.org/10.1109/CLUSTER.2013.6702656 · OSTI ID: 1123252

Reactive transport numerical models simulate chemical and microbiological reactions that occur along a flowpath. These models have to compute reactions for a large number of locations. They solve the set of ordinary differential equations (ODEs) that describes the reaction at each location with the Newton-Raphson technique. This technique involves computing a Jacobian matrix and a residual vector for each set of equations, and then iteratively solving the linearized system by performing Gaussian elimination and LU decomposition until convergence. STOMP, a well-known subsurface flow simulation tool, employs matrices with sizes on the order of 100x100 elements and, for numerical accuracy, LU factorization with full pivoting instead of the faster partial pivoting. Modern high-performance computing systems are heterogeneous machines whose nodes integrate both CPUs and GPUs, exposing unprecedented amounts of parallelism. To exploit all of their computational power, applications must use both types of processing elements. For subsurface flow simulation, this mainly requires implementing efficient batched LU-based solvers and finding efficient ways to balance the load among the different processors of the system. In this paper we discuss two approaches that allow scaling STOMP's performance on heterogeneous clusters. We first identify the challenges in implementing batched LU-based solvers for small matrices on GPUs, and propose an implementation that fulfills STOMP's requirements. We compare this implementation to other existing solutions. Then, we combine the batched GPU solver with an OpenMP-based CPU solver, and present an adaptive load balancer that dynamically distributes the linear systems to be solved between the two components inside a node. We show how these approaches, integrated into the full application, provide speedups of 6 to 7 times on large problems, executed on up to 16 nodes of a cluster with two AMD Opteron 6272 CPUs and one Tesla M2090 GPU per node.
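As an illustration of the per-location computation the abstract describes, the following is a minimal single-system sketch in plain Python of a Newton-Raphson iteration whose linearized system is solved by Gaussian elimination with full pivoting. It is not STOMP's code or the paper's GPU implementation; the function names `solve_full_pivot` and `newton_raphson`, the tolerance, and the iteration limit are assumptions made for this example only.

```python
def solve_full_pivot(a, b):
    """Solve a x = b by Gaussian elimination with full (complete) pivoting.

    a is a list of n rows of n floats, b is a list of n floats.
    Neither input is modified. (Hypothetical helper, for illustration.)
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented working copy
    col_order = list(range(n))  # records column swaps (variable permutation)
    for k in range(n):
        # Full pivoting: search the whole trailing submatrix for the
        # entry of largest magnitude, not just the current column.
        p, q = max(
            ((i, j) for i in range(k, n) for j in range(k, n)),
            key=lambda ij: abs(m[ij[0]][ij[1]]),
        )
        if m[p][q] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]  # row swap
        if q != k:  # column swap, applied to every row
            for row in m:
                row[k], row[q] = row[q], row[k]
            col_order[k], col_order[q] = col_order[q], col_order[k]
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the upper-triangular system.
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * y[j] for j in range(i + 1, n))
        y[i] = s / m[i][i]
    # Undo the column permutation to recover the original variable order.
    x = [0.0] * n
    for pos, var in enumerate(col_order):
        x[var] = y[pos]
    return x


def newton_raphson(residual, jacobian, x0, tol=1e-10, max_iter=50):
    """Find x with residual(x) ~ 0; tol and max_iter are assumed defaults."""
    x = list(x0)
    for _ in range(max_iter):
        r = residual(x)
        if max(abs(v) for v in r) < tol:
            return x
        # Solve J dx = -r for the Newton update, then apply it.
        dx = solve_full_pivot(jacobian(x), [-v for v in r])
        x = [xi + di for xi, di in zip(x, dx)]
    raise RuntimeError("Newton-Raphson did not converge")
```

In a batched setting, a routine like `solve_full_pivot` would be applied independently to the small system at each location, which is the part of the workload the abstract describes distributing between the GPU and CPU solvers.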

Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 1123252

Report Number(s): PNNL-SA-96124

Resource Relation: Conference: IEEE International Conference on Cluster Computing (CLUSTER 2013), September 23-27, 2013, Indianapolis, Indiana, 1-8

Country of Publication: United States

Language: English

Similar Records

A Flexible CUDA LU-based Solver for Small, Batched Linear Systems

Book · June 9, 2014 · OSTI ID: 1123252

Tumeo, Antonino; Gawande, Nitin A.; Villa, Oreste

Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs

Conference · August 26, 2013 · OSTI ID: 1123252

Villa, Oreste; Fatica, Massimiliano; Gawande, Nitin A.; +1 more

Batched matrix computations on hardware accelerators based on GPUs

Journal Article · February 9, 2015 · International Journal of High Performance Computing Applications · OSTI ID: 1123252

Haidar, Azzam; Dong, Tingxing; Luszczek, Piotr; +2 more

Related Subjects

GPU
STOMP
LOAD BALANCING
HETEROGENEOUS CLUSTERS
