@misc{osti_1231788,
title = {miniGMG, Version 00},
author = {Williams, Samuel},
abstractNote = {miniGMG is a compact geometric multigrid (MG) benchmark designed to proxy the performance characteristics of the solves found in adaptive mesh refinement multigrid (AMR MG) applications. It solves the equation a*alpha*u - b div beta grad u = f on a large cubical domain using cell-centered values and periodic boundaries. a and b are scalar constants, alpha and beta are spatially varying coefficients, f is the right-hand side, and u is the solution. The cubical domain is divided into cubical subdomains, which are distributed across the supercomputer. The right-hand side f is generated from a continuous function for u; thus one can solve this linear system of equations and verify the error properties of the discrete solution u by comparing it to a sampling of the continuous u. miniGMG implements both V- and F-cycles, as well as two constant-coefficient discretizations of the equation above (one 27-point 2nd-order, and one 13-point 4th-order). In both cases, the (embedded) V-cycles are truncated when a subdomain reaches 4^3, 2^3, or 1^3 cells (a U-cycle). The multigrid solver then switches to one of several coarse-grid (bottom) solvers: relaxation, BiCGStab, CABiCGStab, Telescoping CABiCGStab, CG, or CACG. The code exploits two forms of communication avoidance. The first, exemplified by the referenced SC'12 paper, avoids DRAM data movement when smoothing. The second, described in the IPDPS'14 paper, avoids collectives in the bottom solve (the CA bottom solves). There are a number of implementations of the operators for both CPUs and NVIDIA GPUs. The CPU versions have been optimized to exploit a wavefront approach to communication avoidance. The GPU versions include a number of optimized and tunable implementations. The code has been demonstrated to scale to 46K nodes on the BG/Q Mira (750K cores), 9K nodes on Edison (111K cores), and 14K GPUs on Titan (37M CUDA cores). It thus acts as a scalable testbed for a wide range of computer science and applied math research. Nominally, one invokes the benchmark as ./run b x y z alpha beta gamma. Each subdomain is (2^b)^3 cells, and each MPI process receives an x-by-y-by-z collection of subdomains. There are alpha*beta*gamma total MPI processes. For correctness (a cubical domain), x*alpha == y*beta == z*gamma.},
doi = {},
url = {https://www.osti.gov/biblio/1231788},
year = {2013},
month = {12},
}
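The invocation described in the abstractNote above can be sketched as follows. This is a minimal illustration with hypothetical parameter values (b=6, 2x2x2 subdomains per rank, 4x4x4 ranks); it only checks the cubical-domain constraint x*alpha == y*beta == z*gamma and does not invoke the benchmark binary itself.

```shell
#!/usr/bin/env bash
# Hypothetical parameters for ./run b x y z alpha beta gamma (illustration only).
b=6; x=2; y=2; z=2        # each subdomain is (2^b)^3 = 64^3 cells; each rank owns a 2x2x2 block of subdomains
alpha=4; beta=4; gamma=4  # alpha*beta*gamma = 64 total MPI processes

# Correctness requires a cubical global domain: x*alpha == y*beta == z*gamma.
if [ $((x*alpha)) -eq $((y*beta)) ] && [ $((y*beta)) -eq $((z*gamma)) ]; then
  echo "domain is cubical: $((x*alpha*2**b))^3 cells total"
  # mpirun -n $((alpha*beta*gamma)) ./run $b $x $y $z $alpha $beta $gamma
else
  echo "error: x*alpha, y*beta, and z*gamma must all be equal" >&2
fi
```

With these values the global domain is (2*4*2^6)^3 = 512^3 cells across 64 ranks.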