Skip to main content
U.S. Department of Energy
Office of Scientific and Technical Information

Dynamic load balancing of matrix-vector multiplications on roadrunner compute nodes

Conference ·
OSTI ID:956401

Hybrid architectures that combine general purpose processors with accelerators are being adopted in several large-scale systems such as the petaflop Roadrunner supercomputer at Los Alamos. In this system, dual-core Opteron host processors are tightly coupled with PowerXCell 8i processors within each compute node. In this kind of hybrid architecture, an accelerated mode of operation is typically used to offload performance hotspots in the computation to the accelerators. In this paper we explore the suitability of a variant of this acceleration mode in which the performance hotspots are actually shared between the host and the accelerators. To achieve this we have designed a new load balancing algorithm, which is optimized for the Roadrunner compute nodes, to dynamically distribute computation and associated data between the host and the accelerators at runtime. Results are presented using this approach for sparse and dense matrix-vector multiplications that show load-balancing can improve performance by up to 24% over solely using the accelerators.

Research Organization:
Los Alamos National Laboratory (LANL)
Sponsoring Organization:
DOE
DOE Contract Number:
AC52-06NA25396
OSTI ID:
956401
Report Number(s):
LA-UR-09-00812; LA-UR-09-812
Country of Publication:
United States
Language:
English