How to program 122,400 heterogeneous cores and retain your sanity
- Los Alamos National Laboratory
Current technology trends favor hybrid architectures, typically with each node in a cluster containing both general-purpose and specialized 'accelerator' processors. The typical model for programming such systems is host-centric: the general-purpose processor orchestrates the computation, offloading performance-critical work to the accelerator, and data is communicated only among general-purpose processors. In this talk we propose a radically different hybrid-programming approach, which we call the 'reverse-acceleration model'. In this model the accelerators orchestrate the computation, offloading work that cannot be accelerated to the general-purpose processors. Data is communicated among accelerators, not among general-purpose processors. We present the Cell Messaging Layer (CML), an implementation of the reverse-acceleration model for Los Alamos National Laboratory's Roadrunner supercomputer. Roadrunner is a complex conglomerate of 122,400 processor cores of various types, multiple memory domains, and multiple network types, all with radically different performance characteristics, which together make it the world's second-fastest supercomputer. CML demonstrates a new messaging-layer implementation technique called 'receiver-initiated message passing', which reduces communication latency by up to a third. Our thesis is that the reverse-acceleration model simplifies porting codes to heterogeneous systems and facilitates performance optimization. We present a case study of a legacy neutron-transport code that we modified to use reverse acceleration. Performance results from running this code across the full Roadrunner system indicate a substantial performance improvement over the unaccelerated version of the code.
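The abstract does not describe CML's interface, so the following is only a minimal Python sketch, under stated assumptions, of the inverted control flow the reverse-acceleration model describes. The accelerator side owns the main loop and performs the compute kernel itself, and it sends requests to a service loop on the general-purpose processor only for work it cannot do (here, a stand-in for I/O). The names `accelerator_main` and `host_service`, the `"write"` request type, and the use of Python threads and queues as stand-ins for processors and interconnect are all hypothetical; none of this is CML's actual API.

```python
import queue
import threading

# Hypothetical sketch of reverse acceleration: the "accelerator" drives the
# computation and offloads only unacceleratable work to a "host" thread.
# Threads and queues are stand-ins; this is not CML's real interface.

def host_service(requests: queue.Queue, log: list) -> None:
    """General-purpose processor: serve offload requests until told to stop."""
    while True:
        req = requests.get()
        if req is None:              # sentinel: the accelerator has finished
            break
        kind, payload, reply = req
        if kind == "write":          # stand-in for work such as file I/O
            log.append(payload)
            reply.put("ok")
        else:
            reply.put(f"unsupported: {kind}")

def accelerator_main(steps: int, requests: queue.Queue) -> float:
    """Accelerator: owns the main loop and does the numeric work itself."""
    total = 0.0
    for step in range(steps):
        total += sum(x * x for x in range(step + 1))  # the "accelerated" kernel
        reply: queue.Queue = queue.Queue()
        requests.put(("write", f"step {step}: {total}", reply))  # offload to host
        reply.get()                  # block until the host acknowledges
    requests.put(None)               # release the host's service loop
    return total

requests: queue.Queue = queue.Queue()
log: list = []
host = threading.Thread(target=host_service, args=(requests, log))
host.start()
result = accelerator_main(3, requests)
host.join()
print(result)  # 6.0
print(log)     # ['step 0: 0.0', 'step 1: 1.0', 'step 2: 6.0']
```

In this sketch the host never initiates work; it only answers requests, which is the reversal of the usual host-centric offload pattern. It does not model accelerator-to-accelerator communication or receiver-initiated message passing, both of which would depend on Roadrunner's actual interconnect mechanisms.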
- Research Organization: Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC52-06NA25396
- OSTI ID: 1011083
- Report Number(s): LA-UR-10-02903; LA-UR-10-2903; TRN: US1102110
- Country of Publication: United States
- Language: English
Similar Records
Roadrunner Supercomputer Breaks the Petaflop Barrier
Multi-node and Multi-core Performance Studies of a Monte Carlo code RMC