Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors
- ORNL
An effective latency-hiding mechanism is presented for the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as the heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation-versus-communication trade-off continuum available in the deep computational and memory hierarchies of extant platforms, and we present a novel analytical model of the trade-off. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple networked graphical processing units (GPUs), and pthreads on multi-core processors. The Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering up to a 100-fold runtime improvement for certain benchmark ABMS application scenarios with several million agents. This speedup is obtained on a system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular Java simulator. Overall, our current work therefore executes over four orders of magnitude faster than that baseline when run on multiple GPUs.
- Research Organization:
- Oak Ridge National Laboratory (ORNL); Center for Computational Sciences
- Sponsoring Organization:
- ORNL LDRD Director's R&D
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 974630
- Country of Publication:
- United States
- Language:
- English