OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors

Abstract

An effective latency-hiding mechanism is presented for parallelizing agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate both the hierarchical organization and the heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum offered by the deep computational and memory hierarchies of existing platforms, and we present a novel analytical model of this trade-off. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple networked graphics processing units (GPUs), and pthreads on multi-core processors. The Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, which delivers an improvement in runtime of over 100-fold for certain benchmark ABMS application scenarios with several million agents. This improvement is obtained on a system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular Java-based simulator. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.
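
The closing claim of the abstract follows from multiplying the two quoted speedups. The short sketch below works through that arithmetic only; it assumes the two factors compose multiplicatively, as the abstract implies, and the function and variable names are illustrative rather than taken from the paper. The inputs are the round figures quoted in the abstract, not measurements.

```python
import math


def combined_orders_of_magnitude(base_speedup: float, extra_speedup: float) -> float:
    """Return log10 of the product of two multiplicative speedups."""
    return math.log10(base_speedup * extra_speedup)


# Round figures taken from the abstract (assumed lower bounds, not measured here).
single_gpu_low = 1e2    # one GPU vs. CPU-based Java simulator: "two ... orders of magnitude"
single_gpu_high = 1e3   # one GPU vs. CPU-based Java simulator: "... three orders of magnitude"
latency_hiding = 1e2    # latency-hiding scheme on multiple GPUs: "over 100-fold"

low = combined_orders_of_magnitude(single_gpu_low, latency_hiding)
high = combined_orders_of_magnitude(single_gpu_high, latency_hiding)

print(f"combined speedup: between 10^{low:.0f} and 10^{high:.0f}")
```

Under these assumptions the combined speedup is at least four orders of magnitude, which is consistent with the abstract's final sentence.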

Authors:
Aaby, Brandon G [1]; Perumalla, Kalyan S [1]; Seal, Sudip K [1]
  1. ORNL
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Center for Computational Sciences
Sponsoring Org.:
USDOE Laboratory Directed Research and Development (LDRD) Program
OSTI Identifier:
974630
DOE Contract Number:  
DE-AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 3rd International ICST Conference on Simulation Tools and Techniques, Torremolinos, Malaga, Spain, March 15-19, 2010
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICAL METHODS AND COMPUTING; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; PARALLEL PROCESSING; COMPUTER CALCULATIONS; COMPUTERIZED SIMULATION; COMPUTER ARCHITECTURE; PERFORMANCE; SIMULATORS; DATA TRANSMISSION; MEMORY MANAGEMENT

Citation Formats

Aaby, Brandon G, Perumalla, Kalyan S, and Seal, Sudip K. Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors. United States: N. p., 2010. Web.
Aaby, Brandon G, Perumalla, Kalyan S, & Seal, Sudip K. Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors. United States.
Aaby, Brandon G, Perumalla, Kalyan S, and Seal, Sudip K. Fri . "Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors". United States.
@article{osti_974630,
title = {Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors},
author = {Aaby, Brandon G and Perumalla, Kalyan S and Seal, Sudip K},
abstractNote = {An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.},
place = {United States},
year = {2010},
month = {1}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
