Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped onto a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. To model the performance of such large-scale machines, parallel simulation has proven to be a promising approach for achieving good accuracy in reasonable time. One of the most critical factors in resolving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated into a full-system XMT simulator. We start by measuring the effects of network contention on a 128-processor XMT machine and then investigate the trade-off between simulation accuracy and speed by comparing three network models that operate at different levels of accuracy. The comparison and model validation are performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.
- Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 1023734
- Report Number(s): PNNL-SA-76834; 400470000; TRN: US201120%%1070
- Resource Relation: Conference: 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2011), May 23-26, 2011, Newport Beach, California, 275-284
- Country of Publication: United States
- Language: English
Similar Records
Implementing and Evaluating Multithreaded Triad Census Algorithms on the Cray XMT
LDRD final report : massive multithreading applied to national infrastructure and informatics.