Accelerating list management for MPI.
The latency and throughput of MPI messages are critically important to a range of parallel scientific applications. In many modern networks, both of these performance characteristics are largely driven by the performance of a processor on the network interface. Because of the semantics of MPI, this embedded processor is forced to traverse a linked list of posted receives each time a message is received. As this list grows long, the latency of message reception grows and the throughput of MPI messages decreases. This paper presents a novel hardware feature to handle list management functions on a network interface. By moving functions such as list insertion, list traversal, and list deletion to the hardware unit, latencies are decreased by up to 20% in the zero-length queue case, with dramatic improvements in the presence of long queues. Similarly, the throughput is increased by up to 10% in the zero-length queue case and by nearly 100% in the presence of queues of 30 messages.
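The traversal cost described in the abstract can be illustrated with a minimal software sketch of a posted receive queue. This is not the hardware unit proposed in the paper or any MPI library's actual code; the names `PostedReceive`, `match_incoming`, and `build_queue`, the wildcard value -1, and the use of a Python list in place of a linked list are all assumptions made for illustration. It shows only that an incoming message must be compared against posted receives in order, so the number of entries examined grows with queue depth.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG.
# The real constants are implementation-defined; -1 is an assumption.
ANY_SOURCE = -1
ANY_TAG = -1


@dataclass
class PostedReceive:
    """One entry in a hypothetical posted receive queue."""
    source: int
    tag: int
    comm: int
    buffer_id: int


def matches(recv: PostedReceive, source: int, tag: int, comm: int) -> bool:
    """A receive matches if the communicator is equal and the source and
    tag are either equal or wildcarded on the receive side."""
    return (recv.comm == comm
            and recv.source in (ANY_SOURCE, source)
            and recv.tag in (ANY_TAG, tag))


def match_incoming(queue: List[PostedReceive], source: int, tag: int,
                   comm: int) -> Tuple[Optional[PostedReceive], int]:
    """Walk the queue in posting order and remove the first match.

    Returns the matched entry (or None) and the number of entries examined,
    which stands in for the traversal cost.
    """
    for i, recv in enumerate(queue):
        if matches(recv, source, tag, comm):
            return queue.pop(i), i + 1
    return None, len(queue)


def build_queue(depth: int) -> List[PostedReceive]:
    """Build `depth` non-matching receives followed by one that will match
    a message with tag 99 from any source."""
    queue = [PostedReceive(source=1, tag=t, comm=0, buffer_id=t)
             for t in range(depth)]
    queue.append(PostedReceive(source=ANY_SOURCE, tag=99, comm=0,
                               buffer_id=depth))
    return queue


if __name__ == "__main__":
    for depth in (0, 30):
        _, examined = match_incoming(build_queue(depth), source=0, tag=99, comm=0)
        print(f"queue depth {depth}: examined {examined} entries")
```

In this sketch, a message that matches the only posted receive is found after one comparison, while the same message behind 30 non-matching receives requires 31 comparisons. The list insertion, traversal, and deletion steps here correspond to the functions the abstract describes moving into hardware.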
- Research Organization:
- Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC04-94AL85000
- OSTI ID:
- 967795
- Report Number(s):
- SAND2005-4564C; TRN: US200924%%29
- Resource Relation:
- Conference: Proposed for presentation at Cluster 2005, held September 27-30, 2005, in Boston, MA.
- Country of Publication:
- United States
- Language:
- English
Similar Records
A dynamic, unified design for dedicated message matching engines for collective and point-to-point communications
Using Simulation to Examine the Effect of MPI Message Matching Costs on Application Performance