Title: QMP-MVIA: a message passing system for Linux clusters with gigabit Ethernet mesh connections

Conference

Recent progress in performance, coupled with a decline in price, makes copper-based gigabit Ethernet (GigE) interconnects an attractive alternative to expensive high-speed network interconnects when constructing Linux clusters. However, traditional message passing systems based on TCP cannot fully utilize the raw performance of today's GigE interconnects because of the overhead of kernel involvement and multiple memory copies when sending and receiving messages. The overhead is more evident in mesh-connected Linux clusters that use multiple GigE interconnects in a single host. We present a general message passing system called QMP-MVIA (QCD Message Passing over M-VIA) for Linux clusters with mesh connections using GigE interconnects. In particular, we evaluate and compare the performance characteristics of TCP and M-VIA (an implementation of the VIA specification) software for a mesh communication architecture to demonstrate the feasibility of using M-VIA as the point-to-point communication software on which QMP-MVIA is based. Furthermore, we illustrate the design and implementation of QMP-MVIA for mesh-connected Linux clusters, with emphasis on both point-to-point and collective communications, and demonstrate that the QMP-MVIA message passing system using GigE interconnects achieves bandwidth and latency that are not only better than those of systems based on TCP but also compare favorably, at much lower cost, with those of systems using some of the specialized high-speed interconnects in a switched architecture.
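To make the idea of a mesh communication architecture concrete, the following minimal sketch shows how a host in a multi-dimensional mesh might work out which hosts it links to directly. It is an illustration only and not part of QMP-MVIA: the function names, the row-major rank numbering, the wraparound (torus) links, and the 4 x 4 x 4 example size are all assumptions, none of which is stated in the abstract.

```python
# Hypothetical illustration: neighbor lookup in an N-dimensional mesh.
# Assumptions (not from the abstract): row-major rank numbering and
# wraparound links at the mesh edges (a torus).

def rank_to_coords(rank, dims):
    """Convert a linear rank to mesh coordinates (last axis varies fastest)."""
    coords = []
    for size in reversed(dims):
        coords.append(rank % size)
        rank //= size
    return tuple(reversed(coords))


def coords_to_rank(coords, dims):
    """Convert mesh coordinates back to a linear rank."""
    rank = 0
    for c, size in zip(coords, dims):
        rank = rank * size + c
    return rank


def mesh_neighbors(rank, dims):
    """Return {(axis, direction): neighbor_rank} for one host.

    Each axis contributes two neighbors (direction -1 and +1), so a host
    in a 3-D mesh would need six point-to-point links.
    """
    coords = rank_to_coords(rank, dims)
    neighbors = {}
    for axis, size in enumerate(dims):
        for direction in (-1, 1):
            shifted = list(coords)
            shifted[axis] = (coords[axis] + direction) % size  # wrap at edges
            neighbors[(axis, direction)] = coords_to_rank(shifted, dims)
    return neighbors


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical 64-host mesh
    for link, peer in sorted(mesh_neighbors(0, dims).items()):
        print(link, "->", peer)
```

Under these assumptions every host has two neighbors per dimension, which is one way to see why a mesh-connected host carries multiple GigE interconnects rather than a single link to a switch.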

Research Organization:
Thomas Jefferson National Accelerator Facility (TJNAF), Newport News, VA (United States)
Sponsoring Organization:
USDOE Office of Energy Research (ER) (US)
DOE Contract Number:
AC05-84ER40150
OSTI ID:
840532
Report Number(s):
JLAB-CIO-04-02; DOE/ER/40150-3424; TRN: US200511%%109
Resource Relation:
Conference: 2004 IEEE International Conference on Cluster Computing, San Diego, CA (US), 09/20/2004--09/23/2004; Other Information: PBD: 1 Sep 2004
Country of Publication:
United States
Language:
English