Final Report for Project DE-FC02-06ER25755 [Pmodels2]

Panda, Dhabaleswar; Sadayappan, P.

doi:10.2172/1122948

Technical Report · March 12, 2014

DOI: https://doi.org/10.2172/1122948 · OSTI ID: 1122948

Panda, Dhabaleswar ^[1]; Sadayappan, P. ^[1]

The Ohio State Univ., Columbus, OH (United States)

In this report, we describe the research accomplished by the OSU team under the Pmodels2 project. The team has worked along several lines: designing high-performance MPI implementations on modern networking technologies (Mellanox InfiniBand (including the new ConnectX2 architecture and Quad Data Rate), QLogic InfiniPath, the emerging 10GigE/iWARP and RDMA over Converged Enhanced Ethernet (RoCE), and Obsidian IB-WAN), studying MPI scalability issues for multi-thousand-node clusters using XRC transport, scalable job start-up, dynamic process management support, efficient one-sided communication, protocol offloading, and designing scalable collective communication libraries for emerging multi-core architectures. New designs conforming to Argonne’s Nemesis interface have also been carried out. All of these solutions have been integrated into the open-source MVAPICH/MVAPICH2 software. This software is currently being used by more than 2,100 organizations worldwide (in 71 countries). As of January ’14, more than 200,000 downloads have taken place from the OSU Web site. In addition, many InfiniBand vendors, server vendors, system integrators and Linux distributors have been incorporating MVAPICH/MVAPICH2 into their software stacks and distributing it. Several InfiniBand systems using MVAPICH/MVAPICH2 have obtained positions in the TOP500 ranking of supercomputers in the world. The latest November ’13 ranking includes the following systems: the 7th-ranked Stampede system at TACC with 462,462 cores; the 11th-ranked Tsubame 2.5 system at Tokyo Institute of Technology with 74,358 cores; and the 16th-ranked Pleiades system at NASA with 81,920 cores.

Work on PGAS models has proceeded in multiple directions. The Scioto framework, which supports task-parallelism in one-sided and global-view parallel programming, has been extended to allow multi-processor tasks that are executed by processor groups. A quantum Monte Carlo application is being ported onto the extended Scioto framework. A public release of Global Trees (GT) has been made, along with the Global Chunks (GC) framework on which GT is built. The GC layer is also being used as the basis for the development of a higher-level Global Graphs (GG) layer. The GG system will provide a global address space view of distributed graph data structures on distributed memory systems.

Research Organization: The Ohio State Univ., Columbus, OH (United States)

Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)

DOE Contract Number: FC02-06ER25755

OSTI ID: 1122948

Report Number(s): DOE-OSU-25755-Final

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
