OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: THE QUADRICS NETWORK (QsNeT): HIGH-PERFORMANCE CLUSTERING TECHNOLOGY

Abstract

No abstract prepared.

Authors:
Petrini, F.; Feng, W.; et al.
Publication Date:
1 Jul 2001
Research Org.:
Los Alamos National Lab., NM (US)
Sponsoring Org.:
US Department of Energy (US)
OSTI Identifier:
783428
Report Number(s):
LA-UR-01-4100
TRN: AH200137%%7
DOE Contract Number:
W-7405-ENG-36
Resource Type:
Conference
Resource Relation:
Conference: Conference title not supplied, Conference location not supplied, Conference dates not supplied; Other Information: PBD: 1 Jul 2001
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; Q CODES; COMPUTER NETWORKS; CLUSTER EXPANSION; PERFORMANCE

Citation Formats

F. PETRINI, W. FENG, and ET AL. THE QUADRICS NETWORK (QsNeT): HIGH-PERFORMANCE CLUSTERING TECHNOLOGY. United States: N. p., 2001. Web.
F. PETRINI, W. FENG, & ET AL. THE QUADRICS NETWORK (QsNeT): HIGH-PERFORMANCE CLUSTERING TECHNOLOGY. United States.
F. PETRINI, W. FENG, and ET AL. 2001. "THE QUADRICS NETWORK (QsNeT): HIGH-PERFORMANCE CLUSTERING TECHNOLOGY". United States. https://www.osti.gov/servlets/purl/783428.
@article{osti_783428,
title = {THE QUADRICS NETWORK (QsNeT): HIGH-PERFORMANCE CLUSTERING TECHNOLOGY},
author = {F. PETRINI and W. FENG and ET AL},
abstractNote = {No abstract prepared.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2001},
month = {7}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

  • This paper describes and evaluates protocols for optimizing strided non-contiguous communication on the Quadrics QsNetII high-performance network interconnect. Most previous related studies focused primarily on NIC-based or host-based protocols. This paper discusses the merits of using both approaches and tries to determine for which types and data sizes in the communication operations these protocols should be used. We focus on the Quadrics QsNetII network, which offers powerful communication processors on the network interface card (NIC) and practical and flexible opportunities for exploiting them in context of user. Furthermore, the paper focuses on non-contiguous remote memory access (RMA) data transfers and performs the evaluation in the context of standalone communication and application microbenchmarks. In comparison to the vendor-provided non-contiguous interfaces, the proposed approach achieved a very significant performance improvement in microbenchmarks as well as in application kernels: dense matrix multiplication and the Co-Array Fortran version of the NAS BT parallel benchmark. For example, for 64 processes, a 54% improvement in overall communication time for NAS BT Class B and a 42% improvement in matrix multiplication were achieved.
  • The efficient implementation of collective communication patterns in a parallel machine is a challenging design effort that requires the solution of many problems. In this paper we present an in-depth description of how the Quadrics network supports both hardware- and software-based collectives. We describe the main features of the two building blocks of this network, a network interface that can perform zero-copy user-level communication and a wormhole switch. We also focus our attention on the routing and flow control algorithms, deadlock avoidance, and on how the processing nodes are integrated in a global, virtual shared memory. Experimental results conducted on a 64-node AlphaServer cluster indicate that the time to complete the hardware-based barrier synchronization on the whole network is as low as 6 µs, with very good scalability. Good latency and scalability are also achieved with the software-based synchronization, which takes about 15 µs. With the broadcast, similar performance is achieved by the hardware- and software-based implementations, which can deliver messages of up to 256 bytes in 13 µs and can get a sustained bandwidth of 288 Mbytes/sec on all the nodes, with messages larger than 64 KB. The hardware-based barrier is almost insensitive to network congestion, with 93% of the synchronizations taking less than 20 µs. On the other hand, the software-based implementation suffers from a significant performance degradation. In high-load environments the hardware broadcast maintains reasonably good performance, delivering messages up to 2 KB in 200 µs, while the software broadcast suffers from slightly higher latencies inherited from the synchronization mechanism.
  • In prior work (Yoginath and Perumalla, 2011; Yoginath, Perumalla and Henz, 2012), the motivation, challenges and issues were articulated in favor of virtual time ordering of Virtual Machines (VMs) in network simulations hosted on multi-core machines. Two major components in the overall virtualization challenge are (1) virtual timeline establishment and scheduling of VMs, and (2) virtualization of inter-VM communication. Here, we extend prior work by presenting scaling results for the first component, with experiment results on up to 128 VMs scheduled in virtual time order on a single 12-core host. We also explore the solution space of design alternatives for the second component, and present performance results from a multi-threaded, multi-queue implementation of inter-VM network control for synchronized execution with VM scheduling, incorporated in our NetWarp simulation system.
  • The paper describes a concept for implementing a combined data collecting, control and trigger system for Large Hadron Collider experiments at CERN. The system, called SWIPP, is based on a flexible interconnection of fast crossbar switches, interfaced to various transducer, storage and computer nodes by Protocol Engines. This infrastructure offers a common framework for various kinds of information, being programmable and scalable in an easy way. It endeavors to solve the extremely demanding instrumentation needs of these experiments in particle physics.