OSTI.GOV — U.S. Department of Energy
Office of Scientific and Technical Information

Title: Subgraph Isomorphism on a Multithreaded Shared Memory Architecture.


Abstract not provided.
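No abstract accompanies the record, but the title names the core problem: deciding whether (and how) a small pattern graph embeds in a larger target graph. As background only, not the authors' method, here is a minimal serial backtracking search for non-induced subgraph isomorphisms in Python; the adjacency-set representation and example graphs are illustrative:

```python
def subgraph_isomorphisms(pattern, target):
    """Yield every injective mapping of pattern vertices onto target
    vertices that preserves the pattern's edges (non-induced matching).

    pattern, target: dict mapping each vertex to its set of neighbors.
    """
    p_nodes = list(pattern)

    def extend(mapping, used):
        if len(mapping) == len(p_nodes):
            yield dict(mapping)
            return
        v = p_nodes[len(mapping)]            # next pattern vertex to place
        for w in target:
            if w in used:
                continue                     # keep the mapping injective
            # Every already-mapped neighbor of v must land on a neighbor of w.
            if all(mapping[u] in target[w] for u in pattern[v] if u in mapping):
                mapping[v] = w
                used.add(w)
                yield from extend(mapping, used)
                del mapping[v]               # backtrack
                used.remove(w)

    yield from extend({}, set())

# Count triangle embeddings in a 4-vertex graph with two triangles.
tri = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
g   = {0: {1, 3}, 1: {0, 2, 3}, 2: {1, 3}, 3: {0, 1, 2}}
print(len(list(subgraph_isomorphisms(tri, g))))   # → 12 (2 triangles × 3! orderings)
```

A multithreaded shared-memory machine of the kind the title refers to would explore independent branches of this backtracking tree in parallel, which is exactly the fine-grained, irregular parallelism such architectures target.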

Publication Date: 2011
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
Resource Type: Conference
Resource Relation:
Conference: Proposed for presentation at the Workshop on Multithreaded Architectures and Applications, held May 20, 2011, in Anchorage, AK.
Country of Publication:
United States

Citation Formats

Leung, Vitus Joseph, McLendon, William Clarence, and Ralph, Claire. Subgraph Isomorphism on a Multithreaded Shared Memory Architecture. United States: N. p., 2011. Web.
Leung, Vitus Joseph, McLendon, William Clarence, & Ralph, Claire. Subgraph Isomorphism on a Multithreaded Shared Memory Architecture. United States. 2011.
Leung, Vitus Joseph, McLendon, William Clarence, and Ralph, Claire. 2011. "Subgraph Isomorphism on a Multithreaded Shared Memory Architecture". United States.
@misc{osti_2011_subgraph,
  title = {Subgraph Isomorphism on a Multithreaded Shared Memory Architecture},
  author = {Leung, Vitus Joseph and McLendon, William Clarence and Ralph, Claire},
  abstractNote = {Abstract not provided.},
  place = {United States},
  year = {2011},
  month = {1}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Similar Records:
  • Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped onto a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.
  • We present a parallel implementation of the popular k-means clustering algorithm for massively multithreaded computer systems, as well as a parallelized version of the KKZ seed selection algorithm. We demonstrate that as system size increases, sequential seed selection can become a bottleneck. We also present an early attempt at parallelizing k-means that highlights critical performance issues when programming massively multithreaded systems. For our case studies, we used data collected from electric power simulations and run on the Cray XMT.
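The parallelization strategy this abstract describes — distribute the point-to-centroid assignment step across workers, then reduce partial sums to recompute centroids — can be sketched as follows. This is a thread-based illustration only; the function names, strided chunking, and worker count are assumptions, not the authors' Cray XMT implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def assign_chunk(points, centroids):
    """Nearest-centroid labels for one chunk of points (the parallel map step)."""
    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
    return [nearest(p) for p in points]

def parallel_kmeans(points, centroids, iters=10, workers=4):
    for _ in range(iters):
        # Map: each worker labels a strided chunk of the points.
        chunks = [points[w::workers] for w in range(workers)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            labels = list(pool.map(assign_chunk, chunks, [centroids] * workers))
        # Reduce: accumulate per-cluster sums/counts, then recompute centroids.
        dim = len(centroids[0])
        sums = [[0.0] * dim for _ in centroids]
        counts = [0] * len(centroids)
        for chunk, chunk_labels in zip(chunks, labels):
            for p, j in zip(chunk, chunk_labels):
                counts[j] += 1
                for d in range(dim):
                    sums[j][d] += p[d]
        centroids = [[s / counts[j] for s in sums[j]] if counts[j] else centroids[j]
                     for j in range(len(centroids))]
    return centroids

pts = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
print(parallel_kmeans(pts, [[0.0, 0.0], [5.0, 5.0]], iters=3, workers=2))
```

The abstract's observation about sequential seed selection applies here too: the initial `centroids` argument is a serial input, so a KKZ-style seeding pass would itself need to be parallelized to avoid becoming the bottleneck at scale.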
  • In this thesis the author studies interconnection networks and their switch architectures, and the performance of different architectures in a MIMD shared-memory environment, using both simulation and analytical methods. The networks he studies are constructed from a basic building block, a switch element. Quite a few alternatives are proposed for designing the switch element; naturally, different switch architectures give different performance. He studies the performance of the different switches with both analytical methods and extensive simulations, and proposes a multiple-handshaking-signal switch architecture that gives the maximum performance. Various interconnection networks, and issues of interest related to them, are reviewed in the thesis. He proposes a new class of interconnection networks called F networks. In comparison to a traditional multi-stage network, F networks provide faster communication among nodes within a cluster; the extra routes available in an F network also make it fault-tolerant. Based on simulations and analysis, Kruskal, Snir and Weiss established a formula for calculating network delays under moderate traffic. From his simulations the author was surprised to discover that the Kruskal-Snir-Weiss formula holds only for the forward-path delay of the networks; the return-path delay is actually substantially less than the forward-path delay. He completes the network performance formula by extending the Kruskal-Snir-Weiss formula to include the return path. He also analyzes network performance under 'hot spot' traffic and obtains analytic results on the performance attained.
  • Portability, efficiency, and ease of coding are all important considerations in choosing the programming model for a scalable parallel application. The message-passing programming model is widely used because of its portability, yet some applications are too complex to code in it while also trying to maintain a balanced computation load and avoid redundant computations. The shared-memory programming model simplifies coding, but it is not portable and often provides little control over interprocessor data transfer costs. This paper describes a new approach, called Global Arrays (GA), that combines the better features of both other models, leading to both simple coding and efficient execution. The key concept of GA is that it provides a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes. The authors have implemented GA libraries on a variety of computer systems, including the Intel DELTA and Paragon, the IBM SP-1 (all message-passers), the Kendall Square KSR-2 (a nonuniform access shared-memory machine), and networks of Unix workstations. They discuss the design and implementation of these libraries, report their performance, illustrate the use of GA in the context of computational chemistry applications, and describe the use of a GA performance visualization tool.
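The key GA concept above — one-sided access to logical blocks of a physically distributed matrix, with the distribution hidden from the caller — can be sketched in a single-process Python model. The class, tiling scheme, and method names here are hypothetical stand-ins for illustration, not the real Global Arrays API:

```python
# Illustrative sketch of the Global Arrays idea: a logically global 2-D
# array stored as per-owner tiles, with one-sided block reads that never
# name the owning process. Hypothetical names, not the GA C interface.

class GlobalArray2D:
    def __init__(self, rows, cols, tile):
        self.rows, self.cols, self.tile = rows, cols, tile
        # Each tile would live in its owner's memory; a dict keyed by
        # tile coordinates stands in for the distributed storage.
        self.tiles = {
            (bi, bj): [[0.0] * tile for _ in range(tile)]
            for bi in range((rows + tile - 1) // tile)
            for bj in range((cols + tile - 1) // tile)
        }

    def put(self, i, j, value):
        t = self.tiles[(i // self.tile, j // self.tile)]
        t[i % self.tile][j % self.tile] = value

    def get_block(self, r0, r1, c0, c1):
        # One-sided read of global rows r0..r1-1, cols c0..c1-1: the
        # caller addresses the logical array, not the tile owners.
        return [[self.tiles[(i // self.tile, j // self.tile)]
                           [i % self.tile][j % self.tile]
                 for j in range(c0, c1)]
                for i in range(r0, r1)]

ga = GlobalArray2D(4, 4, tile=2)
ga.put(1, 2, 7.0)
print(ga.get_block(0, 2, 1, 3))   # spans two tiles transparently
# → [[0.0, 0.0], [0.0, 7.0]]
```

In the real library the tiles are remote, and a block fetch becomes an asynchronous one-sided transfer that requires no cooperation from the owning process, which is the property the abstract highlights.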