Exploring Shared Memory Protocols in FLASH

Horowitz, Mark; Kunz, Robert; Hall, Mary; Lucas, Robert; Chame, Jacqueline

doi:10.2172/939091

Technical Report · Sun Apr 01 04:00:00 EDT 2007

Horowitz, Mark [1]; Kunz, Robert; Hall, Mary; Lucas, Robert; Chame, Jacqueline

[1] Stanford University

ABSTRACT The goal of this project was to improve the performance of large scientific and engineering applications through collaborative hardware and software mechanisms that manage the memory hierarchy of non-uniform memory access (NUMA) shared-memory machines, as well as that of their individual component processors. Despite the programming advantages of shared-memory platforms, obtaining good performance for large scientific and engineering applications on such machines can be challenging. Because communication between processors is managed implicitly by the hardware, rather than expressed by the programmer, application performance may suffer from unintended communication: communication that the programmer did not consider when developing the application. In this project, we developed and evaluated a collection of hardware, compiler, language, and performance-monitoring tools to obtain high performance for scientific and engineering applications on NUMA platforms by managing communication through alternative coherence mechanisms. Alternative coherence mechanisms have often been discussed as a means of reducing unintended communication, but architectural implementations of such mechanisms are quite rare. This report describes an actual implementation of a set of coherence protocols that support coherent, non-coherent, and write-update accesses for a CC-NUMA shared-memory architecture, the Stanford FLASH machine. This approach applies alternative coherence only where it is beneficial, and it provides an evolutionary migration path for improving application performance. We present data on two computations, RandomAccess from the HPC Challenge benchmarks and a forward solver derived from LS-DYNA, showing the performance advantages of the alternative coherence mechanisms. For RandomAccess, the non-coherent and write-update versions can outperform the coherent version by factors of 5 and 2.5, respectively. In LS-DYNA, we obtain improvements of 18% on average using the non-coherent version. We also present data on the SpecOMP benchmarks, showing that the protocols have a modest overhead of less than 3% in applications where the alternative mechanisms are not needed.

In addition to the selective coherence studies on the FLASH machine, in the last six months of this project ISI performed research on compiler technology for the transactional memory (TM) programming model being developed at Stanford. As part of this research, ISI developed a compiler that recognizes transactional memory “pragmas” and automatically generates parallel code for the TM programming model.
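
To make the notion of unintended communication concrete, the following is a minimal Python sketch of a RandomAccess-style update loop. It is not the code used in the report: the shift-register update with feedback polynomial 0x7 follows the publicly available HPC Challenge reference kernel, while the helper names (`next_random`, `apply_updates`, `count_remote_updates`), the seed, and the block distribution of the table across nodes are assumptions made purely for illustration.

```python
POLY = 0x7                # feedback polynomial from the HPCC reference kernel
MASK64 = (1 << 64) - 1    # keep values within 64 bits


def next_random(ran):
    """Advance the 64-bit shift-register sequence by one step."""
    feedback = POLY if ran >> 63 else 0
    return ((ran << 1) & MASK64) ^ feedback


def apply_updates(table, num_updates, seed=1):
    """XOR pseudo-random values into pseudo-random slots of the table.

    len(table) is assumed to be a power of two so the mask selects a slot.
    """
    size = len(table)
    ran = seed
    for _ in range(num_updates):
        ran = next_random(ran)
        table[ran & (size - 1)] ^= ran
    return table


def count_remote_updates(table_size, num_updates, num_nodes, my_node=0, seed=1):
    """Count updates that land outside this node's block of the table.

    Hypothetical model: the table is split into equal contiguous blocks,
    one per node, and num_nodes is assumed to divide table_size evenly.
    """
    block = table_size // num_nodes
    ran = seed
    remote = 0
    for _ in range(num_updates):
        ran = next_random(ran)
        owner = (ran & (table_size - 1)) // block
        if owner != my_node:
            remote += 1
    return remote
```

Under this assumed distribution, most updates issued by one node target memory owned by another, and the programmer never writes that traffic explicitly. This is the kind of access pattern for which the abstract reports gains from non-coherent and write-update accesses.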

Research Organization: Stanford University, Stanford, CA; University of Southern California, Los Angeles, CA

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: FG02-03ER25564

OSTI ID: 939091

Report Number(s): DOE DE-FG02-ER25564

Country of Publication: United States

Language: English

Similar Records

An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors

Conference · Thu Dec 30 23:00:00 EST 1993 · OSTI ID:46219

Interprocessor invocation on a NUMA multiprocessor. Technical report

Technical Report · Mon Oct 01 00:00:00 EDT 1990 · OSTI ID:5971577

Improving the performance of DSM systems via compiler involvement

Book · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:87677

Related Subjects

42 ENGINEERING
ARCHITECTURE
BENCHMARKS
COMMUNICATIONS
IMPLEMENTATION
MONITORING
PERFORMANCE
PROGRAMMING
cache coherence
shared memory multiprocessor
transactional memory
