Title: Evaluating the potential of multithreaded platforms for irregular scientific computations

Abstract

We have conducted a detailed study to understand the potential of multithreaded architectures to increase the performance of data-intensive, irregular scientific applications. In addition to microbenchmarks, our study included a power system state estimation application and an anomaly detection application applied to network traffic data. The evaluation was performed on the Cray MTA-2 and the Sun Niagara.

Authors:
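Nieplocha, Jarek; Marquez, Andres; Feo, John; Chavarría-Miranda, Daniel; Chin, George; Scherrer, Chad; Beagley, Nathaniel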
Publication Date:
2007-05-07
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
990149
Report Number(s):
PNNL-SA-53303
DOE Contract Number:
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of 4th ACM International Conference on Computing Frontiers, May 7-9, 2007, Ischia, Italy, 47 - 58
Country of Publication:
United States
Language:
English
Subject:
multithreading, data intensive applications

Citation Formats

Nieplocha, Jarek, Marquez, Andres, Feo, John, Chavarría-Miranda, Daniel, Chin, George, Scherrer, Chad, and Beagley, Nathaniel. Evaluating the potential of multithreaded platforms for irregular scientific computations. United States: N. p., 2007. Web. doi:10.1145/1242531.1242541.
Nieplocha, Jarek, Marquez, Andres, Feo, John, Chavarría-Miranda, Daniel, Chin, George, Scherrer, Chad, & Beagley, Nathaniel. Evaluating the potential of multithreaded platforms for irregular scientific computations. United States. doi:10.1145/1242531.1242541.
Nieplocha, Jarek, Marquez, Andres, Feo, John, Chavarría-Miranda, Daniel, Chin, George, Scherrer, Chad, and Beagley, Nathaniel. 2007. "Evaluating the potential of multithreaded platforms for irregular scientific computations". United States. doi:10.1145/1242531.1242541.
@article{osti_990149,
title = {Evaluating the potential of multithreaded platforms for irregular scientific computations},
author = {Nieplocha, Jarek and Marquez, Andres and Feo, John and Chavarría-Miranda, Daniel and Chin, George and Scherrer, Chad and Beagley, Nathaniel},
abstractNote = {We have conducted a detailed study to understand the potential of multithreaded architectures to increase the performance of data-intensive, irregular scientific applications. In addition to microbenchmarks, our study included a power system state estimation application and an anomaly detection application applied to network traffic data. The evaluation was performed on the Cray MTA-2 and the Sun Niagara.},
doi = {10.1145/1242531.1242541},
place = {United States},
year = {2007},
month = {5}
}

  • We have conducted a detailed study to understand the potential of multithreaded architectures to increase the performance of data-intensive, irregular scientific applications. Our study is based on a power system state estimation application and an anomaly detection application applied to network traffic data. We also conducted a detailed evaluation of the platforms using microbenchmarks in order to gain insight into their architectural capabilities and their interaction with programming models and application software. The evaluation was performed on the Cray MTA-2 and the Sun Niagara. Our results show that irregular applications perform well on these two diverse multithreaded platforms in cases where the bandwidth provided by the underlying platform can be exploited successfully.
  • Languages are being designed that simplify the tasks of creating, extending, and maintaining scientific applications specifically for use on parallel computing architectures. Widespread adoption of any language by the high performance computing (HPC) community is strongly dependent upon achieved performance of applications. A common presumption is that performance is adversely affected as the level of abstraction increases. In this paper we report on our investigations into the potential of one such language, Chapel, to deliver performance while adhering to its code development and maintenance goals. In particular, we explore how the unconstrained memory model presented by Chapel may be exploited by the compiler and runtime system in order to efficiently execute computations common to numerous scientific application programs. Experiments, executed on Cray X1E, AMD dual-core, and Intel quad-core processor-based systems, reveal that with the appropriate architecture and runtime support, the Chapel model can achieve performance equal to the best Fortran/MPI, Co-Array Fortran, and OpenMP implementations, while substantially easing the burden on the application code developer.
  • Floating-point addition and multiplication are not necessarily associative. When performing those operations over large numbers of operands with different magnitudes, the order in which individual operations are performed can affect the final result. On massively multithreaded systems, when performing parallel reductions, the non-deterministic nature of numerical operation interleaving can lead to non-deterministic numerical results. We have investigated the effect of this problem on the convergence of a conjugate gradient calculation used as part of a power grid analysis application.
  • Commonly represented as directed graphs, social networks depict relationships and behaviors among social entities such as people, groups, and organizations. Social network analysis denotes a class of mathematical and statistical methods designed to study and measure social networks. Beyond sociology, social network analysis methods are being applied to other types of data in other domains such as bioinformatics, computer networks, national security, and economics. For particular problems, the size of a social network can grow to millions of nodes and tens of millions of edges or more. In such cases, researchers could benefit from the application of social network analysis algorithms on high-performance architectures and systems. The Cray XMT is a third-generation multithreaded system based on the Cray XT-3/4 platform. Like most other multithreaded architectures, the Cray XMT is designed to tolerate memory access latencies by switching context between threads. The processors maintain multiple threads of execution and utilize hardware-based context switching to overlap the memory latency incurred by any thread with the computations from other threads. Due to its memory latency tolerance, the Cray XMT has the potential of significantly improving the execution speed of irregular data-intensive applications such as those found in social network analysis. In this paper, we describe our experiences in developing and optimizing three implementations of a social network analysis method known as triadic analysis to execute on the Cray XMT. The three implementations possess different execution complexities, qualities, and characteristics. We evaluate how the various attributes of the codes affect their performance on the Cray XMT. We also explore the effects of different compiler options and execution strategies on the different triadic analysis implementations and identify general XMT programming issues and lessons learned.
  • We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
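
One of the abstracts in this list notes that floating-point addition is not associative, so the order in which a parallel reduction combines its operands can change the result. The following is a minimal Python sketch of that effect only; it is not taken from any of the listed papers. The helper name `left_to_right_sum` and the operand values are illustrative assumptions, and the two orderings merely stand in for two possible thread interleavings.

```python
import math


def left_to_right_sum(values):
    """Accumulate in list order, as one possible reduction order would."""
    total = 0.0
    for v in values:
        total += v
    return total


# Operands with very different magnitudes (illustrative values only).
order_a = [1e16, 1.0, -1e16]   # the 1.0 is absorbed by 1e16 before cancellation
order_b = [1e16, -1e16, 1.0]   # the large terms cancel first, so 1.0 survives

sum_a = left_to_right_sum(order_a)
sum_b = left_to_right_sum(order_b)

# math.fsum tracks exact partial sums, so its result does not depend on order.
exact_a = math.fsum(order_a)
exact_b = math.fsum(order_b)

print(sum_a, sum_b)      # the two naive sums differ
print(exact_a, exact_b)  # the compensated sums agree
```

The same operands give different naive sums depending on order, while an order-independent summation such as `math.fsum` agrees across orderings, which is why non-deterministic interleaving can surface as non-deterministic numerical results.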