DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A programming model performance study using the NAS parallel benchmarks

Abstract

Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an Infiniband cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. Also we compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.

Authors:
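Shan, Hongzhang [1]; Blagojevic, Filip [1]; Min, Seung-Jai [1]; Hargrove, Paul [1]; Jin, Haoqiang [2]; Fuerlinger, Karl [3]; Koniges, Alice [4]; Wright, Nicholas J. [4]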
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
  2. NASA Ames Research Center (ARC), Moffett Field, Mountain View, CA (United States)
  3. Univ. of California, Berkeley, CA (United States). Electrical Engineering & Computer Sciences Dept.
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Publication Date:
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF), Oak Ridge, TN (United States); Univ. of California, Oakland, CA (United States); UT-Battelle LLC/ORNL, Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1564727
Grant/Contract Number:  
AC02-05CH11231; AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
Scientific Programming
Additional Journal Information:
Journal Volume: 18; Journal Issue: 3-4
Publisher:
Hindawi
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Computer Science; Programming model; performance study; UPC; OpenMP; MPI; memory usage

Citation Formats

Shan, Hongzhang, Blagojevic, Filip, Min, Seung-Jai, Hargrove, Paul, Jin, Haoqiang, Fuerlinger, Karl, Koniges, Alice, and Wright, Nicholas J. A programming model performance study using the NAS parallel benchmarks. United States: N. p., 2010. Web. doi:10.3233/spr-2010-0306.
Shan, Hongzhang, Blagojevic, Filip, Min, Seung-Jai, Hargrove, Paul, Jin, Haoqiang, Fuerlinger, Karl, Koniges, Alice, & Wright, Nicholas J. A programming model performance study using the NAS parallel benchmarks. United States. doi:10.3233/spr-2010-0306.
Shan, Hongzhang, Blagojevic, Filip, Min, Seung-Jai, Hargrove, Paul, Jin, Haoqiang, Fuerlinger, Karl, Koniges, Alice, and Wright, Nicholas J. 2010. "A programming model performance study using the NAS parallel benchmarks". United States. doi:10.3233/spr-2010-0306. https://www.osti.gov/servlets/purl/1564727.
@article{osti_1564727,
title = {A programming model performance study using the NAS parallel benchmarks},
author = {Shan, Hongzhang and Blagojevic, Filip and Min, Seung-Jai and Hargrove, Paul and Jin, Haoqiang and Fuerlinger, Karl and Koniges, Alice and Wright, Nicholas J.},
abstractNote = {Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an Infiniband cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. Also we compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.},
doi = {10.3233/spr-2010-0306},
journal = {Scientific Programming},
number = {3-4},
volume = 18,
place = {United States},
year = {2010},
month = {1}
}
