OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: CLOMP v1.5

Abstract

CLOMP is the C version of the Livermore OpenMP benchmark, developed to measure OpenMP overheads and other performance impacts due to threading. For simplicity, it does not use MPI by default, but it is expected to be run on the resources a threaded MPI task would use (e.g., a portion of a shared-memory compute node). Compiling with -DWITH_MPI allows packing one or more nodes with CLOMP tasks and having CLOMP report OpenMP performance for the slowest MPI task. On current systems, the strong-scaling performance results for 4, 8, or 16 threads are of most interest. Suggested weak-scaling inputs are provided for evaluating future systems. Since MPI is often used to place at least one MPI task per coherence or NUMA domain, it is recommended to focus OpenMP runtime measurements on the subset of node hardware where low OpenMP overheads are most achievable (e.g., within one coherence domain or NUMA domain).
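The measurement idea in the abstract can be illustrated with a short, self-contained sketch. This is not CLOMP's actual source; the pass count, array size, and helper names (do_pass, time_passes) are illustrative assumptions. It times many small OpenMP parallel-for passes against a serial baseline to estimate per-region threading overhead and, when built with -DWITH_MPI, reduces to the slowest task's result as the abstract describes.

/* Illustrative sketch only (not CLOMP source): estimate per-region
 * OpenMP overhead by timing many small parallel-for passes against a
 * serial baseline.  PASSES, N, and the helper names are assumptions.
 * Build:  cc -fopenmp overhead_sketch.c                  (OpenMP only)
 *         mpicc -fopenmp -DWITH_MPI overhead_sketch.c    (packed-node mode)
 */
#include <stdio.h>
#include <omp.h>
#ifdef WITH_MPI
#include <mpi.h>
#endif

#define PASSES 1000   /* repetitions so overhead dominates timer noise */
#define N      64     /* tiny loop so fork/join cost, not work, dominates */

static double work[N];

/* One pass over the array; 'parallel' toggles the OpenMP region. */
static void do_pass(int parallel)
{
    int i;
    if (parallel) {
        #pragma omp parallel for
        for (i = 0; i < N; i++)
            work[i] = work[i] * 1.000001 + 0.000001;
    } else {
        for (i = 0; i < N; i++)
            work[i] = work[i] * 1.000001 + 0.000001;
    }
}

static double time_passes(int parallel)
{
    double start = omp_get_wtime();
    for (int p = 0; p < PASSES; p++)
        do_pass(parallel);
    return omp_get_wtime() - start;
}

int main(int argc, char **argv)
{
#ifdef WITH_MPI
    MPI_Init(&argc, &argv);
#endif
    double serial   = time_passes(0);
    double threaded = time_passes(1);
    /* Rough per-region overhead in microseconds (threaded minus serial). */
    double overhead_us = (threaded - serial) / PASSES * 1e6;

#ifdef WITH_MPI
    /* Report the slowest MPI task, mirroring the behavior the abstract
     * describes for -DWITH_MPI builds. */
    double worst;
    int rank;
    MPI_Reduce(&overhead_us, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("slowest task: ~%.2f us per parallel region (%d threads)\n",
               worst, omp_get_max_threads());
    MPI_Finalize();
#else
    (void)argc; (void)argv;
    printf("~%.2f us per parallel region (%d threads)\n",
           overhead_us, omp_get_max_threads());
#endif
    return 0;
}

Running with OMP_NUM_THREADS set to 4, 8, or 16 mirrors the strong-scaling thread counts of interest above; under -DWITH_MPI, launching one task per coherence or NUMA domain matches the placement the abstract recommends.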

Authors:
Gyllenhaal, J. [1]
 1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
February 1, 2018
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1420296
Report Number(s):
LLNL-TR-745697
DOE Contract Number:
AC52-07NA27344
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Gyllenhaal, J. CLOMP v1.5. United States: N. p., 2018. Web. doi:10.2172/1420296.
Gyllenhaal, J. CLOMP v1.5. United States. doi:10.2172/1420296.
Gyllenhaal, J. 2018. "CLOMP v1.5". United States. doi:10.2172/1420296. https://www.osti.gov/servlets/purl/1420296.
@techreport{osti_1420296,
title = {CLOMP v1.5},
author = {Gyllenhaal, J.},
abstractNote = {CLOMP is the C version of the Livermore OpenMP benchmark, developed to measure OpenMP overheads and other performance impacts due to threading. For simplicity, it does not use MPI by default, but it is expected to be run on the resources a threaded MPI task would use (e.g., a portion of a shared-memory compute node). Compiling with -DWITH_MPI allows packing one or more nodes with CLOMP tasks and having CLOMP report OpenMP performance for the slowest MPI task. On current systems, the strong-scaling performance results for 4, 8, or 16 threads are of most interest. Suggested weak-scaling inputs are provided for evaluating future systems. Since MPI is often used to place at least one MPI task per coherence or NUMA domain, it is recommended to focus OpenMP runtime measurements on the subset of node hardware where low OpenMP overheads are most achievable (e.g., within one coherence domain or NUMA domain).},
doi = {10.2172/1420296},
place = {United States},
year = {2018},
month = {feb}
}
