OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Effective Vectorization with OpenMP 4.5

Abstract

This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor's potential performance that, if SIMD is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using the SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C++/C and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for making optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that it is often the case that code cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.
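As a rough illustration of the directives the abstract refers to, the following C sketch marks two loops with the OpenMP `simd` construct. The function names `saxpy_simd` and `sum_simd` are hypothetical and do not come from the report. The `restrict` qualifiers and the `reduction` clause supply the kind of information (no overlap between arrays, a safe way to combine partial sums) that a compiler often cannot establish on its own at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the report): y[i] = a * x[i] + y[i].
   `restrict` promises that x and y do not overlap, and the `simd`
   directive asks the compiler to vectorize the loop. If OpenMP support
   is not enabled, the pragma is ignored and the loop runs as scalar code. */
void saxpy_simd(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical helper: returns the sum of x[0..n-1].
   `reduction(+:sum)` tells the compiler that each SIMD lane may keep a
   private partial sum, with the partial sums combined after the loop. */
float sum_simd(size_t n, const float *x)
{
    assert(n == 0 || x != NULL);
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i) {
        sum += x[i];
    }
    return sum;
}
```

With GCC or Clang, the directives are typically enabled with `-fopenmp` or `-fopenmp-simd`. Whether a given loop is actually vectorized, and how well, still depends on the compiler and the target processor.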

Authors:
 Huber, Joseph N. [1]; Hernandez, Oscar R. [1]; Lopez, Matthew Graham [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
2017-03-01
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE
OSTI Identifier:
1351758
Report Number(s):
ORNL/TM-2016/391
DOE Contract Number:
AC05-00OR22725
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Huber, Joseph N., Hernandez, Oscar R., and Lopez, Matthew Graham. Effective Vectorization with OpenMP 4.5. United States: N. p., 2017. Web. doi:10.2172/1351758.
Huber, Joseph N., Hernandez, Oscar R., & Lopez, Matthew Graham. (2017). Effective Vectorization with OpenMP 4.5. United States. doi:10.2172/1351758.
Huber, Joseph N., Hernandez, Oscar R., and Lopez, Matthew Graham. 2017. "Effective Vectorization with OpenMP 4.5". United States. doi:10.2172/1351758. https://www.osti.gov/servlets/purl/1351758.
@techreport{osti_1351758,
title = {Effective Vectorization with OpenMP 4.5},
author = {Huber, Joseph N. and Hernandez, Oscar R. and Lopez, Matthew Graham},
abstractNote = {This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor's potential performance that, if SIMD is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using the SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C++/C and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for making optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that it is often the case that code cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.},
doi = {10.2172/1351758},
institution = {Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)},
number = {ORNL/TM-2016/391},
place = {United States},
year = {2017},
month = mar
}

Similar Records:
  • In September 2016, IBM hosted an OpenMP 4.5 Hackathon at the TJ Watson Research Center. Teams from LLNL, ORNL, SNL, LANL, and LBNL attended the event. As with the 2015 hackathon, IBM produced an extremely useful and successful event with unmatched support from the compiler team, applications staff, and facilities. Approximately 24 IBM staff supported the 4-day hackathon and spent significant time 4-6 weeks beforehand preparing the environment and becoming familiar with the apps. This hackathon was also the first event to feature LLVM & XL C/C++ and Fortran compilers. This report records many of the issues encountered by the LLNL teams during the hackathon.
  • OpenMP is a widely adopted standard for threading directives across compiler implementations. The standard is very successful since it provides application writers with a simple, portable programming model for introducing shared memory parallelism into their codes. However, the standards do not address key issues for supporting that programming model in development tools such as debuggers. In this paper, we present DMPL, an OpenMP debugger interface that can be implemented as a dynamically loaded library. DMPL is currently being considered by the OpenMP Tools Committee as a mechanism to bridge the development tool gap in the OpenMP standard.
  • The applicability of the Monte Carlo method to the study of the interaction of negative pions at 4.5 Bev is investigated. The hypotheses at the basis of calculations on intranuclear cascades are reviewed. The principles of the Monte Carlo method and its adaptation to cascades of pi-nucleon collisions are described. Elastic and inelastic processes are then studied. The results show that the Monte Carlo method is applicable without difficulty to collisions of the elastic type and to a cascade of such collisions. Thus, the results relative to the coefficient of forbiddenness calculated for pions at 4.5 Bev are in excellent agreement with those obtained by a different method. The occurrence of inelastic collisions presents some difficulties. It is necessary to use a treatment respecting both the spirit of the simulation method and the conditions of compatibility between conditions imposed by the dynamics. It was shown that a previous attempt has serious objections and an alternate approach was proposed. Curves and charts that can be used in the case of collisions of 4.5-Bev pions are presented. Their generalization to other particles and energies presents no difficulty. (J.S.R.)
  • ITPACKV 2D is an adaptation for vector computers of the ITPACK 2C software package for solving large sparse linear systems of equations by adaptive accelerated iterative algorithms. This paper describes the techniques used to vectorize the iterative algorithms in the ITPACK 2C package for the Cyber 205 and Cray X-MP vector computers. The resulting package was named ITPACKV 2D. Results of experiments using ITPACK 2C and ITPACKV 2D are given, including a comparison of megaflop rates and timings for two model problems. 21 refs., 7 tabs.
  • In petroleum engineering, the oil production profiles of a reservoir can be simulated by using a finite gridded model. This profile is affected by the number and choice of wells, which in turn is a result of various production limits and constraints including, for example, the economic minimum well spacing, the number of drilling rigs available, and the time required to drill and complete a well. After a well is available, it may be shut in because of excessive water or gas production. In order to optimize the field performance, a penalty function algorithm was developed for scheduling wells. For an example with some 343 wells and 15 different constraints, the scheduling routine vectorized for the CYBER 205 averaged 560 times faster performance than the scalar version.