Effective Vectorization with OpenMP 4.5

Huber, Joseph N.; Hernandez, Oscar R.; Lopez, Matthew Graham

doi:10.2172/1351758

Technical Report · March 1, 2017

Affiliation: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how they are implemented in different compilers. Modern processors are highly parallel computational machines, often containing multiple processing cores that can each execute several instructions in parallel. Executing instructions in parallel through SIMD allows a processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by applying a single operation to large groups of data at once. The SIMD model is so integral to a processor's potential performance that, if SIMD is not used, less than half of the processor's capability is ever exercised. Unfortunately, using SIMD instructions from higher-level languages is a challenge, because most programming languages have no way to express them. Most compilers can vectorize code using SIMD instructions, but many code properties that matter for SIMD vectorization cannot be determined by the compiler at compile time. OpenMP addresses this by extending the C/C++ and Fortran programming languages with compiler directives that express SIMD parallelism. These directives pass hints to the compiler about which code should be executed in SIMD. They are a key tool for writing optimized code, but they do not change whether the code itself is able to use SIMD operations. In many cases, however, the performance of critical functions is limited by a poor understanding of how SIMD instructions are actually implemented, since SIMD can be implemented through either vector instructions or simultaneous multithreading (SMT). We have found that code often cannot be vectorized, or is vectorized poorly, because the programmer lacks sufficient knowledge of how SIMD instructions work.

Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: AC05-00OR22725

OSTI ID: 1351758

Report Number(s): ORNL/TM-2016/391

Country of Publication: United States

Language: English

Similar Records

An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes

Journal Article · September 19, 2016 · Computer Physics Communications

Vincenti, H.; Lobet, M.; Lehe, R.; +2 more

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)

Technical Report · November 29, 2019

Shen, Xipeng

Evaluation of Low Power ARM Processors for Monte Carlo Particle Transport: MCNP6 on the ARM Cortex-A8 - Paper 83

Conference · September 15, 2014

Sweezy, Jeremy

Related Subjects

97 MATHEMATICS AND COMPUTING
