OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information
  1. Experimental Characterization of OpenMP Offloading Memory Operations and Unified Shared Memory Support

    The OpenMP specification recently introduced support for unified shared memory, allowing implementations to leverage underlying system software to provide a simpler GPU offloading model in which explicit mapping of variables is optional. Support for this feature is becoming more widely available across OpenMP implementations on several hardware platforms. A deeper understanding of each implementation's execution profile and performance is crucial for applications as they consider the performance-portability implications of adopting a unified-memory offloading programming style. This work introduces a benchmark tool to characterize unified memory support in several OpenMP compilers and runtimes, with emphasis on identifying discrepancies between different OpenMP implementations in how various memory allocation strategies interact with unified shared memory. The benchmark tool is used to characterize OpenMP compilers on three leading High Performance Computing platforms supporting different CPU and device architectures, and to assess the impact of enabling unified shared memory on the performance of memory-bound code, highlighting implementation differences that should be accounted for when applications consider performance portability across platforms and compilers.
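
    The offloading style described above can be illustrated with a minimal sketch (ours, not the paper's benchmark), assuming a compiler and GPU runtime that honor the OpenMP 5.x unified_shared_memory requirement: a plain host allocation is accessed inside a target region without any map clauses.

```c
/* Minimal sketch (not from the paper) of OpenMP offloading under unified
   shared memory. Assumes a compiler/runtime supporting OpenMP 5.x
   unified_shared_memory. */
#include <stdio.h>
#include <stdlib.h>

#pragma omp requires unified_shared_memory

int main(void) {
    const int n = 1 << 20;
    double *a = malloc(n * sizeof(double));   /* plain host allocation */
    for (int i = 0; i < n; ++i) a[i] = 1.0;

    /* With unified shared memory, no map() clauses are needed: the device
       accesses the host allocation through the shared address space
       (page migration or interconnect access, depending on the platform). */
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i)
        a[i] *= 2.0;

    printf("a[0] = %f\n", a[0]);
    free(a);
    return 0;
}
```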
  2. Towards a Standard Process Management Infrastructure for Workflows Using Python

    Orchestrating the execution of ensembles of processes lies at the core of scientific workflow engines on large-scale parallel platforms. This is usually handled using platform-specific command-line tools, with limited process management control and potential strain on system resources. The PMIx standard provides a uniform interface to system resources. The low-level C implementation of PMIx has hampered its use in workflow engines, leading to the development of a Python binding that has yet to gain traction. In this paper, we present our work to harden the PMIx Python client, demonstrating its usability using a prototype Python driver to orchestrate the execution of an ensemble of processes. We present experimental results using the prototype on the Summit supercomputer at Oak Ridge National Laboratory. This work lays the foundation for wider adoption of PMIx for workflow engines, and encourages wider support of more PMIx functionality in vendor-provided system software stacks.
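
    For context, the low-level C client interface that the Python binding wraps looks roughly like the hedged sketch below (illustrative only; this is not the prototype driver described in the abstract): a client initializes against its runtime environment, queries a job-level attribute, and finalizes.

```c
/* Minimal PMIx C client sketch (illustrative): init, query job size, finalize. */
#include <stdio.h>
#include <pmix.h>

int main(void) {
    pmix_proc_t myproc, wildcard;
    pmix_value_t *val = NULL;

    if (PMIx_Init(&myproc, NULL, 0) != PMIX_SUCCESS) {
        fprintf(stderr, "PMIx_Init failed\n");
        return 1;
    }

    /* Query a job-level attribute using the wildcard rank of our namespace. */
    PMIX_LOAD_PROCID(&wildcard, myproc.nspace, PMIX_RANK_WILDCARD);
    if (PMIx_Get(&wildcard, PMIX_JOB_SIZE, NULL, 0, &val) == PMIX_SUCCESS) {
        printf("rank %u of %u\n", myproc.rank, val->data.uint32);
        PMIX_VALUE_RELEASE(val);
    }

    PMIx_Finalize(NULL, 0);
    return 0;
}
```

    The Python binding discussed in the abstract wraps these same client operations so that Python-based workflow engines can call them directly.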
  3. RADICAL-Pilot and PMIx/PRRTE: Executing Heterogeneous Workloads at Large Scale on Partitioned HPC Resources

    Execution of heterogeneous workflows on high-performance computing (HPC) platforms presents unprecedented resource management and execution coordination challenges for runtime systems. Task heterogeneity increases the complexity of resource and execution management, limiting the scalability and efficiency of workflow execution. Resource partitioning and distribution of task execution over partitioned resources promise to address these problems, but we lack an experimental evaluation of their performance at scale. This paper provides a performance evaluation of the Process Management Interface for Exascale (PMIx) and its reference implementation PRRTE on the leadership-class HPC platform Summit, when integrated into a pilot-based runtime system called RADICAL-Pilot. We partition resources across multiple PRRTE Distributed Virtual Machine (DVM) environments, each responsible for launching tasks via the PMIx interface. We experimentally measure workload execution performance in terms of task scheduling/launching rate, the distribution of DVM task placement times, and DVM startup and termination overheads on Summit. The integrated solution with PMIx/PRRTE enables the use of an abstracted, standardized set of interfaces for orchestrating the launch process, dynamic process management, and monitoring. It extends scaling capabilities, overcoming limitations of other launching mechanisms (e.g., JSM/LSF). The different DVM setup configurations explored provide insight into DVM performance and into layouts that leverage it. Our experimental results show that a heterogeneous workload of 65,500 tasks on 2,048 nodes, partitioned across 32 DVMs, runs steadily with resource utilization no lower than 52%. With fewer concurrently executing tasks, resource utilization reaches up to 85%, based on results for a heterogeneous workload of 8,200 tasks on 256 nodes and 2 DVMs.
  4. Adaptive Generation of Training Data for ML Reduced Model Creation

    Machine learning proxy models are often used to speed up or completely replace complex computational models. The greatly reduced and deterministic computational costs enable new use cases such as digital twin control systems and global optimization. The challenge in building these proxy models is generating the training data. A naive uniform sampling of the input space can result in a non-uniform sampling of the output space of a model. This can cause gaps in the training data coverage that miss finer-scale details, resulting in poor accuracy. While larger and larger data sets could eventually fill in these gaps, the computational burden of full-scale simulation codes can make this prohibitive. In this paper, we present an adaptive data generation method that uses uncertainty estimation to identify regions where training data should be augmented. By targeting data generation to areas of need, representative data sets can be generated efficiently. The effectiveness of this method is demonstrated on a simple one-dimensional function and a complex multidimensional physics model.
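
    A toy sketch of the general idea (ours, not the paper's method) is shown below for a one-dimensional function: starting from a coarse uniform grid, new samples are added in the interval whose endpoints differ most in output value. The paper drives refinement with a model-based uncertainty estimate; the output gap used here is only a simple stand-in for that signal.

```c
/* Toy adaptive 1D sampling sketch (illustrative; compile with: cc file.c -lm). */
#include <math.h>
#include <stdio.h>

#define MAX_SAMPLES 64

static double expensive_model(double x) {      /* placeholder "simulation" */
    return tanh(20.0 * (x - 0.5));             /* sharp feature near x = 0.5 */
}

int main(void) {
    double xs[MAX_SAMPLES], ys[MAX_SAMPLES];
    int n = 5;                                  /* initial uniform samples */
    for (int i = 0; i < n; ++i) {
        xs[i] = (double)i / (n - 1);
        ys[i] = expensive_model(xs[i]);
    }
    while (n < 20) {
        /* Pick the interval with the largest output gap (stand-in for an
           uncertainty estimate) and bisect it. */
        int worst = 0;
        double worst_gap = -1.0;
        for (int i = 0; i + 1 < n; ++i) {
            double gap = fabs(ys[i + 1] - ys[i]);
            if (gap > worst_gap) { worst_gap = gap; worst = i; }
        }
        double xnew = 0.5 * (xs[worst] + xs[worst + 1]);
        for (int j = n; j > worst + 1; --j) {   /* insert, keeping xs sorted */
            xs[j] = xs[j - 1];
            ys[j] = ys[j - 1];
        }
        xs[worst + 1] = xnew;
        ys[worst + 1] = expensive_model(xnew);
        ++n;
    }
    for (int i = 0; i < n; ++i)
        printf("%f %f\n", xs[i], ys[i]);
    return 0;
}
```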
  5. RADICAL-Pilot and PMIx/PRRTE: Executing Heterogeneous Workloads at Large Scale on Partitioned HPC Resources

    Execution of heterogeneous workflows on high-performance computing (HPC) platforms presents unprecedented resource management and execution coordination challenges for runtime systems. Task heterogeneity increases the complexity of resource and execution management, limiting the scalability and efficiency of workflow execution. Resource partitioning and distribution of task execution over partitioned resources promise to address these problems, but we lack an experimental evaluation of their performance at scale. Here, this paper provides a performance evaluation of the Process Management Interface for Exascale (PMIx) and its reference implementation PRRTE on the leadership-class HPC platform Summit, when integrated into a pilot-based runtime system called RADICAL-Pilot. We partition resources across multiple PRRTE Distributed Virtual Machine (DVM) environments, each responsible for launching tasks via the PMIx interface. We experimentally measure workload execution performance in terms of task scheduling/launching rate, the distribution of DVM task placement times, and DVM startup and termination overheads on Summit. The integrated solution with PMIx/PRRTE enables the use of an abstracted, standardized set of interfaces for orchestrating the launch process, dynamic process management, and monitoring. It extends scaling capabilities, overcoming limitations of other launching mechanisms (e.g., JSM/LSF). The different DVM setup configurations explored provide insight into DVM performance and into layouts that leverage it. Our experimental results show that a heterogeneous workload of 65,500 tasks on 2,048 nodes, partitioned across 32 DVMs, runs steadily with resource utilization no lower than 52%. With fewer concurrently executing tasks, resource utilization reaches up to 85%, based on results for a heterogeneous workload of 8,200 tasks on 256 nodes and 2 DVMs.
  6. HPC Molecular Simulation Tries Out a New GPU: Experiences on Early AMD Test Systems for the Frontier Supercomputer

    Molecular simulation is an important tool for numerous efforts in physics, chemistry, and the biological sciences. Simulating molecular dynamics requires extremely rapid calculations to enable sufficient sampling of simulated temporal molecular processes. The Hewlett Packard Enterprise (HPE) Cray EX Frontier supercomputer installed at the Oak Ridge Leadership Computing Facility (OLCF) will provide an exascale resource for open science, and will feature graphics processing units (GPUs) from Advanced Micro Devices (AMD). The future LUMI supercomputer in Finland will be based on an HPE Cray EX platform as well. Here we test the ports of several widely used molecular dynamics packages, each of which has made substantial use of acceleration with NVIDIA GPUs, on Spock, the early Cray pre-Frontier testbed system at the OLCF that employs AMD GPUs. These programs are used extensively in industry for pharmaceutical and materials research, as well as in academia, and are frequently deployed on high-performance computing (HPC) systems, including national leadership HPC resources. We find that, in general, performance is competitive and installation is straightforward, even at these early stages in a new GPU ecosystem. Our experiences point to an expanding arena for GPU vendors in HPC for molecular simulation.
  7. Portability for GPU-accelerated molecular docking applications for cloud and HPC: can portable compiler directives provide performance across all platforms?

    High-throughput structure-based screening of drug-like molecules has become a common tool in biomedical research. Recently, acceleration with graphics processing units (GPUs) has provided a large performance boost for molecular docking programs. Both cloud and high-performance computing (HPC) resources have been used for large screens with molecular docking programs; while NVIDIA GPUs have dominated cloud and HPC resources, new vendors such as AMD and Intel are now entering the field, creating the problem of software portability across different GPUs. Ideally, software productivity could be maximized with portable programming models that are able to maintain high performance across architectures. While in many cases compiler directives have been used as an easy way to offload parallel regions of a CPU-based program to a GPU accelerator, they may also be an attractive programming model for providing portability across different GPU vendors, in which case the porting process may proceed in the reverse direction: from low-level, architecture-specific code to higher-level directive-based abstractions. MiniMDock is a new mini-application (miniapp) designed to capture the essential computational kernels found in molecular docking calculations, such as those used in pharmaceutical drug discovery efforts, in order to test different solutions for porting across GPU architectures. Here we extend MiniMDock to GPU offloading with OpenMP directives, and compare its performance to that of kernels using CUDA and HIP on NVIDIA and AMD GPUs, respectively, as well as across different compilers, exploring performance bottlenecks. We document this reverse-porting process, from highly optimized device code to a higher-level version using directives, compare code structure, and describe barriers that were overcome in this effort.
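
    The directive-based offload pattern referred to above can be sketched generically as follows (our illustration, not MiniMDock's actual kernel): each OpenMP team scores one candidate pose while the threads of that team reduce a per-atom energy sum.

```c
/* Generic sketch of a directive-based GPU offload of a docking-style scoring
   loop (illustrative only). */
#include <stdio.h>

#define NPOSES 1024
#define NATOMS 256

int main(void) {
    static float coords[NPOSES][NATOMS];   /* per-pose atom "coordinates" */
    static float scores[NPOSES];
    for (int p = 0; p < NPOSES; ++p)
        for (int a = 0; a < NATOMS; ++a)
            coords[p][a] = 0.001f * (float)(p + a);

    /* One team per pose; threads within a team reduce the pose's energy. */
    #pragma omp target teams distribute map(to: coords) map(from: scores)
    for (int p = 0; p < NPOSES; ++p) {
        float e = 0.0f;
        #pragma omp parallel for reduction(+:e)
        for (int a = 0; a < NATOMS; ++a)
            e += coords[p][a] * coords[p][a];   /* stand-in energy term */
        scores[p] = e;
    }

    printf("score[0] = %f\n", scores[0]);
    return 0;
}
```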
  8. Core-Pedestal Plasma Configurations in Advanced Tokamaks

    Here, several configurations for the core and pedestal plasma are examined for a predefined tokamak design by implementing multiple heating/current drive (H/CD) sources to achieve an optimum configuration of high fusion power in noninductive operation, while maintaining an ideally magnetohydrodynamic (MHD) stable core plasma, using the IPS-FASTRAN framework. IPS-FASTRAN is a component-based, lightweight, coupled simulation framework used to simulate magnetically confined plasmas by integrating a set of high-fidelity codes to construct the plasma equilibrium (EFIT, TOQ, and CHEASE), calculate the turbulent heat and particle transport fluxes (TGLF), model various H/CD systems (TORIC, TORAY, GENRAY, and NUBEAM), model the pedestal pressure and width (EPED), and estimate the ideal MHD stability (DCON). The TGLF core transport model and EPED pedestal model are used to self-consistently predict plasma profiles consistent with ideal MHD stability and H/CD (and bootstrap) current sources. To evaluate the achievable and sustainable plasma beta, varying configurations are produced ranging from the no-wall stability regime to the with-wall stability regime, simultaneously subject to the self-consistent TGLF, EPED, and H/CD source profile predictions that optimize configuration performance. The pedestal density, plasma current, and total injected power are scanned to explore their impact on the target plasma configuration, fusion power, and confinement quality. A set of fully noninductive scenarios is achieved by employing ion-cyclotron, neutral beam injection, helicon, and lower-hybrid H/CDs to provide a broad profile for the total current drive in the core region for a predefined tokamak design. These noninductive scenarios are characterized by high fusion gain (Q ~ 4) and power (Pfus ~ 600 MW), optimum confinement quality (H98 ~ 1.1), and high bootstrap current fraction (fBS ~ 0.7) for a Greenwald fraction below unity. The broad-current-profile configurations identified are stable to low-n kink modes, either because the normalized pressure βN is below the no-wall limit or because a wall is present.
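
    For reference, two of the figures of merit quoted above follow standard definitions (ours, not reproduced from the paper): the normalized beta and the Greenwald density fraction,

    $$\beta_N = \beta_T[\%]\,\frac{a\,[\mathrm{m}]\;B_T\,[\mathrm{T}]}{I_p\,[\mathrm{MA}]}, \qquad f_{GW} = \frac{\bar{n}_e}{n_{GW}}, \quad n_{GW} = \frac{I_p\,[\mathrm{MA}]}{\pi a^2\,[\mathrm{m}^2]} \times 10^{20}\ \mathrm{m}^{-3},$$

    where a is the minor radius, B_T the toroidal field, I_p the plasma current, and n̄_e the line-averaged electron density.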
  9. Application Experiences on a GPU-Accelerated Arm-based HPC Testbed

    This paper assesses and reports the experience of ten teams working to port, validate, and benchmark several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems, each equipped with a server-class Arm CPU from Ampere Computing and two data center GPUs from NVIDIA Corp. The systems are connected using an InfiniBand interconnect. The selected applications and mini-apps are written in several programming languages and use multiple accelerator-based programming models for GPUs, such as CUDA, OpenACC, and OpenMP offloading. Application porting requires a robust and easy-to-access programming environment, including a variety of compilers and optimized scientific libraries. The goal of this work is to evaluate platform readiness and assess the effort required from developers to deploy well-established scientific workloads on current and future generations of Arm-based GPU-accelerated HPC systems. The reported case studies demonstrate that the current level of maturity and diversity of software and tools is already adequate for large-scale production deployments.
  10. Workflows Community Summit 2022: A Roadmap Revolution

    Scientific workflows have become integral tools in broad scientific computing use cases. Scientific discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from the execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing (often referred to as a computing continuum) and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness of existing systems and applications. Specifically, the proliferation of machine learning/artificial intelligence (ML/AI) workflows, the need to process large-scale datasets produced by instruments at the edge, the intensification of near-real-time data processing, support for long-term experiment campaigns, and the emergence of quantum computing as an adjunct to HPC have significantly changed the functional and operational requirements of workflow systems. Workflow systems now need to, for example, support data streams from the edge to the cloud to HPC, enable the management of many small files, allow data reduction while ensuring high accuracy, and orchestrate distributed services (workflows, instruments, data movement, provenance, publication, etc.) across computing and user facilities, among other capabilities. Further, to accelerate science, it is also necessary that these systems implement specifications/standards and APIs for seamless (horizontal and vertical) integration between systems and applications, and that they enable the publication of workflows and their associated products according to the FAIR principles.
...

Search: All Records for Author / Contributor 0000000305541036
