Title: Reducing Application Runtime Variability on Jaguar XT5

Conference
OSTI ID:981788

Operating system (OS) noise is defined as interference generated by the OS that prevents a compute core from performing "useful" work. Compute node kernel daemons, network interfaces, and other OS-related services are major sources of such interference. This interference on individual compute cores can vary in duration and frequency, and can cause de-synchronization (jitter) in collective communication tasks, resulting in variable (degraded) overall parallel application performance. This behavior is more observable in large-scale applications that use certain types of collective communication primitives, such as MPI_Allreduce. This paper presents our effort to reduce the overall effect of OS noise on our large-scale parallel applications. Our tests were performed on the quad-core Jaguar, the Cray XT5 at the Oak Ridge National Laboratory Leadership Computing Facility (OLCF). At the time of these tests, Jaguar was a 1.4 PFLOPS supercomputer with 149,504 compute cores and 8 cores per node. We aggregated OS noise sources onto a single core on each node. The scientific application was then run on six of the remaining cores in each node. Our results show that we were able to improve MPI_Allreduce performance by two orders of magnitude. We demonstrated up to a 30% boost in the performance of the Parallel Ocean Program (POP) using this technique.
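
The sensitivity of collectives to OS noise at scale can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from the paper. It assumes that each core is interrupted independently with a fixed probability during one synchronized step, and that every core waits for the slowest one. The function names, the per-core probability, and the timing values are hypothetical and chosen only for illustration.

```python
def prob_any_core_delayed(p_core: float, n_cores: int) -> float:
    """Probability that at least one of n_cores is interrupted in a step.

    Assumes (for illustration only) independent interruptions with the
    same per-core probability p_core.
    """
    return 1.0 - (1.0 - p_core) ** n_cores


def expected_step_time(work: float, noise: float,
                       p_core: float, n_cores: int) -> float:
    """Expected duration of one synchronized step.

    Every core waits for the slowest one, so a single interruption of
    length `noise` anywhere delays the whole step.
    """
    return work + noise * prob_any_core_delayed(p_core, n_cores)


if __name__ == "__main__":
    # Hypothetical per-core interruption probability per step.
    p = 1e-4
    # 149,504 is the core count quoted in the abstract; 8 and 1024 are
    # arbitrary smaller scales for comparison.
    for n in (8, 1024, 149_504):
        print(n, prob_any_core_delayed(p, n))
```

Under these assumptions, an interruption that is rare on any single core becomes almost certain to hit some core in every step at full machine scale, which is consistent with the abstract's observation that the effect is most visible in large-scale collectives.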

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). National Center for Computational Sciences (NCCS)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
DE-AC05-00OR22725
OSTI ID:
981788
Resource Relation:
Conference: Cray Users Group Conference (CUG) 2010, Edinburgh, United Kingdom, 24 May 2010
Country of Publication:
United States
Language:
English