Title: Collective I/O Tuning Using Analytical and Machine-Learning Models

Conference

The ever-growing demand of scientific applications for computation and data is driving a continuous increase in the scale of parallel computers. The inherent complexity of scaling up a computing system, in terms of both hardware and software stack, exposes an increasing number of factors that impact performance and complicate optimization. In particular, the optimization of parallel I/O has become increasingly challenging because of the deepening storage hierarchy and the well-known performance variability of shared storage systems. This paper focuses on model-based autotuning of the two-phase collective I/O algorithm from a popular MPI distribution on the Blue Gene/Q architecture. We propose a novel hybrid model, constructed as a composition of analytical models for communication and storage operations and black-box models for the performance of the individual operations. We perform an in-depth study of the complexity involved in performance modeling, including architecture, software stack, and noise. In particular, we address the challenges of modeling the performance of shared storage systems by building a benchmark that helps synthesize factors such as topology, file caching, and noise. The experimental results show that the hybrid approach produces significantly better results than state-of-the-art machine learning approaches and is more robust to noise, at the cost of higher modeling complexity.
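The hybrid idea described in the abstract can be illustrated with a minimal sketch. It is not the model from the paper. It assumes that a two-phase collective write proceeds in rounds, that each round consists of one communication (shuffle) step and one storage step per aggregator, and that the tunable parameters are the number of aggregators and the collective buffer size. All names (`HybridModel`, `tune`, `example_comm_time`, `example_storage_time`) and all constants are hypothetical; in practice the per-operation predictors would be fitted to benchmark measurements rather than written by hand.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

# A black-box predictor maps an operation size in bytes to a time in seconds.
Predictor = Callable[[int], float]


def example_comm_time(nbytes: int) -> float:
    """Hypothetical stand-in for a learned communication model (made-up constants)."""
    return 1e-5 + nbytes / 1e9


def example_storage_time(nbytes: int) -> float:
    """Hypothetical stand-in for a learned storage model (made-up constants)."""
    return 1e-3 + nbytes / 1e8


@dataclass
class HybridModel:
    comm_time: Predictor     # black-box model of one communication operation
    storage_time: Predictor  # black-box model of one storage operation

    def predict(self, total_bytes: int, n_aggregators: int, buffer_bytes: int) -> float:
        """Analytical composition (simplifying assumption): identical rounds,
        each made of one shuffle and one file access of at most buffer_bytes."""
        per_aggregator = math.ceil(total_bytes / n_aggregators)
        rounds = math.ceil(per_aggregator / buffer_bytes)
        chunk = min(buffer_bytes, per_aggregator)
        return rounds * (self.comm_time(chunk) + self.storage_time(chunk))


def tune(model: HybridModel, total_bytes: int,
         candidates: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return the (n_aggregators, buffer_bytes) pair with the lowest predicted time."""
    return min(candidates, key=lambda c: model.predict(total_bytes, c[0], c[1]))


if __name__ == "__main__":
    model = HybridModel(example_comm_time, example_storage_time)
    configs = [(8, 4 << 20), (16, 16 << 20)]
    print(tune(model, 1 << 30, configs))
```

The analytical part (`predict`) encodes the assumed structure of the algorithm, while the two predictors can be swapped for any regression model trained on measurements, which is the separation the abstract refers to as a hybrid approach.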

Research Organization:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Science - Office of Advanced Scientific Computing Research
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1351298
Resource Relation:
Conference: 2015 IEEE Cluster, 09/08/15 - 09/11/15, Chicago, IL, US
Country of Publication:
United States
Language:
English