Title: Optimizing noncontiguous accesses in MPI-IO.

Abstract

The I/O access patterns of many parallel applications consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically. To avoid this problem, MPI-IO allows users to access noncontiguous data with a single I/O function call, unlike in Unix I/O. In this paper, we explain how critical this feature of MPI-IO is for high performance and how it enables implementations to perform optimizations. We first provide a classification of the different ways of expressing an application's I/O needs in MPI-IO -- we classify them into four levels, called levels 0--3. We demonstrate that, for applications with noncontiguous access patterns, the I/O performance improves dramatically if users write their applications to make level-3 requests (noncontiguous, collective) rather than level-0 requests (Unix style). We then describe how our MPI-IO implementation, ROMIO, delivers high performance for noncontiguous requests. We explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes. We describe how we have implemented these optimizations portably on multiple machines and file systems, controlled their memory requirements, and also achieved high performance. We demonstrate the performance and portability with performance results for three applications -- an astrophysics-application template (DIST3D), the NAS BTIO benchmark, and an unstructured code (UNSTRUC) -- on five different parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000.
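
As a rough illustration of the data-sieving idea named in the abstract, the following Python sketch contrasts one read request per noncontiguous piece with a few large contiguous reads from which the requested bytes are copied out. It is not ROMIO code and does not use MPI: the names `CountingFile`, `read_piecewise`, `read_sieved`, and the `max_buffer` parameter are hypothetical, and the bounded buffer is only an assumed stand-in for the memory control the abstract mentions.

```python
import io


class CountingFile:
    """Wraps a file-like object and counts read requests (hypothetical helper)."""

    def __init__(self, raw):
        self.raw = raw
        self.reads = 0

    def read_at(self, offset, length):
        self.reads += 1
        self.raw.seek(offset)
        return self.raw.read(length)


def read_piecewise(f, pieces):
    """Unix-style access: one read request per (offset, length) piece."""
    return [f.read_at(off, n) for off, n in pieces]


def read_sieved(f, pieces, max_buffer=1 << 20):
    """Data-sieving sketch: read large contiguous extents that cover several
    pieces (holes included), then copy the requested bytes out of the buffer.
    max_buffer is an assumed cap on the extent size."""
    out = [None] * len(pieces)
    order = sorted(range(len(pieces)), key=lambda i: pieces[i][0])
    i = 0
    while i < len(order):
        start = pieces[order[i]][0]
        end = start + pieces[order[i]][1]
        j = i + 1
        # Grow the extent while the next piece still fits in the buffer.
        while j < len(order):
            off, n = pieces[order[j]]
            new_end = max(end, off + n)
            if new_end - start > max_buffer:
                break
            end = new_end
            j += 1
        buf = f.read_at(start, end - start)
        for k in order[i:j]:
            off, n = pieces[k]
            out[k] = buf[off - start:off - start + n]
        i = j
    return out


# Strided pattern: 8 bytes out of every 64, 32 pieces in total.
data = bytes(range(256)) * 16
pieces = [(i * 64, 8) for i in range(32)]

small = CountingFile(io.BytesIO(data))
sieved = CountingFile(io.BytesIO(data))
same = read_piecewise(small, pieces) == read_sieved(sieved, pieces, max_buffer=1024)
print(same, small.reads, sieved.reads)
```

Both functions return the same bytes; the sieved version issues far fewer requests at the cost of also reading the unwanted bytes between pieces, which is the trade-off a buffer-size limit is meant to keep in check.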

Authors:
Thakur, R; Gropp, W; Lusk, E
Publication Date:
Jan. 2002
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC); USDOD; National Science Foundation (NSF)
OSTI Identifier:
943133
Report Number(s):
ANL/MCS/JA-37760
Journal ID: ISSN 0167-8191; PACOEJ; TRN: US201002%%619
DOE Contract Number:  
DE-AC02-06CH11357
Resource Type:
Journal Article
Journal Name:
Parallel Comput.
Additional Journal Information:
Journal Volume: 28; Journal Issue: 1 ; Jan. 2002; Journal ID: ISSN 0167-8191
Country of Publication:
United States
Language:
ENGLISH
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; COMPUTER CODES; DATA; MEMORY MANAGEMENT; OPTIMIZATION; PARALLEL PROCESSING; PERFORMANCE

Citation Formats

Thakur, R, Gropp, W, and Lusk, E. Optimizing noncontiguous accesses in MPI-IO. United States: N. p., 2002. Web. doi:10.1016/S0167-8191(01)00129-6.
Thakur, R, Gropp, W, & Lusk, E. Optimizing noncontiguous accesses in MPI-IO. United States. doi:10.1016/S0167-8191(01)00129-6.
Thakur, R, Gropp, W, and Lusk, E. Tue . "Optimizing noncontiguous accesses in MPI-IO." United States. doi:10.1016/S0167-8191(01)00129-6.
@article{osti_943133,
title = {Optimizing noncontiguous accesses in MPI-IO.},
author = {Thakur, R and Gropp, W and Lusk, E},
abstractNote = {The I/O access patterns of many parallel applications consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically. To avoid this problem, MPI-IO allows users to access noncontiguous data with a single I/O function call, unlike in Unix I/O. In this paper, we explain how critical this feature of MPI-IO is for high performance and how it enables implementations to perform optimizations. We first provide a classification of the different ways of expressing an application's I/O needs in MPI-IO -- we classify them into four levels, called levels 0--3. We demonstrate that, for applications with noncontiguous access patterns, the I/O performance improves dramatically if users write their applications to make level-3 requests (noncontiguous, collective) rather than level-0 requests (Unix style). We then describe how our MPI-IO implementation, ROMIO, delivers high performance for noncontiguous requests. We explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes. We describe how we have implemented these optimizations portably on multiple machines and file systems, controlled their memory requirements, and also achieved high performance. We demonstrate the performance and portability with performance results for three applications -- an astrophysics-application template (DIST3D), the NAS BTIO benchmark, and an unstructured code (UNSTRUC) -- on five different parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000.},
doi = {10.1016/S0167-8191(01)00129-6},
journal = {Parallel Comput.},
issn = {0167-8191},
number = {1},
volume = 28,
place = {United States},
year = {2002},
month = {1}
}