U.S. Department of Energy
Office of Scientific and Technical Information

Current parallel I/O limitations to scalable data analysis.

Technical Report · DOI: https://doi.org/10.2172/1022200 · OSTI ID: 1022200
This report describes the limitations to parallel scalability that we encountered when applying our otherwise optimally scalable parallel statistical analysis tool kit to large data sets distributed across the parallel file system of the current premier DOE computational facility. We evaluate the effect of parallel I/O on the overall scalability of a parallel data analysis pipeline built on our scalable parallel statistics tool kit [PTBM11]. To this end, we tested the pipeline on the Jaguar-pf DOE/ORNL peta-scale platform, using a large combustion simulation data set under a variety of process counts and domain decomposition scenarios. In this report we recall the foundations of the parallel statistical analysis tool kit that we designed and implemented with the twofold intent of reproducing typical data analysis workflows and achieving an optimal design for scalable parallel implementations. We briefly review the earlier results and publications that allow us to conclude that we have achieved both goals. However, we further establish that, when the tool kit is used in conjunction with a state-of-the-art parallel I/O system, such as the one found on the premier DOE peta-scale platform, the scaling properties of the overall analysis pipeline, which includes the parallel data access routines, degrade rapidly. This finding is problematic and must be addressed if peta-scale data analysis is to be made scalable, or even possible. To attempt to address these parallel I/O limitations, we will investigate the use of the Adaptable IO System (ADIOS) [LZL+10] to improve I/O performance while maintaining flexibility across a variety of I/O options, such as MPI IO and POSIX IO. This system is developed at ORNL and other collaborating institutions, and is being tested extensively on Jaguar-pf. Simulation codes being developed on these systems will also use ADIOS to output their data, making it easier for other systems, such as ours, to process that data.
Research Organization: Sandia National Laboratories
Sponsoring Organization: USDOE
DOE Contract Number: AC04-94AL85000
OSTI ID: 1022200
Report Number(s): SAND2011-4648
Country of Publication: United States
Language: English
