Advanced I/O for large-scale scientific applications.

Technical Report · January 1, 2010

DOI: https://doi.org/10.2172/1004371 · OSTI ID: 1004371

Klasky, Scott ^[1]; Schwan, Karsten ^[2]; Oldfield, Ron A; Lofstead, II, Gerald F ^[2]

^[1] Oak Ridge National Laboratory, Oak Ridge, TN
^[2] Georgia Institute of Technology, Atlanta, GA

As scientific simulations scale to petascale machines and beyond, the data volumes they generate pose a dual problem. First, as machine sizes increase, careful tuning of IO routines becomes increasingly important to keep the time spent in IO acceptable. It is not uncommon, for instance, for 20% of an application's runtime to be spent performing IO in a 'tuned' system; careful management of the IO routines can reduce that to 5% or even less in some cases. Second, the data volumes are so large, on the order of tens to hundreds of TB, that discovering the scientifically valid contributions requires assistance at runtime to both organize and annotate the data. Waiting for offline processing is not feasible because of both its impact on the IO system and the time required. To reduce this load and improve scientists' ability to use the large amounts of data being produced, new data management techniques are required in two areas. The first is efficient movement of data from the compute space to storage. These techniques should understand the underlying system infrastructure and adapt to changing system conditions. Technologies include aggregation networks, data staging nodes for a closer parity to the IO subsystem, and autonomic IO routines that can detect system bottlenecks and choose different approaches, such as splitting the output across multiple targets or staggering output processes. Such methods must be end-to-end: even with properly managed asynchronous techniques, it is still essential to properly manage the later synchronous interaction with the storage system to maintain acceptable performance. The second is metadata: annotations and other metadata must be incorporated into the generated data to help the scientist understand output data for the simulation run as a whole and to select data and data features without concern for what files or other storage technologies were employed.
All of these features should be attained while maintaining a simple deployment for the science code and eliminating the need for allocation of additional computational resources.

Research Organization: Sandia National Laboratories

Sponsoring Organization: USDOE

DOE Contract Number: AC04-94AL85000

OSTI ID: 1004371

Report Number(s): SAND2009-7763

Country of Publication: United States

Language: English

Similar Records

Active Storage with Analytics Capabilities and I/O Runtime System for Petascale Systems

Technical Report · March 18, 2015 · OSTI ID: 1172904

Adaptable Metadata Rich IO Methods for Portable High Performance IO

Conference · December 31, 2008 · OSTI ID: 963933

Petascale Data Storage Institute (PDSI) Final Report

Technical Report · November 25, 2012 · OSTI ID: 1150023

Related Subjects

72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS
MANAGEMENT
PARITY
PERFORMANCE
PROCESSING
SIMULATION
STORAGE
TARGETS
TUNING