OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: ARCHIE: Data Analysis Acceleration with Array Caching in Hierarchical Storage

Conference

© 2018 IEEE. Scientific data analysis typically involves reading massive amounts of data generated by simulations, experiments, and observations. Reading such large volumes of data from disk-based file systems is often slow because of the mechanical components in the disks. Recent supercomputing systems are adding non-volatile storage layers in a hierarchy to bridge the performance gap between fast main memory and slow disk-based storage. Software libraries that manage this hierarchy need not only to read data efficiently but also to reduce user involvement in cross-layer data movement. Furthermore, because scientific data is often organized in array-based data structures, these libraries need to incorporate array data access patterns into hierarchical storage management. Existing software typically manages individual storage layers separately, requiring significant manual effort to move data among them. In this paper, we introduce array caching in hierarchical storage (ARCHIE), a new approach that accelerates array data analysis in a seamless fashion. ARCHIE evaluates array access patterns and prefetches data with array semantics between storage layers. Our evaluation shows that ARCHIE outperforms state-of-the-art file systems, i.e., Lustre and DataWarp, on a production supercomputing system by up to 5.8× when scientific analysis applications access data.

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC02-05CH11231
OSTI ID:
1602833
Resource Relation:
Conference: 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA (United States), December 10-13, 2018
Country of Publication:
United States
Language:
English

Similar Records

ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems
Journal Article · January 17, 2020 · Journal of Computer Science and Technology · OSTI ID:1602833

SCORPIO: A Scalable Two-Phase Parallel I/O Library With Application To A Large Scale Subsurface Simulator
Conference · January 1, 2013 · OSTI ID:1602833

SCORPIO: A scalable two-phase parallel I/O library with application to a large scale subsurface simulator
Conference · December 1, 2013 · 20th Annual International Conference on High Performance Computing; 17 April 2014; Bangalore, India · OSTI ID:1602833
