Title: DeStager: feature guided in-situ data management in distributed deep memory hierarchies

Abstract

In-situ analytics are increasingly adopted by leadership scientific applications to gain fast insight into the massive output data of simulations. In current practice, systems buffer the output data in DRAM for analytics processing, which constrains the buffered data to the DRAM capacity left unused by the simulation. The rapid growth of data size calls for alternative ways to accommodate data-rich analytics, such as using solid-state disks to increase effective memory capacity. To this end, this paper investigates software solutions for exploring the deep memory hierarchies expected on future high-end machines. Because many analytics are sensitive to data features (regions of interest) hidden in the data being processed, the approach incorporates knowledge of those features into in-situ data management. It uses adaptive index creation and refinement to reduce the overhead of index management. In addition, it uses data features to predict data skew and improves load balance by controlling data distribution and placement across distributed staging servers. The experimental results show that these feature-guided optimizations achieve substantial improvements over state-of-the-art approaches for managing output data in-situ.

Authors:
Zhang, Xuechen [1]; Zheng, Fang [2]; Nguyen, Bao [1]
  1. Washington State Univ., Vancouver (United States). School of Engineering and Computer Science
  2. IBM, Yorktown Heights, NY (United States). Thomas J. Watson Research Center
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1565718
Resource Type:
Journal Article
Journal Name:
Distributed and Parallel Databases
Additional Journal Information:
Journal Volume: 37; Journal Issue: 1; Journal ID: ISSN 0926-8782
Country of Publication:
United States
Language:
English
Subject:
Computer Science

Citation Formats

Zhang, Xuechen, Zheng, Fang, and Nguyen, Bao. DeStager: feature guided in-situ data management in distributed deep memory hierarchies. United States: N. p., 2018. Web. doi:10.1007/s10619-018-7235-3.
Zhang, Xuechen, Zheng, Fang, & Nguyen, Bao. DeStager: feature guided in-situ data management in distributed deep memory hierarchies. United States. doi:10.1007/s10619-018-7235-3.
Zhang, Xuechen, Zheng, Fang, and Nguyen, Bao. Thu . "DeStager: feature guided in-situ data management in distributed deep memory hierarchies". United States. doi:10.1007/s10619-018-7235-3.
@article{osti_1565718,
title = {DeStager: feature guided in-situ data management in distributed deep memory hierarchies},
author = {Zhang, Xuechen and Zheng, Fang and Nguyen, Bao},
abstractNote = {In-situ analytics have been increasingly adopted by leadership scientific applications to gain fast insights into massive output data of simulations. With the current practice, systems buffer the output data in DRAM for analytics processing, constraining it to DRAM capacity un-used by the simulation. The rapid growth of data size requires alternative approaches to accommodating data-rich analytics, such as using solid-state disks to increase effective memory capacity. For this purpose, this paper explores software solutions for exploring the deep memory hierarchies expected on future high-end machines. Leveraging the fact that many analytics are sensitive to data features (regions-of-interest) hidden in the data being processed, the approach incorporates the knowledge of the data features into in-situ data management. It uses adaptive index creation/refinement to reduce the overhead of index management. In addition, it uses data features to predict data skew and improve load balance through controlling data distribution and placement on distributed staging servers. The experimental results show that such feature-guided optimizations achieve substantial improvements over state-of-the-art approaches for managing output data in-situ.},
doi = {10.1007/s10619-018-7235-3},
journal = {Distributed and Parallel Databases},
issn = {0926-8782},
number = 1,
volume = 37,
place = {United States},
year = {2018},
month = {8}
}
