DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Harnessing Data Movement in Virtual Clusters for In-Situ Execution

Abstract

As a result of increasing data volume and velocity, Big Data science at exascale has shifted towards the in-situ paradigm, where large-scale simulations run concurrently alongside data analytics. With in-situ processing, data generated by simulations can be analyzed while still in memory, avoiding the slow storage bottleneck. However, as demonstrated in this work, running simulations and analytics together on shared resources is likely to cause substantial contention if left unmanaged, greatly reducing the efficiency of both. Recently, virtualization technologies such as Linux containers have been widely adopted in data centers and physical clusters to provide highly efficient and elastic resource provisioning for consolidated workloads, including scientific simulations and data analytics. In this paper, we investigate how to facilitate network traffic manipulation and reduce mutual network interference for in-situ applications in virtual clusters. To allocate network bandwidth dynamically when it is needed, we adopt SARIMA-based techniques to analyze and predict the MPI traffic issued by simulations. Although this can be an effective technique, naïve use of network virtualization can degrade performance for bursty asynchronous transmissions within an MPI job. Here, we analyze and resolve this performance degradation in virtual clusters.
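The abstract says that SARIMA-based prediction of MPI traffic drives dynamic bandwidth allocation. It does not give the model order or the allocation policy. The Python sketch below is a hypothetical illustration only.

It assumes the simplest seasonal member of the SARIMA family, SARIMA(1,0,0)×(0,1,0) with season length s. That model applies seasonal differencing and then fits an AR(1) term by least squares. The sketch also uses a made-up reservation rule: reserve the predicted demand times a headroom factor, capped at link capacity.

The names `fit_seasonal_ar1`, `forecast`, `reserve_bandwidth`, and `headroom` are assumptions. This is not the authors' code.

```python
"""Minimal SARIMA(1,0,0)x(0,1,0)_s traffic forecaster (illustrative sketch).

All names here are hypothetical; this is not the paper's implementation.
"""
from typing import List


def fit_seasonal_ar1(series: List[float], season: int) -> float:
    """Fit phi for w_t = phi * w_{t-1} + e_t, where w_t = y_t - y_{t-season}."""
    if len(series) < season + 2:
        raise ValueError("need at least season + 2 observations")
    # Seasonal differencing removes a component that repeats every `season` steps.
    w = [series[t] - series[t - season] for t in range(season, len(series))]
    # Least-squares estimate of the AR(1) coefficient (no intercept).
    num = sum(w[t] * w[t - 1] for t in range(1, len(w)))
    den = sum(w[t - 1] ** 2 for t in range(1, len(w)))
    # An all-zero differenced series means the pattern repeats exactly.
    return num / den if den else 0.0


def forecast(series: List[float], season: int, horizon: int) -> List[float]:
    """Forecast `horizon` future values of a seasonal traffic series."""
    phi = fit_seasonal_ar1(series, season)
    history = list(series)
    w_last = series[-1] - series[-1 - season]
    preds = []
    for h in range(1, horizon + 1):
        # h-step AR(1) forecast of the seasonally differenced series.
        w_next = (phi ** h) * w_last
        # Undo the seasonal difference; reuses earlier forecasts when h > season.
        y_next = history[-season] + w_next
        history.append(y_next)
        preds.append(y_next)
    return preds


def reserve_bandwidth(predicted: float, link_capacity: float,
                      headroom: float = 1.25) -> float:
    """Hypothetical policy: predicted demand plus headroom, capped at capacity."""
    return min(link_capacity, max(0.0, predicted) * headroom)


if __name__ == "__main__":
    # Synthetic bursty traffic: a 10-unit burst every 4 intervals.
    traffic = [10.0, 0.0, 0.0, 0.0] * 5
    print(forecast(traffic, season=4, horizon=4))  # prints [10.0, 0.0, 0.0, 0.0]
    print(reserve_bandwidth(10.0, 100.0))  # prints 12.5
```

Seasonal differencing lets this toy model reproduce a burst that recurs every s intervals, such as traffic from a communication phase repeated each simulation timestep. The AR(1) term tracks drift between periods. A practical forecaster would more likely fit full SARIMA orders with a statistics library and validate them against measured traces.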

Authors:
Huang, Dan [1]; Liu, Qing [2]; Klasky, Scott A. [3]; Wang, Jun [1]; Choi, Jong Youl [3]; Logan, Jeremy [3]; Podhorszki, Norbert [3]
  1. Univ. of Central Florida, Orlando, FL (United States)
  2. New Jersey Inst. of Technology, Newark, NJ (United States)
  3. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
August 30, 2018
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1559602
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
IEEE Transactions on Parallel and Distributed Systems
Additional Journal Information:
Journal Volume: 30; Journal Issue: 3; Journal ID: ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; In-situ applications; virtual network; virtual switch; ARIMA; collective communication; MPI

Citation Formats

Huang, Dan, Liu, Qing, Klasky, Scott A., Wang, Jun, Choi, Jong Youl, Logan, Jeremy, and Podhorszki, Norbert. Harnessing Data Movement in Virtual Clusters for In-Situ Execution. United States: N. p., 2018. Web. doi:10.1109/TPDS.2018.2867879.
Huang, Dan, Liu, Qing, Klasky, Scott A., Wang, Jun, Choi, Jong Youl, Logan, Jeremy, & Podhorszki, Norbert. Harnessing Data Movement in Virtual Clusters for In-Situ Execution. United States. https://doi.org/10.1109/TPDS.2018.2867879
Huang, Dan, Liu, Qing, Klasky, Scott A., Wang, Jun, Choi, Jong Youl, Logan, Jeremy, and Podhorszki, Norbert. 2018. "Harnessing Data Movement in Virtual Clusters for In-Situ Execution". United States. https://doi.org/10.1109/TPDS.2018.2867879. https://www.osti.gov/servlets/purl/1559602.
@article{osti_1559602,
title = {Harnessing Data Movement in Virtual Clusters for In-Situ Execution},
author = {Huang, Dan and Liu, Qing and Klasky, Scott A. and Wang, Jun and Choi, Jong Youl and Logan, Jeremy and Podhorszki, Norbert},
abstractNote = {As a result of increasing data volume and velocity, Big Data science at exascale has shifted towards the in-situ paradigm, where large-scale simulations run concurrently alongside data analytics. With in-situ processing, data generated by simulations can be analyzed while still in memory, avoiding the slow storage bottleneck. However, as demonstrated in this work, running simulations and analytics together on shared resources is likely to cause substantial contention if left unmanaged, greatly reducing the efficiency of both. Recently, virtualization technologies such as Linux containers have been widely adopted in data centers and physical clusters to provide highly efficient and elastic resource provisioning for consolidated workloads, including scientific simulations and data analytics. In this paper, we investigate how to facilitate network traffic manipulation and reduce mutual network interference for in-situ applications in virtual clusters. To allocate network bandwidth dynamically when it is needed, we adopt SARIMA-based techniques to analyze and predict the MPI traffic issued by simulations. Although this can be an effective technique, naïve use of network virtualization can degrade performance for bursty asynchronous transmissions within an MPI job. Here, we analyze and resolve this performance degradation in virtual clusters.},
doi = {10.1109/TPDS.2018.2867879},
journal = {IEEE Transactions on Parallel and Distributed Systems},
number = 3,
volume = 30,
place = {United States},
year = {2018},
month = aug
}

Citation Metrics:
Cited by: 5 works
Citation information provided by Web of Science

Works referencing / citing this record:

The role of machine learning in scientific workflows
journal, May 2019

  • Deelman, Ewa; Mandal, Anirban; Jiang, Ming
  • The International Journal of High Performance Computing Applications, Vol. 33, Issue 6
  • DOI: 10.1177/1094342019852127