OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Evaluating Quality of Service Traffic Classes on the Megafly Network

Abstract

An emerging trend in High Performance Computing (HPC) systems that use hierarchical topologies (such as dragonfly) is that the applications are increasingly exhibiting high run-to-run performance variability. This poses a significant challenge for application developers, job schedulers, and system maintainers. One approach to address the performance variability is to use newly proposed network topologies such as megafly (or dragonfly+) that offer increased path diversity compared to a traditional fully connected dragonfly. Yet another approach is to use quality of service (QoS) traffic classes that ensure bandwidth guarantees. In this work, we select HPC application workloads that have exhibited performance variability on current 2-D dragonfly systems. We evaluate the baseline performance expectations of these workloads on megafly and 1-D dragonfly network models with comparably similar network configurations. Our results show that the megafly network, despite using fewer virtual channels (VCs) for deadlock avoidance than a dragonfly, performs as well as a fully connected 1-D dragonfly network. We then exploit the fact that megafly networks require fewer VCs to incorporate QoS traffic classes. We use bandwidth capping and traffic differentiation techniques to introduce multiple traffic classes in megafly networks. In some cases, our results show that QoS can completely mitigate application performance variability while causing minimal slowdown to the background network traffic.

Authors:
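Mubarak, Misbah; McGlohon, Neil; Musleh, Malek; Borch, Eric; Ross, Robert B.; Huggahalli, Ram; Chunduri, Sudheer; Parker, Scott; Carothers, Christopher D.; Kumaran, Kalyan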
Publication Date:
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC); USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1574767
DOE Contract Number:  
AC02-06CH11357
Resource Type:
Conference
Resource Relation:
Journal Volume: 11501; Conference: 2019 ISC High Performance, 06/16/19 - 06/20/19, Frankfurt, DE
Country of Publication:
United States
Language:
English

Citation Formats

Mubarak, Misbah, McGlohon, Neil, Musleh, Malek, Borch, Eric, Ross, Robert B., Huggahalli, Ram, Chunduri, Sudheer, Parker, Scott, Carothers, Christopher D., and Kumaran, Kalyan. Evaluating Quality of Service Traffic Classes on the Megafly Network. United States: N. p., 2019. Web. doi:10.1007/978-3-030-20656-7_1.
Mubarak, Misbah, McGlohon, Neil, Musleh, Malek, Borch, Eric, Ross, Robert B., Huggahalli, Ram, Chunduri, Sudheer, Parker, Scott, Carothers, Christopher D., & Kumaran, Kalyan. Evaluating Quality of Service Traffic Classes on the Megafly Network. United States. doi:10.1007/978-3-030-20656-7_1.
Mubarak, Misbah, McGlohon, Neil, Musleh, Malek, Borch, Eric, Ross, Robert B., Huggahalli, Ram, Chunduri, Sudheer, Parker, Scott, Carothers, Christopher D., and Kumaran, Kalyan. 2019. "Evaluating Quality of Service Traffic Classes on the Megafly Network". United States. doi:10.1007/978-3-030-20656-7_1.
@article{osti_1574767,
title = {Evaluating Quality of Service Traffic Classes on the Megafly Network},
author = {Mubarak, Misbah and McGlohon, Neil and Musleh, Malek and Borch, Eric and Ross, Robert B. and Huggahalli, Ram and Chunduri, Sudheer and Parker, Scott and Carothers, Christopher D. and Kumaran, Kalyan},
abstractNote = {An emerging trend in High Performance Computing (HPC) systems that use hierarchical topologies (such as dragonfly) is that the applications are increasingly exhibiting high run-to-run performance variability. This poses a significant challenge for application developers, job schedulers, and system maintainers. One approach to address the performance variability is to use newly proposed network topologies such as megafly (or dragonfly+) that offer increased path diversity compared to a traditional fully connected dragonfly. Yet another approach is to use quality of service (QoS) traffic classes that ensure bandwidth guarantees. In this work, we select HPC application workloads that have exhibited performance variability on current 2-D dragonfly systems. We evaluate the baseline performance expectations of these workloads on megafly and 1-D dragonfly network models with comparably similar network configurations. Our results show that the megafly network, despite using fewer virtual channels (VCs) for deadlock avoidance than a dragonfly, performs as well as a fully connected 1-D dragonfly network. We then exploit the fact that megafly networks require fewer VCs to incorporate QoS traffic classes. We use bandwidth capping and traffic differentiation techniques to introduce multiple traffic classes in megafly networks. In some cases, our results show that QoS can completely mitigate application performance variability while causing minimal slowdown to the background network traffic.},
doi = {10.1007/978-3-030-20656-7_1},
journal = {},
issn = {0302-9743},
number = {},
volume = 11501,
place = {United States},
year = {2019},
month = {1}
}

