Title: Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

Abstract

The growth in size of networked high performance computers along with novel accelerator‐based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub‐optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter‐task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm‐based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. Application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, thereby enabling the applications to achieve better time to solution and scalability on Titan during production. Copyright © 2015 John Wiley & Sons, Ltd.
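The abstract does not give the details of the genetic algorithm, so the following is only a minimal serial sketch of the general idea, not the authors' implementation. It assumes a 3D torus interconnect, a nearest-neighbour 3D stencil as the application's communication topology, and total hop count as the cost to minimize. All function names, the order crossover, the swap mutation, and the parameter values are hypothetical choices made for illustration.

```python
import random

def torus_hops(a, b, dims):
    """Hop count between two node coordinates on a wraparound (torus) grid."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def stencil_pairs(shape):
    """Communicating task pairs for a non-periodic nearest-neighbour 3D stencil."""
    nx, ny, nz = shape
    rank = lambda i, j, k: (i * ny + j) * nz + k
    pairs = []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                if i + 1 < nx:
                    pairs.append((rank(i, j, k), rank(i + 1, j, k)))
                if j + 1 < ny:
                    pairs.append((rank(i, j, k), rank(i, j + 1, k)))
                if k + 1 < nz:
                    pairs.append((rank(i, j, k), rank(i, j, k + 1)))
    return pairs

def cost(order, pairs, nodes, dims):
    """Total hop distance when task t is placed on nodes[order[t]]."""
    return sum(torus_hops(nodes[order[a]], nodes[order[b]], dims) for a, b in pairs)

def order_crossover(p1, p2, rng):
    """Copy a slice from p1 and fill the remaining slots in p2's order."""
    n = len(p1)
    lo, hi = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[lo:hi] = p1[lo:hi]
    kept = set(p1[lo:hi])
    fill = iter(g for g in p2 if g not in kept)
    return [g if g is not None else next(fill) for g in child]

def reorder_tasks(pairs, nodes, dims, pop_size=40, generations=200, seed=0):
    """Search for a task-to-node permutation with a low total hop count."""
    rng = random.Random(seed)
    n = len(nodes)
    identity = list(range(n))  # the scheduler's default ordering
    pop = [identity] + [rng.sample(identity, n) for _ in range(pop_size - 1)]
    fitness = lambda o: cost(o, pairs, nodes, dims)
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 4]  # elitism: the best orderings always survive
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < 0.5:  # swap mutation
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)

# Example: 27 tasks in a 3x3x3 stencil on 27 scattered nodes of an 8x8x8 torus.
dims = (8, 8, 8)
all_nodes = [(x, y, z) for x in range(8) for y in range(8) for z in range(8)]
nodes = random.Random(1).sample(all_nodes, 27)
pairs = stencil_pairs((3, 3, 3))
best = reorder_tasks(pairs, nodes, dims)
print(cost(list(range(27)), pairs, nodes, dims), "->", cost(best, pairs, nodes, dims))
```

Because the default ordering is seeded into the initial population and the best orderings are carried over unchanged each generation, the returned ordering is never worse than the default under this cost model. Whether a lower hop count translates into better run time depends on the application and the machine.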

Authors:
Sankaran, Ramanan [1]; Angel, Jordan [1]; Brown, W. Michael [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
2015-04-08
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1224742
Alternate Identifier(s):
OSTI ID: 1400703
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Concurrency and Computation. Practice and Experience
Additional Journal Information:
Journal Volume: 27; Journal Issue: 17; Journal ID: ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Sankaran, Ramanan, Angel, Jordan, and Brown, W. Michael. Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications. United States: N. p., 2015. Web. doi:10.1002/cpe.3457.
Sankaran, Ramanan, Angel, Jordan, & Brown, W. Michael. Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications. United States. https://doi.org/10.1002/cpe.3457
Sankaran, Ramanan, Angel, Jordan, and Brown, W. Michael. 2015. "Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications". United States. https://doi.org/10.1002/cpe.3457. https://www.osti.gov/servlets/purl/1224742.
@article{osti_1224742,
title = {Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications},
author = {Sankaran, Ramanan and Angel, Jordan and Brown, W. Michael},
doi = {10.1002/cpe.3457},
url = {https://www.osti.gov/biblio/1224742},
journal = {Concurrency and Computation. Practice and Experience},
issn = {1532-0626},
number = 17,
volume = 27,
place = {United States},
year = {2015},
month = {4}
}

