Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications
Abstract
The growth in size of networked high performance computers along with novel accelerator‐based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub‐optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter‐task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm‐based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. Application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, thereby enabling the applications to achieve better time to solution and scalability on Titan during production.
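The idea of reordering tasks over a fixed, possibly non-contiguous node allocation can be illustrated with a small sketch. It is not the authors' implementation: the cost function (total torus hop count over communicating task pairs), the order crossover and swap mutation operators, the serial loop, and all names and parameter values below are assumptions chosen only to show the general shape of a permutation-based genetic algorithm.

```python
import random

def torus_hops(a, b, dims):
    """Hop count between coordinates a and b on a torus with the given dimensions."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def comm_cost(order, nodes, edges, dims):
    """Total hop distance over all communicating task pairs.

    order[t] is the index into `nodes` of the slot that hosts task t.
    This cost model is an assumption made for illustration.
    """
    return sum(torus_hops(nodes[order[i]], nodes[order[j]], dims) for i, j in edges)

def order_crossover(p1, p2, rng):
    """Order crossover: copy a slice of p1, fill the remaining slots in p2's order."""
    n = len(p1)
    lo, hi = sorted(rng.sample(range(n + 1), 2))
    child = [None] * n
    child[lo:hi] = p1[lo:hi]
    taken = set(p1[lo:hi])
    fill = (g for g in p2 if g not in taken)
    return [g if g is not None else next(fill) for g in child]

def swap_mutation(perm, rng):
    """Exchange two randomly chosen positions in place."""
    i, j = rng.sample(range(len(perm)), 2)
    perm[i], perm[j] = perm[j], perm[i]

def optimize_order(nodes, edges, dims, pop_size=40, generations=200, seed=0):
    """Elitist genetic search over task-to-slot permutations (hypothetical parameters)."""
    rng = random.Random(seed)
    n = len(nodes)

    def cost(perm):
        return comm_cost(perm, nodes, edges, dims)

    identity = list(range(n))  # the default ordering is kept in the initial population
    population = [identity] + [rng.sample(identity, n) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            child = order_crossover(*rng.sample(parents, 2), rng)
            if rng.random() < 0.3:
                swap_mutation(child, rng)
            children.append(child)
        population = parents + children
    return min(population, key=cost)

# Toy case: 8 tasks communicating in a ring, placed on scattered nodes of a 4x4x4 torus.
dims = (4, 4, 4)
nodes = [(0, 0, 0), (3, 1, 2), (1, 0, 0), (2, 2, 2),
         (0, 1, 0), (3, 3, 1), (1, 1, 0), (2, 0, 3)]
edges = [(t, (t + 1) % 8) for t in range(8)]
best = optimize_order(nodes, edges, dims)
print("default cost:", comm_cost(list(range(8)), nodes, edges, dims))
print("reordered cost:", comm_cost(best, nodes, edges, dims))
```

Because the default ordering is seeded into the population and the best half always survives, the returned ordering in this sketch never costs more than the default one. A production-scale version would need a cost model matched to the real interconnect and a parallel search, as the abstract indicates.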
- Authors:
- Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Publication Date:
- 2015-04-08
- Research Org.:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1224742
- Alternate Identifier(s):
- OSTI ID: 1400703
- Grant/Contract Number:
- AC05-00OR22725
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Concurrency and Computation. Practice and Experience
- Additional Journal Information:
- Journal Volume: 27; Journal Issue: 17; Journal ID: ISSN 1532-0626
- Publisher:
- Wiley
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Sankaran, Ramanan, Angel, Jordan, and Brown, W. Michael. Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications. United States: N. p., 2015. Web. doi:10.1002/cpe.3457.
Sankaran, Ramanan, Angel, Jordan, & Brown, W. Michael. Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications. United States. https://doi.org/10.1002/cpe.3457
Sankaran, Ramanan, Angel, Jordan, and Brown, W. Michael. 2015. "Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications". United States. https://doi.org/10.1002/cpe.3457. https://www.osti.gov/servlets/purl/1224742.
@article{osti_1224742,
title = {Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications},
author = {Sankaran, Ramanan and Angel, Jordan and Brown, W. Michael},
abstractNote = {Summary The growth in size of networked high performance computers along with novel accelerator‐based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub‐optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter‐task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm‐based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. Application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, thereby enabling the applications to achieve better time to solution and scalability on Titan during production. Copyright © 2015 John Wiley & Sons, Ltd.},
doi = {10.1002/cpe.3457},
journal = {Concurrency and Computation. Practice and Experience},
number = 17,
volume = 27,
place = {United States},
year = {2015},
month = {4}
}