Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael

doi:10.1002/cpe.3457

Journal Article · April 8, 2015 · Concurrency and Computation. Practice and Experience

DOI: https://doi.org/10.1002/cpe.3457 · OSTI ID: 1224742

Sankaran, Ramanan ^[1]; Angel, Jordan ^[1]; Brown, W. Michael ^[1]

[1] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

Summary: The growth in size of networked high performance computers along with novel accelerator‐based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub‐optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter‐task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm‐based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. Application benchmarks after task reordering through the genetic algorithm show a significant improvement in performance and reduction in variability, thereby enabling the applications to achieve better time to solution and scalability on Titan during production. Copyright © 2015 John Wiley & Sons, Ltd.
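To make the idea in the summary concrete, the following is a minimal, serial sketch of genetic-algorithm task reordering. It is not the authors' implementation, and the record does not describe their operators or cost function. Everything here is an illustrative assumption: the hop-count cost on a small wraparound (torus) grid, one task per node, the order crossover and swap mutation, the population settings, and all function names (`torus_distance`, `placement_cost`, `optimize_order`).

```python
import random
from itertools import product

def torus_distance(a, b, dims):
    """Hop count between two node coordinates on a wraparound (torus) grid."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def placement_cost(order, nodes, edges, dims):
    """Total hop distance over all communicating task pairs.
    order[t] is the index into `nodes` of the node that hosts task t."""
    return sum(torus_distance(nodes[order[u]], nodes[order[v]], dims)
               for u, v in edges)

def order_crossover(p1, p2, rng):
    """Order crossover: keep a slice of p1, fill the remaining slots in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j] = p1[i:j]
    taken = set(p1[i:j])
    fill = (g for g in p2 if g not in taken)
    return [g if g is not None else next(fill) for g in child]

def swap_mutation(perm, rng):
    """Exchange two randomly chosen positions in place."""
    i, j = rng.sample(range(len(perm)), 2)
    perm[i], perm[j] = perm[j], perm[i]

def optimize_order(nodes, edges, dims, pop_size=40, generations=200, seed=0):
    """Evolve task-to-node permutations; the default order is always a candidate."""
    rng = random.Random(seed)
    n = len(nodes)
    identity = list(range(n))
    population = [identity] + [rng.sample(identity, n) for _ in range(pop_size - 1)]

    def cost(p):
        return placement_cost(p, nodes, edges, dims)

    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: pop_size // 2]      # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < 0.3:               # assumed mutation rate
                swap_mutation(child, rng)
            children.append(child)
        population = elite + children
    return min(population, key=cost)

# Hypothetical scenario: 8 non-contiguous nodes on a 4x4x4 torus, and
# 8 tasks arranged as a 2x2x2 grid with nearest-neighbour communication.
dims = (4, 4, 4)
nodes = random.Random(1).sample(list(product(*(range(d) for d in dims))), 8)
task_coords = list(product(range(2), repeat=3))
edges = [(u, v)
         for u, a in enumerate(task_coords)
         for v, b in enumerate(task_coords)
         if u < v and sum(abs(x - y) for x, y in zip(a, b)) == 1]

default = list(range(8))
best = optimize_order(nodes, edges, dims)
print("default order cost:", placement_cost(default, nodes, edges, dims))
print("reordered cost:    ", placement_cost(best, nodes, edges, dims))
```

Because the better half of each generation is carried over unchanged and the default order is part of the initial population, the returned ordering never costs more than the default one under this cost model. A production version would need a cost model that reflects the real interconnect and several tasks per node, and the summary indicates that the authors ran the optimization in parallel.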

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)

Sponsoring Organization: USDOE Office of Science (SC)

Grant/Contract Number: AC05-00OR22725

OSTI ID: 1224742

Alternate ID(s): OSTI ID: 1400703

Journal Information: Concurrency and Computation. Practice and Experience, Vol. 27, Issue 17; ISSN 1532-0626

Publisher: Wiley

Country of Publication: United States

Language: English

Citation Metrics:

Cited by: 1 work

Citation information provided by Web of Science

Cited By (1)

Sreepathi, Sarat; D'Azevedo, Ed; Philip, Bobby. Communication Characterization and Optimization of Applications Using Topology-Aware Task Mapping on Large Supercomputers. ICPE'16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering. https://doi.org/10.1145/2851553.2851575 (conference, March 2016)

Related Subjects

97 MATHEMATICS AND COMPUTING
