Accelerating Parallel Applications in Cloud Platforms via Adaptive Time-Slice Control

Fan, Hao; Wu, Song; Zhao, Xinyu; Xie, Zhenjiang; Di, Sheng; Xiao, Jiang; Yu, Chen; Jin, Hai

doi:10.1109/tc.2020.2999619

Journal Article · June 3, 2020 · IEEE Transactions on Computers

DOI: https://doi.org/10.1109/tc.2020.2999619 · OSTI ID: 1863258

Fan, Hao ^[1]; Wu, Song ^[1]; Zhao, Xinyu ^[2]; Xie, Zhenjiang ^[3]; Di, Sheng ^[4]; Xiao, Jiang; Yu, Chen; Jin, Hai

[1] Huazhong Univ. of Science and Technology, Wuhan (China)
[2] Tencent Group, Guangdong (China)
[3] Alibaba Group, Zhejiang (China)
[4] Argonne National Lab. (ANL), Lemont, IL (United States)

Cloud platforms can provide flexible and cost-effective environments for parallel applications. However, resource over-commitment, in which cloud providers expose many more runnable virtual CPUs than there are physical CPUs, still impedes the synchronization operations of parallel applications and causes severe performance degradation. Existing methods optimize parallel applications by raising the scheduling priorities of the virtual machines (VMs) involved. They cannot fully exploit the performance of parallel applications because they ignore the differing time-slice requirements of the applications' phases. Furthermore, non-parallel applications suffer unsatisfactory performance because of their low scheduling priorities. Through an empirical analysis of VM time-slices, we find that shortening time-slices mitigates the synchronization overhead incurred during communication phases, while overly short time-slices cause frequent cache misses in computation phases. Accordingly, we propose an Adaptive Time-slice Control (ATC) mechanism. ATC first detects the phases of parallel applications based on lock latency or cache misses. Then, ATC shortens time-slices during communication phases and prolongs time-slices during computation phases for parallel applications, and sets a uniform time-slice for non-parallel applications. Finally, we evaluate ATC using seven well-known benchmarks with 25+ applications. Experiments show that ATC achieves a 1.5-75x performance gain for parallel applications over state-of-the-art solutions, with almost no impact on non-parallel applications.
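The control loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the thresholds, the slice lengths, the `VmSample` record, and the rule that lock latency takes precedence over cache misses when both are high are all assumptions made here for illustration, since the abstract does not state them.

```python
from dataclasses import dataclass

# All numeric values are hypothetical placeholders, not figures from the paper.
SHORT_SLICE_MS = 1.0               # assumed slice for communication phases
LONG_SLICE_MS = 30.0               # assumed slice for computation phases
UNIFORM_SLICE_MS = 10.0            # assumed uniform slice for non-parallel VMs
LOCK_LATENCY_THRESHOLD_US = 100.0  # assumed "high lock latency" cutoff
CACHE_MISS_RATE_THRESHOLD = 0.05   # assumed "frequent cache misses" cutoff


@dataclass
class VmSample:
    """One monitoring sample for a VM (hypothetical record layout)."""
    is_parallel: bool
    lock_latency_us: float
    cache_miss_rate: float


def choose_time_slice(sample: VmSample, current_slice_ms: float) -> float:
    """Return the next time-slice length for the VM, in milliseconds."""
    if not sample.is_parallel:
        # Non-parallel applications get one uniform slice.
        return UNIFORM_SLICE_MS
    if sample.lock_latency_us > LOCK_LATENCY_THRESHOLD_US:
        # High lock latency suggests a communication phase: shorten the slice.
        # Checking this first is an assumption about precedence.
        return SHORT_SLICE_MS
    if sample.cache_miss_rate > CACHE_MISS_RATE_THRESHOLD:
        # Frequent cache misses suggest a computation phase: prolong the slice.
        return LONG_SLICE_MS
    # Neither signal fired: keep the current slice unchanged.
    return current_slice_ms
```

A hypervisor-side monitor would call `choose_time_slice` once per sampling interval for each VM and apply the result through whatever scheduler interface the platform offers; that integration is outside the scope of this sketch.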

Research Organization: Argonne National Lab. (ANL), Argonne, IL (United States)

Sponsoring Organization: USDOE Office of Science (SC); National Key Research and Development Program of China; National Science Foundation of China

Grant/Contract Number: AC02-06CH11357; 2018YFB1004805; 61872155; 61732010; 2019aea171

OSTI ID: 1863258

Journal Information: IEEE Transactions on Computers, Vol. 70, Issue 7; ISSN 0018-9340

Publisher: IEEE

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
LHP problem
Parallel application
cache miss rate
cloud platform
lock latency
synchronization overhead
