OSTI.GOV · U.S. Department of Energy, Office of Scientific and Technical Information

Title: Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication

Journal Article · IEEE Transactions on Parallel and Distributed Systems
 Moon, Gordon E. [1]; Kwon, Hyoukjun [2]; Jeong, Geonhwa [2]; Chatarasi, Prasanth [2]; Rajamanickam, Sivasankaran [3]; Krishna, Tushar [2]
  1. Korea Aerospace Univ., Gyeonggi (Korea, Republic of)
  2. Georgia Inst. of Technology, Atlanta, GA (United States)
  3. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Abstract: There is a growing interest in custom spatial accelerators for machine learning applications. These accelerators employ a spatial array of processing elements (PEs) interacting via custom buffer hierarchies and networks-on-chip. The efficiency of these accelerators comes from employing optimized dataflow strategies (i.e., spatial/temporal partitioning of data across the PEs and fine-grained scheduling) to maximize data reuse. The focus of this work is to evaluate these accelerator architectures using a tiled general matrix-matrix multiplication (GEMM) kernel. To do so, we develop a framework that finds optimized mappings (dataflow and tile sizes) for a tiled GEMM for a given spatial accelerator and workload combination, leveraging an analytical cost model for runtime and energy. Finally, our evaluations over five spatial accelerators demonstrate that the tiled GEMM mappings systematically generated by our framework achieve high performance on various GEMM workloads and accelerators.
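The mapping search the abstract describes can be sketched in a few lines. The Python below is a minimal illustration only, not the paper's framework: `tiled_gemm`, `toy_cost`, `best_mapping`, and the `buffer_words` capacity are hypothetical names, and the cost model (off-chip words moved under a single on-chip buffer) is a toy stand-in for the paper's analytical runtime/energy model.

```python
# Illustrative sketch only: a tiled GEMM loop nest plus a brute-force search
# over tile sizes under a toy analytical cost model. All names and the cost
# model are hypothetical stand-ins, not the paper's actual framework.
import itertools
import numpy as np

def tiled_gemm(A, B, Tm=32, Tn=32, Tk=32):
    """C = A @ B computed tile by tile ((Tm x Tk) times (Tk x Tn) blocks)."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, Tm):          # temporal partitioning over rows of A
        for j in range(0, N, Tn):      # ... and over columns of B
            for k in range(0, K, Tk):  # reduction dimension
                C[i:i+Tm, j:j+Tn] += A[i:i+Tm, k:k+Tk] @ B[k:k+Tk, j:j+Tn]
    return C

def toy_cost(M, N, K, Tm, Tn, Tk, buffer_words=16384):
    """Toy cost model: count words moved from off-chip memory, requiring the
    (Tm,Tk), (Tk,Tn), and (Tm,Tn) tiles to fit in one on-chip buffer."""
    if Tm * Tk + Tk * Tn + Tm * Tn > buffer_words:
        return float("inf")            # mapping does not fit on chip
    tiles = (M // Tm) * (N // Tn) * (K // Tk)
    # A and B tiles are fetched per tile step; each C tile is kept resident
    # across the k loop (output-stationary) and written back once.
    return tiles * (Tm * Tk + Tk * Tn) + (M // Tm) * (N // Tn) * Tm * Tn

def best_mapping(M, N, K, candidates=(8, 16, 32, 64, 128)):
    """Exhaustively search tile sizes; a real mapper would also search
    dataflows (spatial/temporal loop orders), not just tile shapes."""
    return min(
        (t for t in itertools.product(candidates, repeat=3)
         if M % t[0] == 0 and N % t[1] == 0 and K % t[2] == 0),
        key=lambda t: toy_cost(M, N, K, *t),
    )

if __name__ == "__main__":
    M, N, K = 256, 256, 256
    Tm, Tn, Tk = best_mapping(M, N, K)
    A = np.random.rand(M, K)
    B = np.random.rand(K, N)
    assert np.allclose(tiled_gemm(A, B, Tm, Tn, Tk), A @ B)
    print("chosen tile sizes:", (Tm, Tn, Tk))
```

A real mapper in the spirit of the abstract would additionally enumerate dataflows (which loops run spatially across PEs versus temporally), model the buffer hierarchy and network-on-chip, and score candidates on both runtime and energy rather than a single traffic count.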

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
NA0003525
OSTI ID:
1820407
Report Number(s):
SAND-2021-9925J; 698422
Journal Information:
IEEE Transactions on Parallel and Distributed Systems, Vol. 33, Issue 4; ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English

References (27)

A coordinated tiling and batching framework for efficient GEMM on GPUs · conference, February 2019
  • Li, Xiuhong; Liang, Yun; Yan, Shengen
  • PPoPP '19: Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. https://doi.org/10.1145/3293883.3295734
Self-Adapting Linear Algebra Algorithms and Software · journal, February 2005
dMazeRunner: Executing Perfectly Nested Loops on Dataflow Accelerators · journal, October 2019
Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators · conference, March 2020
  • Yang, Xuan; Gao, Mingyu; Liu, Qiaoyi
  • ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. https://doi.org/10.1145/3373376.3378514
A Hardware–Software Blueprint for Flexible Deep Learning Specialization · journal, September 2019
Domain-specific library generation for parallel software and hardware platforms · conference, April 2008
  • Franchetti, Franz; Voronenko, Yevgen; Milder, Peter A.
  • IPDPS 2008: IEEE International Symposium on Parallel and Distributed Processing. https://doi.org/10.1109/IPDPS.2008.4536398
Automated empirical optimizations of software and the ATLAS project · journal, January 2001
A high-performance, low-power linear algebra core · conference, September 2011
  • Pedram, Ardavan; Gerstlauer, Andreas; Geijn, Robert A. van de
  • ASAP 2011: 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors. https://doi.org/10.1109/ASAP.2011.6043234
Accelerating Scientific Applications With SambaNova Reconfigurable Dataflow Architecture · journal, March 2021
GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark · journal, September 1998
High-performance implementation of the level-3 BLAS · journal, July 2008
SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training · conference, February 2020
A survey of direct methods for sparse linear systems · journal, May 2016
Rethinking NoCs for Spatial Neural Network Accelerators · conference, October 2017
  • Kwon, Hyoukjun; Samajdar, Ananda; Krishna, Tushar
  • NOCS '17: Proceedings of the Eleventh IEEE/ACM International Symposium on Networks-on-Chip. https://doi.org/10.1145/3130218.3130230
Understanding the Impact of On-chip Communication on DNN Accelerator Performance · conference, November 2019
A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference · conference, June 2018
Timeloop: A Systematic Approach to DNN Accelerator Evaluation · conference, March 2019
ShiDianNao: shifting vision processing closer to the sensor · conference, June 2015
  • Du, Zidong; Fasthuber, Robert; Chen, Tianshi
  • ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture. https://doi.org/10.1145/2749469.2750389
Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks · journal, October 2016
mRNA: Enabling Efficient Mapping Space Exploration for a Reconfigurable Neural Accelerator · conference, March 2019
MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects · journal, November 2018
In-Datacenter Performance Analysis of a Tensor Processing Unit · conference, January 2017
Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach · conference, October 2019
  • Kwon, Hyoukjun; Chatarasi, Prasanth; Pellauer, Michael
  • MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. https://doi.org/10.1145/3352460.3358252
Deep Residual Learning for Image Recognition · conference, June 2016
Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures · journal, December 2012
Anatomy of high-performance matrix multiplication · journal, May 2008
Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks · journal, January 2017