A coordinated tiling and batching framework for efficient GEMM on GPUs
- Li, Xiuhong; Liang, Yun; Yan, Shengen
PPoPP '19: 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming
https://doi.org/10.1145/3293883.3295734
|
conference
|
February 2019 |
Self-Adapting Linear Algebra Algorithms and Software
|
journal
|
February 2005 |
dMazeRunner: Executing Perfectly Nested Loops on Dataflow Accelerators
|
journal
|
October 2019 |
Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators
- Yang, Xuan; Gao, Mingyu; Liu, Qiaoyi
ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
https://doi.org/10.1145/3373376.3378514
|
conference
|
March 2020 |
A Hardware–Software Blueprint for Flexible Deep Learning Specialization
|
journal
|
September 2019 |
Domain-specific library generation for parallel software and hardware platforms
- Franchetti, Franz; Voronenko, Yevgen; Milder, Peter A.
2008 IEEE International Parallel & Distributed Processing Symposium, 2008 IEEE International Symposium on Parallel and Distributed Processing
https://doi.org/10.1109/IPDPS.2008.4536398
|
conference
|
April 2008 |
Automated empirical optimizations of software and the ATLAS project
|
journal
|
January 2001 |
A high-performance, low-power linear algebra core
- Pedram, Ardavan; Gerstlauer, Andreas; Geijn, Robert A. van de
2011 IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), ASAP 2011 - 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors
https://doi.org/10.1109/ASAP.2011.6043234
|
conference
|
September 2011 |
Accelerating Scientific Applications With SambaNova Reconfigurable Dataflow Architecture
|
journal
|
March 2021 |
GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark
|
journal
|
September 1998 |
High-performance implementation of the level-3 BLAS
|
journal
|
July 2008 |
SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training
|
conference
|
February 2020 |
A survey of direct methods for sparse linear systems
|
journal
|
May 2016 |
Rethinking NoCs for Spatial Neural Network Accelerators
- Kwon, Hyoukjun; Samajdar, Ananda; Krishna, Tushar
NOCS '17: International Symposium on Networks-on-Chip, Proceedings of the Eleventh IEEE/ACM International Symposium on Networks-on-Chip
https://doi.org/10.1145/3130218.3130230
|
conference
|
October 2017 |
Understanding the Impact of On-chip Communication on DNN Accelerator Performance
|
conference
|
November 2019 |
A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference
|
conference
|
June 2018 |
Timeloop: A Systematic Approach to DNN Accelerator Evaluation
|
conference
|
March 2019 |
ShiDianNao: shifting vision processing closer to the sensor
- Du, Zidong; Fasthuber, Robert; Chen, Tianshi
ISCA '15: The 42nd Annual International Symposium on Computer Architecture, Proceedings of the 42nd Annual International Symposium on Computer Architecture
https://doi.org/10.1145/2749469.2750389
|
conference
|
June 2015 |
Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks
|
journal
|
October 2016 |
mRNA: Enabling Efficient Mapping Space Exploration for a Reconfigurable Neural Accelerator
|
conference
|
March 2019 |
MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects
|
journal
|
November 2018 |
In-Datacenter Performance Analysis of a Tensor Processing Unit
|
conference
|
January 2017 |
Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach
- Kwon, Hyoukjun; Chatarasi, Prasanth; Pellauer, Michael
MICRO '52: The 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
https://doi.org/10.1145/3352460.3358252
|
conference
|
October 2019 |
Deep Residual Learning for Image Recognition
|
conference
|
June 2016 |
Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures
|
journal
|
December 2012 |
Anatomy of high-performance matrix multiplication
|
journal
|
May 2008 |
Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
|
journal
|
January 2017 |