Roofline: an insightful visual performance model for multicore architectures
Abstract
We propose an easy-to-understand, visual performance model that offers insights to programmers and architects on improving parallel software and hardware for floating point computations.
- Authors:
- Williams, Samuel; Waterman, Andrew; Patterson, David
- Univ. of California, Berkeley, CA (United States). Parallel Computing Lab.
- Publication Date:
- 2009-04-04
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- OSTI Identifier:
- 1407073
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Communications of the ACM
- Additional Journal Information:
- Journal Volume: 52; Journal Issue: 4; Journal ID: ISSN 0001-0782
- Publisher:
- Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Williams, Samuel, Waterman, Andrew, and Patterson, David. Roofline: an insightful visual performance model for multicore architectures. United States: N. p., 2009. Web. doi:10.1145/1498765.1498785.
Williams, Samuel, Waterman, Andrew, & Patterson, David. Roofline: an insightful visual performance model for multicore architectures. United States. https://doi.org/10.1145/1498765.1498785
Williams, Samuel, Waterman, Andrew, and Patterson, David. 2009. "Roofline: an insightful visual performance model for multicore architectures". United States. https://doi.org/10.1145/1498765.1498785. https://www.osti.gov/servlets/purl/1407073.
@article{osti_1407073,
title = {Roofline: an insightful visual performance model for multicore architectures},
author = {Williams, Samuel and Waterman, Andrew and Patterson, David},
abstractNote = {We propose an easy-to-understand, visual performance model that offers insights to programmers and architects on improving parallel software and hardware for floating point computations.},
doi = {10.1145/1498765.1498785},
journal = {Communications of the ACM},
number = 4,
volume = 52,
place = {United States},
year = {2009},
month = apr
}