Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures
Abstract
We propose an easy-to-understand, visual performance model that offers insights to programmers and architects on improving parallel software and hardware for floating-point computations.
- Authors:
- Williams, Samuel; Waterman, Andrew; Patterson, David
- Publication Date:
- 2009-02-01
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- Computational Research Division
- OSTI Identifier:
- 963540
- Report Number(s):
- LBNL-2141E
TRN: US200918%%382
- DOE Contract Number:
- DE-AC02-05CH11231
- Resource Type:
- Journal Article
- Journal Name:
- Communications of the Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; ARCHITECTS; PERFORMANCE; PARALLEL PROCESSING
Citation Formats
Williams, Samuel, Waterman, Andrew, and Patterson, David. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures. United States: N. p., 2009.
Web. doi:10.1145/1498765.1498785.
Williams, Samuel, Waterman, Andrew, & Patterson, David. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures. United States. https://doi.org/10.1145/1498765.1498785
Williams, Samuel, Waterman, Andrew, and Patterson, David. 2009.
"Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures". United States. https://doi.org/10.1145/1498765.1498785. https://www.osti.gov/servlets/purl/963540.
@article{osti_963540,
title = {Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures},
author = {Williams, Samuel and Waterman, Andrew and Patterson, David},
abstractNote = {We propose an easy-to-understand, visual performance model that offers insights to programmers and architects on improving parallel software and hardware for floating point computations.},
doi = {10.1145/1498765.1498785},
url = {https://www.osti.gov/biblio/963540},
journal = {Communications of the Association for Computing Machinery},
place = {United States},
year = {2009},
month = feb
}
Works referenced in this record:
Validity of the single processor approach to achieving large scale computing capabilities
conference, January 1967
- Amdahl, Gene M.
- Proceedings of the April 18-20, 1967, spring joint computer conference - AFIPS '67 (Spring)
A Hierarchical Approach to Modeling and Improving the Performance of Scientific Applications on the KSR1
conference, January 1994
- Boyd, E. L.; Azeem, W.; Lee, Hsien-Hsin
- 1994 International Conference on Parallel Processing Vol. 3
Estimating interlock and improving balance for pipelined architectures
journal, August 1988
- Callahan, David; Cocke, John; Kennedy, Ken
- Journal of Parallel and Distributed Computing, Vol. 5, Issue 4
Improving the ratio of memory operations to floating-point operations in loops
journal, November 1994
- Carr, Steve; Kennedy, Ken
- ACM Transactions on Programming Languages and Systems, Vol. 16, Issue 6
Self-Adapting Linear Algebra Algorithms and Software
journal, February 2005
- Demmel, J.; Dongarra, J.; Eijkhout, V.
- Proceedings of the IEEE, Vol. 93, Issue 2
Performance of Synchronized Iterative Processes in Multiprocessor Systems
journal, July 1982
- Dubois, M.; Briggs, F. A.
- IEEE Transactions on Software Engineering, Vol. SE-8, Issue 4
The Design and Implementation of FFTW3
journal, February 2005
- Frigo, M.; Johnson, S. G.
- Proceedings of the IEEE, Vol. 93, Issue 2
Mapping computational concepts to GPUs
conference, January 2005
- Harris, Mark
- ACM SIGGRAPH 2005 Courses - SIGGRAPH '05
Evaluating associativity in CPU caches
journal, January 1989
- Hill, M. D.; Smith, A. J.
- IEEE Transactions on Computers, Vol. 38, Issue 12
A Proof for the Queuing Formula: L = λ W
journal, June 1961
- Little, John D. C.
- Operations Research, Vol. 9, Issue 3
Latency lags bandwith
journal, October 2004
- Patterson, David A.
- Communications of the ACM, Vol. 47, Issue 10
Analytic Queueing Network Models for Parallel Processing of Task Systems
journal, December 1986
- Thomasian, A.
- IEEE Transactions on Computers, Vol. C-35, Issue 12, p. 1045-1054
A genetic algorithms approach to modeling the performance of memory-bound computations
conference, January 2007
- Tikir, Mustafa M.; Carrington, Laura; Strohmaier, Erich
- Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07
Lattice Boltzmann simulation optimization on leading multicore platforms
conference, April 2008
- Williams, Samuel; Carter, Jonathan; Oliker, Leonid
- 2008 IEEE International Symposium on Parallel and Distributed Processing (IPDPS)
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
conference, January 2007
- Williams, Samuel; Oliker, Leonid; Vuduc, Richard
- Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07
The SPLASH-2 programs: characterization and methodological considerations
conference, January 1995
- Woo, Steven Cameron; Ohara, Moriyoshi; Torrie, Evan
- Proceedings of the 22nd annual international symposium on Computer architecture - ISCA '95
Works referencing / citing this record:
Evaluating automatically parallelized versions of the support vector machine
journal, October 2014
- Codreanu, Valeriu; Dröge, Bob; Williams, David
- Concurrency and Computation: Practice and Experience, Vol. 28, Issue 7
Towards generating efficient flow solvers with the ExaStencils approach
journal, May 2017
- Kuckuk, Sebastian; Haase, Gundolf; Vasco, Diego A.
- Concurrency and Computation: Practice and Experience, Vol. 29, Issue 17
Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications
journal, March 2017
- Calore, Enrico; Gabbana, Alessandro; Schifano, Sebastiano Fabio
- Concurrency and Computation: Practice and Experience, Vol. 29, Issue 12
An efficient low-rank Kalman filter for modern SIMD architectures
journal, April 2018
- Cámpora Pérez, Daniel Hugo; Awile, Omar
- Concurrency and Computation: Practice and Experience, Vol. 30, Issue 23
AXC: A new format to perform the SpMV oriented to Intel Xeon Phi architecture in OpenCL
journal, July 2018
- Coronado-Barrientos, E.; Indalecio, G.; García-Loureiro, A.
- Concurrency and Computation: Practice and Experience, Vol. 31, Issue 1
Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs
journal, August 2018
- Carrijo Nasciutti, Thiago; Panetta, Jairo; Pais Lopes, Pedro
- Concurrency and Computation: Practice and Experience, Vol. 31, Issue 18
Bulk execution of the dynamic programming for the optimal polygon triangulation problem on the GPU
journal, September 2018
- Yamashita, Kohei; Ito, Yasuaki; Nakano, Koji
- Concurrency and Computation: Practice and Experience, Vol. 31, Issue 19
Design of self‐adaptable data parallel applications on multicore clusters automatically optimized for performance and energy through load distribution
journal, August 2018
- Reddy Manumachu, Ravi; Lastovetsky, Alexey L.
- Concurrency and Computation: Practice and Experience, Vol. 31, Issue 4
Roofline analysis with Cray performance analysis tools (CrayPat) and roofline‐based performance projections for a future architecture
journal, September 2018
- Kwack, JaeHyuk; Arnold, Galen; Mendes, Celso
- Concurrency and Computation: Practice and Experience
High‐performance SIMD implementation of the lattice‐Boltzmann method on the Xeon Phi processor
journal, November 2018
- Robertsén, Fredrik; Mattila, Keijo; Westerholm, Jan
- Concurrency and Computation: Practice and Experience, Vol. 31, Issue 13
Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system
journal, November 2019
- Yang, Charlene; Kurth, Thorsten; Williams, Samuel
- Concurrency and Computation: Practice and Experience, Vol. 32, Issue 20
Use of model-based architecture attributes to construct a component-level trade space
journal, February 2019
- McKean, David; Moreland, James D.; Doskey, Steven
- Systems Engineering, Vol. 22, Issue 2
LRnLA Algorithm ConeFold with Non-local Vectorization for LBM Implementation
book, December 2018
- Perepelkina, Anastasia; Levchenko, Vadim
- Communications in Computer and Information Science
Modeling and Optimizing Data Transfer in GPU-Accelerated Optical Coherence Tomography
book, December 2018
- Schrödter, Tobias; Pallasch, David; Wienke, Sandra
- Lecture Notes in Computer Science
DSL-Based Acceleration of Automotive Environment Perception and Mapping Algorithms for Embedded CPUs, GPUs, and FPGAs
book, January 2019
- Fickenscher, Jörg; Hannig, Frank; Teich, Jürgen
- Architecture of Computing Systems – ARCS 2019
GPU Implementation of ConeTorre Algorithm for Fluid Dynamics Simulation
book, July 2019
- Levchenko, Vadim; Zakirov, Andrey; Perepelkina, Anastasia
- Parallel Computing Technologies: 15th International Conference, PaCT 2019, Almaty, Kazakhstan, August 19–23, 2019, Proceedings, p. 199-213
LRnLA Lattice Boltzmann Method: A Performance Comparison of Implementations on GPU and CPU
book, August 2019
- Levchenko, Vadim; Zakirov, Andrey; Perepelkina, Anastasia
- Parallel Computational Technologies: 13th International Conference, PCT 2019, Kaliningrad, Russia, April 2–4, 2019, Revised Selected Papers, p. 139-151
Optimizing Wilson-Dirac Operator and Linear Solvers for Intel® KNL
book, October 2016
- Joó, Bálint; Kalamkar, Dhiraj D.; Kurth, Thorsten
- Lecture Notes in Computer Science
Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels
book, May 2017
- Hammer, Julian; Eitzinger, Jan; Hager, Georg
- Tools for High Performance Computing 2016
A High-Throughput Kalman Filter for Modern SIMD Architectures
book, January 2018
- Cámpora Pérez, Daniel Hugo; Awile, Omar; Potterat, Cédric
- Euro-Par 2017: Parallel Processing Workshops
Approximate FPGA-Based LSTMs Under Computation Time Constraints
book, January 2018
- Rizakis, Michalis; Venieris, Stylianos I.; Kouris, Alexandros
- Applied Reconfigurable Computing. Architectures, Tools, and Applications
On the Accuracy and Usefulness of Analytic Energy Models for Contemporary Multicore Processors
book, January 2018
- Hofmann, Johannes; Hager, Georg; Fey, Dietmar
- Lecture Notes in Computer Science
Software Design Space Exploration for Exascale Combustion Co-design
book, January 2013
- Chan, Cy; Unat, Didem; Lijewski, Michael
- Lecture Notes in Computer Science
How Many Threads will be too Many? On the Scalability of OpenMP Implementations
book, January 2015
- Iwainsky, Christian; Shudler, Sergei; Calotoiu, Alexandru
- Lecture Notes in Computer Science
Measuring energy consumption using EML (energy measurement library)
journal, July 2014
- Cabrera, Alberto; Almeida, Francisco; Arteaga, Javier
- Computer Science - Research and Development, Vol. 30, Issue 2
Energy aware scheduling model and online heuristics for stencil codes on heterogeneous computing architectures
journal, November 2016
- Ciznicki, Milosz; Kurowski, Krzysztof; Weglarz, Jan
- Cluster Computing, Vol. 20, Issue 3
GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems
journal, October 2016
- Kreutzer, Moritz; Thies, Jonas; Röhrig-Zöllner, Melven
- International Journal of Parallel Programming, Vol. 45, Issue 5
Type-Driven Automated Program Transformations and Cost Modelling for Optimising Streaming Programs on FPGAs
journal, April 2018
- Vanderbauwhede, Wim; Nabi, Syed Waqar; Urlea, Cristian
- International Journal of Parallel Programming, Vol. 47, Issue 1
3DyRM: a dynamic roofline model including memory latency information
journal, March 2014
- Lorenzo, O. G.; Pena, T. F.; Cabaleiro, J. C.
- The Journal of Supercomputing, Vol. 70, Issue 2
Optimization of parallel iterated local search algorithms on graphics processing unit
journal, May 2016
- Zhou, Yi; He, Fazhi; Qiu, Yimin
- The Journal of Supercomputing, Vol. 72, Issue 6
The DiamondCandy LRnLA algorithm: raising efficiency of the 3D cross-stencil schemes
journal, June 2018
- Perepelkina, Anastasia; Levchenko, Vadim; Khilkov, Sergey
- The Journal of Supercomputing, Vol. 75, Issue 12
Efficient scheduling of streams on GPGPUs
journal, February 2020
- Beheshti Roui, Mohamad; Shekofteh, S. Kazem; Noori, Hamid
- The Journal of Supercomputing, Vol. 76, Issue 11
Development of a Parallel Explicit Finite-Volume Euler Equation Solver using the Immersed Boundary Method with Hybrid MPI-CUDA Paradigm
journal, October 2019
- Kuo, F. A.; Chiang, C. H.; Lo, M. C.
- Journal of Mechanics, Vol. 36, Issue 1
High performance FDTD algorithm for GPGPU supercomputers
journal, October 2016
- Zakirov, Andrey; Levchenko, Vadim; Perepelkina, Anastasia
- Journal of Physics: Conference Series, Vol. 759
Ultrafast analysis of individual grain behavior during grain growth by parallel computing
journal, August 2015
- Kühbach, M.; Barrales-Mora, L. A.; Mießen, C.
- IOP Conference Series: Materials Science and Engineering, Vol. 89
A real-time, all-sky, high time resolution, direct imager for the long wavelength array
journal, May 2019
- Kent, James; Dowell, Jayce; Beardsley, Adam
- Monthly Notices of the Royal Astronomical Society, Vol. 486, Issue 4
Direct wide-field radio imaging in real-time at high time resolution using antenna electric fields
journal, October 2019
- Kent, James; Beardsley, Adam P.; Bester, Landman
- Monthly Notices of the Royal Astronomical Society, Vol. 491, Issue 1
Locally Recursive Non-Locally Asynchronous Algorithms for Stencil Computation
journal, May 2018
- Levchenko, V. D.; Perepelkina, A. Y.
- Lobachevskii Journal of Mathematics, Vol. 39, Issue 4
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
conference, January 2015
- Zhang, Chen; Li, Peng; Sun, Guangyu
- Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '15
Optimizing Sparse Matrix—Matrix Multiplication for the GPU
journal, October 2015
- Dalton, Steven; Olson, Luke; Bell, Nathan
- ACM Transactions on Mathematical Software, Vol. 41, Issue 4
Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications
conference, January 2015
- Wahib, Mohamed; Maruyama, Naoya
- Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '15
Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model
conference, January 2015
- Stengel, Holger; Treibig, Jan; Hager, Georg
- Proceedings of the 29th ACM on International Conference on Supercomputing - ICS '15
Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results
conference, January 2015
- Hoefler, Torsten; Belli, Roberto
- Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis - SC '15
Harnessing energy efficiency of heterogeneous-ISA platforms
conference, January 2015
- Bhat, Sharath K.; Saya, Ajithchandra; Rawat, Hemedra K.
- Proceedings of the Workshop on Power-Aware Computing and Systems - HotPower '15
Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance
conference, January 2015
- Ardalani, Newsha; Lestourgeon, Clint; Sankaralingam, Karthikeyan
- Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48
Variation Among Processors Under Turbo Boost in HPC Systems
conference, January 2016
- Acun, Bilge; Miller, Phil; Kale, Laxmikant V.
- Proceedings of the 2016 International Conference on Supercomputing - ICS '16
Parallel Memory-Efficient Adaptive Mesh Refinement on Structured Triangular Meshes with Billions of Grid Cells
journal, January 2017
- Meister, Oliver; Rahnema, Kaveh; Bader, Michael
- ACM Transactions on Mathematical Software, Vol. 43, Issue 3
Caffeine: towards uniformed representation and acceleration for deep convolutional neural networks
conference, November 2016
- Zhang, Chen; Fang, Zhenman; Zhou, Peipei
- Proceedings of the 35th International Conference on Computer-Aided Design - ICCAD '16
Resource Conscious Reuse-Driven Tiling for GPUs
conference, January 2016
- Rawat, Prashant Singh; Hong, Changwan; Ravishankar, Mahesh
- Proceedings of the 2016 International Conference on Parallel Architectures and Compilation - PACT '16
Data-Centric Computing Frontiers: A Survey On Processing-In-Memory
conference, October 2016
- Siegl, Patrick; Buchty, Rainer; Berekovic, Mladen
- Proceedings of the Second International Symposium on Memory Systems - MEMSYS '16
Sparse Matrix-Vector Multiplication on GPGPUs
journal, January 2017
- Filippone, Salvatore; Cardellini, Valeria; Barbieri, Davide
- ACM Transactions on Mathematical Software, Vol. 43, Issue 4
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
conference, January 2017
- Umuroglu, Yaman; Fraser, Nicholas J.; Gambardella, Giulio
- Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '17
Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs
conference, June 2017
- Xiao, Qingcheng; Liang, Yun; Lu, Liqiang
- Proceedings of the 54th Annual Design Automation Conference 2017 - DAC '17
A Survey of Power and Energy Predictive Models in HPC Systems and Applications
journal, October 2017
- O’Brien, Kenneth; Pietri, Ilia; Reddy, Ravi
- ACM Computing Surveys, Vol. 50, Issue 3
In-Datacenter Performance Analysis of a Tensor Processing Unit
conference, January 2017
- Jouppi, Norman P.; Borchers, Al; Boyle, Rick
- Proceedings of the 44th Annual International Symposium on Computer Architecture - ISCA '17
In-Datacenter Performance Analysis of a Tensor Processing Unit
journal, June 2017
- Jouppi, Norman P.; Borchers, Al; Boyle, Rick
- ACM SIGARCH Computer Architecture News, Vol. 45, Issue 2
Design of a High-Performance GEMM-like Tensor–Tensor Multiplication
journal, April 2018
- Springer, Paul; Bientinesi, Paolo
- ACM Transactions on Mathematical Software, Vol. 44, Issue 3
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
journal, July 2018
- Venieris, Stylianos I.; Kouris, Alexandros; Bouganis, Christos-Savvas
- ACM Computing Surveys, Vol. 51, Issue 3
A Survey on Compiler Autotuning using Machine Learning
journal, January 2019
- Ashouri, Amir H.; Killian, William; Cavazos, John
- ACM Computing Surveys, Vol. 51, Issue 5
Efficient sparse-matrix multi-vector product on GPUs
conference, January 2018
- Hong, Changwan; Sadayappan, P.; Sukumaran-Rajam, Aravind
- Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '18
FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks
journal, December 2018
- Blott, Michaela; Preußer, Thomas B.; Fraser, Nicholas J.
- ACM Transactions on Reconfigurable Technology and Systems, Vol. 11, Issue 3
In-Depth Analysis on Microarchitectures of Modern Heterogeneous CPU-FPGA Platforms
journal, April 2019
- Choi, Young-Kyu; Cong, Jason; Fang, Zhenman
- ACM Transactions on Reconfigurable Technology and Systems, Vol. 12, Issue 1
Metric Selection for GPU Kernel Classification
journal, January 2019
- Shekofteh, S. -Kazem; Noori, Hamid; Naghibzadeh, Mahmoud
- ACM Transactions on Architecture and Code Optimization, Vol. 15, Issue 4
Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators
journal, August 2019
- Kronbichler, Martin; Kormann, Katharina
- ACM Transactions on Mathematical Software, Vol. 45, Issue 3
On the Correct Measurement of Application Memory Bandwidth and Memory Access Latency
conference, January 2020
- Helm, Christian; Taura, Kenjiro
- Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region - HPCAsia2020
Performance Optimization and Modeling of Fine-Grained Irregular Communication in UPC
journal, March 2019
- Lagravière, Jérémie; Langguth, Johannes; Prugger, Martina
- Scientific Programming, Vol. 2019
ExaSAT: An exascale co-design tool for performance modeling
journal, April 2014
- Unat, Didem; Chan, Cy; Zhang, Weiqun
- The International Journal of High Performance Computing Applications, Vol. 29, Issue 2
Modeling high-throughput applications for in situ analytics
journal, May 2019
- Aupy, Guillaume; Goglin, Brice; Honoré, Valentin
- The International Journal of High Performance Computing Applications, Vol. 33, Issue 6
Analytic performance modeling and analysis of detailed neuron simulations
journal, April 2020
- Cremonesi, Francesco; Hager, Georg; Wellein, Gerhard
- The International Journal of High Performance Computing Applications, Vol. 34, Issue 4
Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
journal, November 2012
- Kim, Hyesoon; Vuduc, Richard; Baghsorkhi, Sara
- Synthesis Lectures on Computer Architecture, Vol. 7, Issue 2
Data Management in Machine Learning Systems
journal, February 2019
- Boehm, Matthias; Kumar, Arun; Yang, Jun
- Synthesis Lectures on Data Management, Vol. 14, Issue 1
Lagrange-Flux Schemes: Reformulating Second-Order Accurate Lagrange-Remap Schemes for Better Node-Based HPC Performance
journal, November 2016
- De Vuyst, Florian; Gasc, Thibault; Motte, Renaud
- Oil & Gas Science and Technology – Revue d’IFP Energies nouvelles, Vol. 71, Issue 6
Compression Challenges in Large Scale Partial Differential Equation Solvers
journal, September 2019
- Götschel, Sebastian; Weiser, Martin
- Algorithms, Vol. 12, Issue 9
DiamondTorre Algorithm for High-Performance Wave Modeling
journal, August 2016
- Levchenko, Vadim; Perepelkina, Anastasia; Zakirov, Andrey
- Computation, Vol. 4, Issue 3
An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution
journal, March 2019
- Liu, Bing; Zou, Danyin; Feng, Lei
- Electronics, Vol. 8, Issue 3
Developing Efficient Discrete Simulations on Multicore and GPU Architectures
journal, January 2020
- Cagigas-Muñiz, Daniel; Diaz-del-Rio, Fernando; López-Torres, Manuel Ramón
- Electronics, Vol. 9, Issue 1
Fog vs. Cloud Computing: Should I Stay or Should I Go?
journal, February 2019
- Pisani, Flávia; Martins do Rosario, Vanderson; Borin, Edson
- Future Internet, Vol. 11, Issue 2
A Parallel-Computing Approach for Vector Road-Network Matching Using GPU Architecture
journal, December 2018
- Wan, Bo; Yang, Lin; Zhou, Shunping
- ISPRS International Journal of Geo-Information, Vol. 7, Issue 12
CPMIP: measurements of real computational performance of Earth system models in CMIP6
journal, January 2017
- Balaji, Venkatramani; Maisonnave, Eric; Zadeh, Niki
- Geoscientific Model Development, Vol. 10, Issue 1
Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0
journal, January 2018
- Fuhrer, Oliver; Chadha, Tarun; Hoefler, Torsten
- Geoscientific Model Development, Vol. 11, Issue 4
Portable multi- and many-core performance for finite-difference or finite-element codes – application to the free-surface component of NEMO (NEMOLite2D 1.0)
journal, January 2018
- Porter, Andrew R.; Appleyard, Jeremy; Ashworth, Mike
- Geoscientific Model Development, Vol. 11, Issue 8
Vicuna: A Timing-Predictable RISC-V Vector Coprocessor for Scalable Parallel Computation
text, January 2021
- Platzer, Michael; Puschner, Peter
- Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Co-design of a Particle-in-Cell Plasma Simulation Code for Intel Xeon Phi: a First Look at Knights Landing
text, January 2016
- Bastrakov, Sergey; Meyerov, Iosif; Gonoskov, Arkady
- Unpublished
Direct wide-field radio imaging in real-time at high time resolution using antenna electric fields
text, January 2020
- Kent, James; Beardsley, A. P.; Bester, L.
- Apollo - University of Cambridge Repository
Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration
journal, January 2019
- Louboutin, Mathias; Lange, Michael; Luporini, Fabio
- Geoscientific Model Development, Vol. 12, Issue 3
Harnessing Energy Efficiency of Heterogeneous-ISA Platforms
journal, January 2016
- Bhat, Sharath K.; Saya, Ajithchandra; Rawat, Hemedra K.
- ACM SIGOPS Operating Systems Review, Vol. 49, Issue 2
Ultrafast analysis of individual grain behavior during grain growth by parallel computing
text, January 2015
- Kühbach, M.; Barrales-Mora, L. A.; Mießen, C.
- RWTH Aachen University
Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model
text, January 2014
- Stengel, Holger; Treibig, Jan; Hager, Georg
- arXiv
GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
text, January 2015
- Kreutzer, Moritz; Thies, Jonas; Röhrig-Zöllner, Melven
- arXiv
Co-design of a particle-in-cell plasma simulation code for Intel Xeon Phi: a first look at Knights Landing
preprint, January 2016
- Surmin, Igor; Bastrakov, Sergey; Matveev, Zakhar
- arXiv
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
text, January 2016
- Umuroglu, Yaman; Fraser, Nicholas J.; Gambardella, Giulio
- arXiv
A Survey on Compiler Autotuning using Machine Learning
text, January 2018
- Ashouri, Amir H.; Killian, William; Cavazos, John
- arXiv
Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration
text, January 2018
- Louboutin, Mathias; Lange, Michael; Luporini, Fabio
- arXiv
A Real-Time, All-Sky, High Time Resolution, Direct Imager for the Long Wavelength Array
text, January 2019
- Kent, James; Dowell, Jayce; Beardsley, Adam
- arXiv
Performance optimization and modeling of fine-grained irregular communication in UPC
text, January 2019
- Lagravière, Jérémie; Langguth, Johannes; Prugger, Martina
- arXiv
In situ and in-transit analysis of cosmological simulations
journal, August 2016
- Friesen, Brian; Almgren, Ann; Lukić, Zarija
- Computational Astrophysics and Cosmology, Vol. 3, Issue 1
Characterizing Task-Based OpenMP Programs
journal, April 2015
- Muddukrishna, Ananya; Jonsson, Peter A.; Brorsson, Mats
- PLOS ONE, Vol. 10, Issue 4