DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Roofline: an insightful visual performance model for multicore architectures

Abstract

We propose an easy-to-understand, visual performance model that offers insights to programmers and architects on improving parallel software and hardware for floating point computations.

Authors:
 Williams, Samuel [1]; Waterman, Andrew [1]; Patterson, David [1]
  1. Univ. of California, Berkeley, CA (United States). Parallel Computing Lab.
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1407073
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Communications of the ACM
Additional Journal Information:
Journal Volume: 52; Journal Issue: 4; Journal ID: ISSN 0001-0782
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Williams, Samuel, Waterman, Andrew, and Patterson, David. Roofline: an insightful visual performance model for multicore architectures. United States: N. p., 2009. Web. doi:10.1145/1498765.1498785.
Williams, Samuel, Waterman, Andrew, & Patterson, David. Roofline: an insightful visual performance model for multicore architectures. United States. doi:10.1145/1498765.1498785.
Williams, Samuel, Waterman, Andrew, and Patterson, David. Sat . "Roofline: an insightful visual performance model for multicore architectures". United States. doi:10.1145/1498765.1498785. https://www.osti.gov/servlets/purl/1407073.
@article{osti_1407073,
title = {Roofline: an insightful visual performance model for multicore architectures},
author = {Williams, Samuel and Waterman, Andrew and Patterson, David},
abstractNote = {We propose an easy-to-understand, visual performance model that offers insights to programmers and architects on improving parallel software and hardware for floating point computations.},
doi = {10.1145/1498765.1498785},
journal = {Communications of the ACM},
number = 4,
volume = 52,
place = {United States},
year = {2009},
month = {4}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 352 works
Citation information provided by
Web of Science

Works referenced in this record:

Validity of the single processor approach to achieving large scale computing capabilities
conference, January 1967

  • Amdahl, Gene M.
  • Proceedings of the April 18-20, 1967, spring joint computer conference on - AFIPS '67 (Spring)
  • DOI: 10.1145/1465482.1465560

A Hierarchical Approach to Modeling and Improving the Performance of Scientific Applications on the KSR1
conference, January 1994

  • Boyd, E. L.; Azeem, W.; Lee, Hsien-Hsin
  • 1994 International Conference on Parallel Processing Vol. 3
  • DOI: 10.1109/ICPP.1994.30

Estimating interlock and improving balance for pipelined architectures
journal, August 1988

  • Callahan, David; Cocke, John; Kennedy, Ken
  • Journal of Parallel and Distributed Computing, Vol. 5, Issue 4
  • DOI: 10.1016/0743-7315(88)90002-0

Improving the ratio of memory operations to floating-point operations in loops
journal, November 1994

  • Carr, Steve; Kennedy, Ken
  • ACM Transactions on Programming Languages and Systems, Vol. 16, Issue 6
  • DOI: 10.1145/197320.197366

Self-Adapting Linear Algebra Algorithms and Software
journal, February 2005


Performance of Synchronized Iterative Processes in Multiprocessor Systems
journal, July 1982

  • Dubois, M.; Briggs, F. A.
  • IEEE Transactions on Software Engineering, Vol. SE-8, Issue 4
  • DOI: 10.1109/TSE.1982.235576

The Design and Implementation of FFTW3
journal, February 2005


Mapping computational concepts to GPUs
conference, January 2005


Amdahl's Law in the Multicore Era
journal, July 2008


Evaluating associativity in CPU caches
journal, January 1989

  • Hill, M. D.; Smith, A. J.
  • IEEE Transactions on Computers, Vol. 38, Issue 12
  • DOI: 10.1109/12.40842

A Proof for the Queuing Formula: L = λ W
journal, June 1961


Latency lags bandwith
journal, October 2004


A genetic algorithms approach to modeling the performance of memory-bound computations
conference, January 2007

  • Tikir, Mustafa M.; Carrington, Laura; Strohmaier, Erich
  • Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07
  • DOI: 10.1145/1362622.1362686

Lattice Boltzmann simulation optimization on leading multicore platforms
conference, April 2008

  • Williams, Samuel; Carter, Jonathan; Oliker, Leonid
  • 2008 IEEE International Symposium on Parallel and Distributed Processing (IPDPS)
  • DOI: 10.1109/IPDPS.2008.4536295

Optimization of sparse matrix-vector multiplication on emerging multicore platforms
conference, January 2007

  • Williams, Samuel; Oliker, Leonid; Vuduc, Richard
  • Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07
  • DOI: 10.1145/1362622.1362674

The SPLASH-2 programs: characterization and methodological considerations
conference, January 1995

  • Woo, Steven Cameron; Ohara, Moriyoshi; Torrie, Evan
  • Proceedings of the 22nd annual international symposium on Computer architecture - ISCA '95
  • DOI: 10.1145/223982.223990

    Works referencing / citing this record:

    The DiamondCandy LRnLA algorithm: raising efficiency of the 3D cross-stencil schemes
    journal, June 2018

    • Perepelkina, Anastasia; Levchenko, Vadim; Khilkov, Sergey
    • The Journal of Supercomputing, Vol. 75, Issue 12
    • DOI: 10.1007/s11227-018-2461-z

    Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
    journal, November 2012


    Evaluating automatically parallelized versions of the support vector machine
    journal, October 2014

    • Codreanu, Valeriu; Dröge, Bob; Williams, David
    • Concurrency and Computation: Practice and Experience, Vol. 28, Issue 7
    • DOI: 10.1002/cpe.3413

    Towards generating efficient flow solvers with the ExaStencils approach
    journal, May 2017

    • Kuckuk, Sebastian; Haase, Gundolf; Vasco, Diego A.
    • Concurrency and Computation: Practice and Experience, Vol. 29, Issue 17
    • DOI: 10.1002/cpe.4062

    Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications
    journal, March 2017

    • Calore, Enrico; Gabbana, Alessandro; Schifano, Sebastiano Fabio
    • Concurrency and Computation: Practice and Experience, Vol. 29, Issue 12
    • DOI: 10.1002/cpe.4143

    An efficient low-rank Kalman filter for modern SIMD architectures
    journal, April 2018

    • Cámpora Pérez, Daniel Hugo; Awile, Omar
    • Concurrency and Computation: Practice and Experience, Vol. 30, Issue 23
    • DOI: 10.1002/cpe.4483

    AXC: A new format to perform the SpMV oriented to Intel Xeon Phi architecture in OpenCL
    journal, July 2018

    • Coronado-Barrientos, E.; Indalecio, G.; García-Loureiro, A.
    • Concurrency and Computation: Practice and Experience, Vol. 31, Issue 1
    • DOI: 10.1002/cpe.4864

    Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs
    journal, August 2018

    • Carrijo Nasciutti, Thiago; Panetta, Jairo; Pais Lopes, Pedro
    • Concurrency and Computation: Practice and Experience, Vol. 31, Issue 18
    • DOI: 10.1002/cpe.4929

    Bulk execution of the dynamic programming for the optimal polygon triangulation problem on the GPU
    journal, September 2018

    • Yamashita, Kohei; Ito, Yasuaki; Nakano, Koji
    • Concurrency and Computation: Practice and Experience, Vol. 31, Issue 19
    • DOI: 10.1002/cpe.4947

    Design of self‐adaptable data parallel applications on multicore clusters automatically optimized for performance and energy through load distribution
    journal, August 2018

    • Reddy Manumachu, Ravi; Lastovetsky, Alexey L.
    • Concurrency and Computation: Practice and Experience, Vol. 31, Issue 4
    • DOI: 10.1002/cpe.4958

    Roofline analysis with Cray performance analysis tools (CrayPat) and roofline‐based performance projections for a future architecture
    journal, September 2018

    • Kwack, JaeHyuk; Arnold, Galen; Mendes, Celso
    • Concurrency and Computation: Practice and Experience
    • DOI: 10.1002/cpe.4963

    High‐performance SIMD implementation of the lattice‐Boltzmann method on the Xeon Phi processor
    journal, November 2018

    • Robertsén, Fredrik; Mattila, Keijo; Westerholm, Jan
    • Concurrency and Computation: Practice and Experience, Vol. 31, Issue 13
    • DOI: 10.1002/cpe.5072

    Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system
    journal, November 2019

    • Yang, Charlene; Kurth, Thorsten; Williams, Samuel
    • Concurrency and Computation: Practice and Experience, Vol. 32, Issue 20
    • DOI: 10.1002/cpe.5547

    Use of model-based architecture attributes to construct a component-level trade space
    journal, February 2019

    • McKean, David; Moreland, James D.; Doskey, Steven
    • Systems Engineering, Vol. 22, Issue 2
    • DOI: 10.1002/sys.21478

    Measuring energy consumption using EML (energy measurement library)
    journal, July 2014

    • Cabrera, Alberto; Almeida, Francisco; Arteaga, Javier
    • Computer Science - Research and Development, Vol. 30, Issue 2
    • DOI: 10.1007/s00450-014-0269-5

    Energy aware scheduling model and online heuristics for stencil codes on heterogeneous computing architectures
    journal, November 2016


    GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems
    journal, October 2016

    • Kreutzer, Moritz; Thies, Jonas; Röhrig-Zöllner, Melven
    • International Journal of Parallel Programming, Vol. 45, Issue 5
    • DOI: 10.1007/s10766-016-0464-z

    Type-Driven Automated Program Transformations and Cost Modelling for Optimising Streaming Programs on FPGAs
    journal, April 2018

    • Vanderbauwhede, Wim; Nabi, Syed Waqar; Urlea, Cristian
    • International Journal of Parallel Programming, Vol. 47, Issue 1
    • DOI: 10.1007/s10766-018-0572-z

    3DyRM: a dynamic roofline model including memory latency information
    journal, March 2014

    • Lorenzo, O. G.; Pena, T. F.; Cabaleiro, J. C.
    • The Journal of Supercomputing, Vol. 70, Issue 2
    • DOI: 10.1007/s11227-014-1163-4

    Optimization of parallel iterated local search algorithms on graphics processing unit
    journal, May 2016


    Efficient scheduling of streams on GPGPUs
    journal, February 2020

    • Beheshti Roui, Mohamad; Shekofteh, S. Kazem; Noori, Hamid
    • The Journal of Supercomputing, Vol. 76, Issue 11
    • DOI: 10.1007/s11227-020-03209-x

    Development of a Parallel Explicit Finite-Volume Euler Equation Solver using the Immersed Boundary Method with Hybrid MPI-CUDA Paradigm
    journal, October 2019

    • Kuo, F. A.; Chiang, C. H.; Lo, M. C.
    • Journal of Mechanics, Vol. 36, Issue 1
    • DOI: 10.1017/jmech.2019.9

    High performance FDTD algorithm for GPGPU supercomputers
    journal, October 2016


    Ultrafast analysis of individual grain behavior during grain growth by parallel computing
    journal, August 2015

    • Kühbach, M.; Barrales-Mora, L. A.; Mießen, C.
    • IOP Conference Series: Materials Science and Engineering, Vol. 89
    • DOI: 10.1088/1757-899x/89/1/012031

    A real-time, all-sky, high time resolution, direct imager for the long wavelength array
    journal, May 2019

    • Kent, James; Dowell, Jayce; Beardsley, Adam
    • Monthly Notices of the Royal Astronomical Society, Vol. 486, Issue 4
    • DOI: 10.1093/mnras/stz1206

    Direct wide-field radio imaging in real-time at high time resolution using antenna electric fields
    journal, October 2019

    • Kent, James; Beardsley, Adam P.; Bester, Landman
    • Monthly Notices of the Royal Astronomical Society, Vol. 491, Issue 1
    • DOI: 10.1093/mnras/stz3028

    Locally Recursive Non-Locally Asynchronous Algorithms for Stencil Computation
    journal, May 2018


    Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
    conference, January 2015

    • Zhang, Chen; Li, Peng; Sun, Guangyu
    • Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '15
    • DOI: 10.1145/2684746.2689060

    Optimizing Sparse Matrix—Matrix Multiplication for the GPU
    journal, October 2015

    • Dalton, Steven; Olson, Luke; Bell, Nathan
    • ACM Transactions on Mathematical Software, Vol. 41, Issue 4
    • DOI: 10.1145/2699470

    Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications
    conference, January 2015

    • Wahib, Mohamed; Maruyama, Naoya
    • Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '15
    • DOI: 10.1145/2749246.2749255

    Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model
    conference, January 2015

    • Stengel, Holger; Treibig, Jan; Hager, Georg
    • Proceedings of the 29th ACM on International Conference on Supercomputing - ICS '15
    • DOI: 10.1145/2751205.2751240

    Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results
    conference, January 2015

    • Hoefler, Torsten; Belli, Roberto
    • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '15
    • DOI: 10.1145/2807591.2807644

    Harnessing energy efficiency of heterogeneous-ISA platforms
    conference, January 2015

    • Bhat, Sharath K.; Saya, Ajithchandra; Rawat, Hemedra K.
    • Proceedings of the Workshop on Power-Aware Computing and Systems - HotPower '15
    • DOI: 10.1145/2818613.2818747

    Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance
    conference, January 2015

    • Ardalani, Newsha; Lestourgeon, Clint; Sankaralingam, Karthikeyan
    • Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48
    • DOI: 10.1145/2830772.2830780

    Variation Among Processors Under Turbo Boost in HPC Systems
    conference, January 2016

    • Acun, Bilge; Miller, Phil; Kale, Laxmikant V.
    • Proceedings of the 2016 International Conference on Supercomputing - ICS '16
    • DOI: 10.1145/2925426.2926289

    Parallel Memory-Efficient Adaptive Mesh Refinement on Structured Triangular Meshes with Billions of Grid Cells
    journal, January 2017

    • Meister, Oliver; Rahnema, Kaveh; Bader, Michael
    • ACM Transactions on Mathematical Software, Vol. 43, Issue 3
    • DOI: 10.1145/2947668

    Caffeine: towards uniformed representation and acceleration for deep convolutional neural networks
    conference, November 2016

    • Zhang, Chen; Fang, Zhenman; Zhou, Peipei
    • Proceedings of the 35th International Conference on Computer-Aided Design - ICCAD '16
    • DOI: 10.1145/2966986.2967011

    Resource Conscious Reuse-Driven Tiling for GPUs
    conference, January 2016

    • Rawat, Prashant Singh; Hong, Changwan; Ravishankar, Mahesh
    • Proceedings of the 2016 International Conference on Parallel Architectures and Compilation - PACT '16
    • DOI: 10.1145/2967938.2967967

    Data-Centric Computing Frontiers: A Survey On Processing-In-Memory
    conference, October 2016

    • Siegl, Patrick; Buchty, Rainer; Berekovic, Mladen
    • Proceedings of the Second International Symposium on Memory Systems - MEMSYS '16
    • DOI: 10.1145/2989081.2989087

    Sparse Matrix-Vector Multiplication on GPGPUs
    journal, January 2017

    • Filippone, Salvatore; Cardellini, Valeria; Barbieri, Davide
    • ACM Transactions on Mathematical Software, Vol. 43, Issue 4
    • DOI: 10.1145/3017994

    FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
    conference, January 2017

    • Umuroglu, Yaman; Fraser, Nicholas J.; Gambardella, Giulio
    • Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '17
    • DOI: 10.1145/3020078.3021744

    Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs
    conference, June 2017

    • Xiao, Qingcheng; Liang, Yun; Lu, Liqiang
    • Proceedings of the 54th Annual Design Automation Conference 2017 - DAC '17
    • DOI: 10.1145/3061639.3062244

    A Survey of Power and Energy Predictive Models in HPC Systems and Applications
    journal, October 2017

    • O'Brien, Kenneth; Pietri, Ilia; Reddy, Ravi
    • ACM Computing Surveys, Vol. 50, Issue 3
    • DOI: 10.1145/3078811

    In-Datacenter Performance Analysis of a Tensor Processing Unit
    conference, January 2017

    • Jouppi, Norman P.; Borchers, Al; Boyle, Rick
    • Proceedings of the 44th Annual International Symposium on Computer Architecture - ISCA '17
    • DOI: 10.1145/3079856.3080246

    In-Datacenter Performance Analysis of a Tensor Processing Unit
    journal, June 2017

    • Jouppi, Norman P.; Borchers, Al; Boyle, Rick
    • ACM SIGARCH Computer Architecture News, Vol. 45, Issue 2
    • DOI: 10.1145/3140659.3080246

    Design of a High-Performance GEMM-like Tensor–Tensor Multiplication
    journal, April 2018

    • Springer, Paul; Bientinesi, Paolo
    • ACM Transactions on Mathematical Software, Vol. 44, Issue 3
    • DOI: 10.1145/3157733

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
    journal, July 2018

    • Venieris, Stylianos I.; Kouris, Alexandros; Bouganis, Christos-Savvas
    • ACM Computing Surveys, Vol. 51, Issue 3
    • DOI: 10.1145/3186332

    A Survey on Compiler Autotuning using Machine Learning
    journal, January 2019

    • Ashouri, Amir H.; Killian, William; Cavazos, John
    • ACM Computing Surveys, Vol. 51, Issue 5
    • DOI: 10.1145/3197978

    Efficient sparse-matrix multi-vector product on GPUs
    conference, January 2018

    • Hong, Changwan; Sadayappan, P.; Sukumaran-Rajam, Aravind
    • Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '18
    • DOI: 10.1145/3208040.3208062

    FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks
    journal, December 2018

    • Blott, Michaela; Preußer, Thomas B.; Fraser, Nicholas J.
    • ACM Transactions on Reconfigurable Technology and Systems, Vol. 11, Issue 3
    • DOI: 10.1145/3242897

    In-Depth Analysis on Microarchitectures of Modern Heterogeneous CPU-FPGA Platforms
    journal, April 2019

    • Choi, Young-Kyu; Cong, Jason; Fang, Zhenman
    • ACM Transactions on Reconfigurable Technology and Systems, Vol. 12, Issue 1
    • DOI: 10.1145/3294054

    Metric Selection for GPU Kernel Classification
    journal, January 2019

    • Shekofteh, S.-Kazem; Noori, Hamid; Naghibzadeh, Mahmoud
    • ACM Transactions on Architecture and Code Optimization, Vol. 15, Issue 4
    • DOI: 10.1145/3295690

    Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators
    journal, August 2019

    • Kronbichler, Martin; Kormann, Katharina
    • ACM Transactions on Mathematical Software, Vol. 45, Issue 3
    • DOI: 10.1145/3325864

    On the Correct Measurement of Application Memory Bandwidth and Memory Access Latency
    conference, January 2020

    • Helm, Christian; Taura, Kenjiro
    • Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region - HPCAsia2020
    • DOI: 10.1145/3368474.3368476

    Performance Optimization and Modeling of Fine-Grained Irregular Communication in UPC
    journal, March 2019

    • Lagravière, Jérémie; Langguth, Johannes; Prugger, Martina
    • Scientific Programming, Vol. 2019
    • DOI: 10.1155/2019/6825728

    ExaSAT: An exascale co-design tool for performance modeling
    journal, April 2014

    • Unat, Didem; Chan, Cy; Zhang, Weiqun
    • The International Journal of High Performance Computing Applications, Vol. 29, Issue 2
    • DOI: 10.1177/1094342014568690

    Modeling high-throughput applications for in situ analytics
    journal, May 2019

    • Aupy, Guillaume; Goglin, Brice; Honoré, Valentin
    • The International Journal of High Performance Computing Applications, Vol. 33, Issue 6
    • DOI: 10.1177/1094342019847263

    Analytic performance modeling and analysis of detailed neuron simulations
    journal, April 2020

    • Cremonesi, Francesco; Hager, Georg; Wellein, Gerhard
    • The International Journal of High Performance Computing Applications, Vol. 34, Issue 4
    • DOI: 10.1177/1094342020912528

    Data Management in Machine Learning Systems
    journal, February 2019


    Lagrange-Flux Schemes: Reformulating Second-Order Accurate Lagrange-Remap Schemes for Better Node-Based HPC Performance
    journal, November 2016

    • De Vuyst, Florian; Gasc, Thibault; Motte, Renaud
    • Oil & Gas Science and Technology – Revue d’IFP Energies nouvelles, Vol. 71, Issue 6
    • DOI: 10.2516/ogst/2016019

    Compression Challenges in Large Scale Partial Differential Equation Solvers
    journal, September 2019

    • Götschel, Sebastian; Weiser, Martin
    • Algorithms, Vol. 12, Issue 9
    • DOI: 10.3390/a12090197

    DiamondTorre Algorithm for High-Performance Wave Modeling
    journal, August 2016


    An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution
    journal, March 2019


    Developing Efficient Discrete Simulations on Multicore and GPU Architectures
    journal, January 2020

    • Cagigas-Muñiz, Daniel; Diaz-del-Rio, Fernando; López-Torres, Manuel Ramón
    • Electronics, Vol. 9, Issue 1
    • DOI: 10.3390/electronics9010189

    Fog vs. Cloud Computing: Should I Stay or Should I Go?
    journal, February 2019

    • Pisani, Flávia; Martins do Rosario, Vanderson; Borin, Edson
    • Future Internet, Vol. 11, Issue 2
    • DOI: 10.3390/fi11020034

    A Parallel-Computing Approach for Vector Road-Network Matching Using GPU Architecture
    journal, December 2018

    • Wan, Bo; Yang, Lin; Zhou, Shunping
    • ISPRS International Journal of Geo-Information, Vol. 7, Issue 12
    • DOI: 10.3390/ijgi7120472

    CPMIP: measurements of real computational performance of Earth system models in CMIP6
    journal, January 2017

    • Balaji, Venkatramani; Maisonnave, Eric; Zadeh, Niki
    • Geoscientific Model Development, Vol. 10, Issue 1
    • DOI: 10.5194/gmd-10-19-2017

    Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0
    journal, January 2018

    • Fuhrer, Oliver; Chadha, Tarun; Hoefler, Torsten
    • Geoscientific Model Development, Vol. 11, Issue 4
    • DOI: 10.5194/gmd-11-1665-2018

    Portable multi- and many-core performance for finite-difference or finite-element codes – application to the free-surface component of NEMO (NEMOLite2D 1.0)
    journal, January 2018

    • Porter, Andrew R.; Appleyard, Jeremy; Ashworth, Mike
    • Geoscientific Model Development, Vol. 11, Issue 8
    • DOI: 10.5194/gmd-11-3447-2018