Evaluating automatically parallelized versions of the support vector machine
|
journal
|
October 2014 |
Towards generating efficient flow solvers with the ExaStencils approach
|
journal
|
May 2017 |
Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications
- Calore, Enrico; Gabbana, Alessandro; Schifano, Sebastiano Fabio
-
Concurrency and Computation: Practice and Experience, Vol. 29, Issue 12
https://doi.org/10.1002/cpe.4143
|
journal
|
March 2017 |
An efficient low-rank Kalman filter for modern SIMD architectures
|
journal
|
April 2018 |
AXC: A new format to perform the SpMV oriented to Intel Xeon Phi architecture in OpenCL
- Coronado-Barrientos, E.; Indalecio, G.; García-Loureiro, A.
-
Concurrency and Computation: Practice and Experience, Vol. 31, Issue 1
https://doi.org/10.1002/cpe.4864
|
journal
|
July 2018 |
Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs
- Carrijo Nasciutti, Thiago; Panetta, Jairo; Pais Lopes, Pedro
-
Concurrency and Computation: Practice and Experience, Vol. 31, Issue 18
https://doi.org/10.1002/cpe.4929
|
journal
|
August 2018 |
Bulk execution of the dynamic programming for the optimal polygon triangulation problem on the GPU
|
journal
|
September 2018 |
Design of self‐adaptable data parallel applications on multicore clusters automatically optimized for performance and energy through load distribution
|
journal
|
August 2018 |
Roofline analysis with Cray performance analysis tools (CrayPat) and roofline‐based performance projections for a future architecture
|
journal
|
September 2018 |
High‐performance SIMD implementation of the lattice‐Boltzmann method on the Xeon Phi processor
|
journal
|
November 2018 |
Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system
|
journal
|
November 2019 |
Use of model-based architecture attributes to construct a component-level trade space
|
journal
|
February 2019 |
LRnLA Algorithm ConeFold with Non-local Vectorization for LBM Implementation
|
book
|
December 2018 |
Modeling and Optimizing Data Transfer in GPU-Accelerated Optical Coherence Tomography
|
book
|
December 2018 |
DSL-Based Acceleration of Automotive Environment Perception and Mapping Algorithms for Embedded CPUs, GPUs, and FPGAs
|
book
|
January 2019 |
GPU Implementation of ConeTorre Algorithm for Fluid Dynamics Simulation
- Levchenko, Vadim; Zakirov, Andrey; Perepelkina, Anastasia
-
Parallel Computing Technologies: 15th International Conference, PaCT 2019, Almaty, Kazakhstan, August 19–23, 2019, Proceedings, p. 199-213
https://doi.org/10.1007/978-3-030-25636-4_16
|
book
|
July 2019 |
LRnLA Lattice Boltzmann Method: A Performance Comparison of Implementations on GPU and CPU
- Levchenko, Vadim; Zakirov, Andrey; Perepelkina, Anastasia
-
Parallel Computational Technologies: 13th International Conference, PCT 2019, Kaliningrad, Russia, April 2–4, 2019, Revised Selected Papers, p. 139-151
https://doi.org/10.1007/978-3-030-28163-2_10
|
book
|
August 2019 |
Optimizing Wilson-Dirac Operator and Linear Solvers for Intel® KNL
|
book
|
October 2016 |
Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels
|
book
|
May 2017 |
A High-Throughput Kalman Filter for Modern SIMD Architectures
|
book
|
January 2018 |
Approximate FPGA-Based LSTMs Under Computation Time Constraints
|
book
|
January 2018 |
On the Accuracy and Usefulness of Analytic Energy Models for Contemporary Multicore Processors
|
book
|
January 2018 |
Software Design Space Exploration for Exascale Combustion Co-design
|
book
|
January 2013 |
How Many Threads will be too Many? On the Scalability of OpenMP Implementations
|
book
|
January 2015 |
Measuring energy consumption using EML (energy measurement library)
|
journal
|
July 2014 |
Energy aware scheduling model and online heuristics for stencil codes on heterogeneous computing architectures
|
journal
|
November 2016 |
GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems
|
journal
|
October 2016 |
Type-Driven Automated Program Transformations and Cost Modelling for Optimising Streaming Programs on FPGAs
|
journal
|
April 2018 |
3DyRM: a dynamic roofline model including memory latency information
|
journal
|
March 2014 |
Optimization of parallel iterated local search algorithms on graphics processing unit
|
journal
|
May 2016 |
The DiamondCandy LRnLA algorithm: raising efficiency of the 3D cross-stencil schemes
|
journal
|
June 2018 |
Efficient scheduling of streams on GPGPUs
|
journal
|
February 2020 |
Development of a Parallel Explicit Finite-Volume Euler Equation Solver using the Immersed Boundary Method with Hybrid MPI-CUDA Paradigm
|
journal
|
October 2019 |
High performance FDTD algorithm for GPGPU supercomputers
|
journal
|
October 2016 |
Ultrafast analysis of individual grain behavior during grain growth by parallel computing
|
journal
|
August 2015 |
A real-time, all-sky, high time resolution, direct imager for the long wavelength array
|
journal
|
May 2019 |
Direct wide-field radio imaging in real-time at high time resolution using antenna electric fields
|
journal
|
October 2019 |
Locally Recursive Non-Locally Asynchronous Algorithms for Stencil Computation
|
journal
|
May 2018 |
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
|
conference
|
January 2015 |
Optimizing Sparse Matrix—Matrix Multiplication for the GPU
|
journal
|
October 2015 |
Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications
|
conference
|
January 2015 |
Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model
|
conference
|
January 2015 |
Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results
|
conference
|
January 2015 |
Harnessing energy efficiency of heterogeneous-ISA platforms
|
conference
|
January 2015 |
Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance
|
conference
|
January 2015 |
Variation Among Processors Under Turbo Boost in HPC Systems
|
conference
|
January 2016 |
Parallel Memory-Efficient Adaptive Mesh Refinement on Structured Triangular Meshes with Billions of Grid Cells
|
journal
|
January 2017 |
Caffeine: towards uniformed representation and acceleration for deep convolutional neural networks
- Zhang, Chen; Fang, Zhenman; Zhou, Peipei
-
ICCAD '16: IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN, Proceedings of the 35th International Conference on Computer-Aided Design
https://doi.org/10.1145/2966986.2967011
|
conference
|
November 2016 |
Resource Conscious Reuse-Driven Tiling for GPUs
- Rawat, Prashant Singh; Hong, Changwan; Ravishankar, Mahesh
-
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation - PACT '16
https://doi.org/10.1145/2967938.2967967
|
conference
|
January 2016 |
Data-Centric Computing Frontiers: A Survey On Processing-In-Memory
- Siegl, Patrick; Buchty, Rainer; Berekovic, Mladen
-
MEMSYS '16: The Second International Symposium on Memory Systems, Proceedings of the Second International Symposium on Memory Systems
https://doi.org/10.1145/2989081.2989087
|
conference
|
October 2016 |
Sparse Matrix-Vector Multiplication on GPGPUs
|
journal
|
January 2017 |
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
- Umuroglu, Yaman; Fraser, Nicholas J.; Gambardella, Giulio
-
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '17
https://doi.org/10.1145/3020078.3021744
|
conference
|
January 2017 |
Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs
- Xiao, Qingcheng; Liang, Yun; Lu, Liqiang
-
DAC '17: The 54th Annual Design Automation Conference 2017, Proceedings of the 54th Annual Design Automation Conference 2017
https://doi.org/10.1145/3061639.3062244
|
conference
|
June 2017 |
A Survey of Power and Energy Predictive Models in HPC Systems and Applications
|
journal
|
October 2017 |
In-Datacenter Performance Analysis of a Tensor Processing Unit
|
conference
|
January 2017 |
In-Datacenter Performance Analysis of a Tensor Processing Unit
|
journal
|
June 2017 |
Design of a High-Performance GEMM-like Tensor–Tensor Multiplication
|
journal
|
April 2018 |
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
|
journal
|
July 2018 |
A Survey on Compiler Autotuning using Machine Learning
|
journal
|
January 2019 |
Efficient sparse-matrix multi-vector product on GPUs
- Hong, Changwan; Sadayappan, P.; Sukumaran-Rajam, Aravind
-
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '18
https://doi.org/10.1145/3208040.3208062
|
conference
|
January 2018 |
FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks
- Blott, Michaela; Preußer, Thomas B.; Fraser, Nicholas J.
-
ACM Transactions on Reconfigurable Technology and Systems, Vol. 11, Issue 3
https://doi.org/10.1145/3242897
|
journal
|
December 2018 |
In-Depth Analysis on Microarchitectures of Modern Heterogeneous CPU-FPGA Platforms
|
journal
|
April 2019 |
Metric Selection for GPU Kernel Classification
- Shekofteh, S. -Kazem; Noori, Hamid; Naghibzadeh, Mahmoud
-
ACM Transactions on Architecture and Code Optimization, Vol. 15, Issue 4
https://doi.org/10.1145/3295690
|
journal
|
January 2019 |
Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators
|
journal
|
August 2019 |
On the Correct Measurement of Application Memory Bandwidth and Memory Access Latency
- Helm, Christian; Taura, Kenjiro
-
HPCAsia2020: International Conference on High Performance Computing in Asia-Pacific Region, Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
https://doi.org/10.1145/3368474.3368476
|
conference
|
January 2020 |
Performance Optimization and Modeling of Fine-Grained Irregular Communication in UPC
|
journal
|
March 2019 |
ExaSAT: An exascale co-design tool for performance modeling
|
journal
|
April 2014 |
Modeling high-throughput applications for in situ analytics
|
journal
|
May 2019 |
Analytic performance modeling and analysis of detailed neuron simulations
|
journal
|
April 2020 |
Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
|
journal
|
November 2012 |
Data Management in Machine Learning Systems
|
journal
|
February 2019 |
Lagrange-Flux Schemes: Reformulating Second-Order Accurate Lagrange-Remap Schemes for Better Node-Based HPC Performance
|
journal
|
November 2016 |
Compression Challenges in Large Scale Partial Differential Equation Solvers
|
journal
|
September 2019 |
DiamondTorre Algorithm for High-Performance Wave Modeling
|
journal
|
August 2016 |
An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution
|
journal
|
March 2019 |
Developing Efficient Discrete Simulations on Multicore and GPU Architectures
|
journal
|
January 2020 |
Fog vs. Cloud Computing: Should I Stay or Should I Go?
|
journal
|
February 2019 |
A Parallel-Computing Approach for Vector Road-Network Matching Using GPU Architecture
|
journal
|
December 2018 |
CPMIP: measurements of real computational performance of Earth system models in CMIP6
|
journal
|
January 2017 |
Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0
|
journal
|
January 2018 |
Portable multi- and many-core performance for finite-difference or finite-element codes – application to the free-surface component of NEMO (NEMOLite2D 1.0)
|
journal
|
January 2018 |
Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration
|
posted_content
|
January 2018 |
Vicuna: A Timing-Predictable RISC-V Vector Coprocessor for Scalable Parallel Computation
|
text
|
January 2021 |
Co-design of a Particle-in-Cell Plasma Simulation Code for Intel Xeon Phi: a First Look at Knights Landing
|
text
|
January 2016 |
Direct wide-field radio imaging in real-time at high time resolution using antenna electric fields
|
text
|
January 2020 |
Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration
|
journal
|
January 2019 |
Harnessing Energy Efficiency of Heterogeneous-ISA Platforms
|
journal
|
January 2016 |
Ultrafast analysis of individual grain behavior during grain growth by parallel computing
|
text
|
January 2015 |
Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model
|
text
|
January 2014 |
GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
|
text
|
January 2015 |
Co-design of a particle-in-cell plasma simulation code for Intel Xeon Phi: a first look at Knights Landing
|
preprint
|
January 2016 |
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
|
text
|
January 2016 |
A Survey on Compiler Autotuning using Machine Learning
|
text
|
January 2018 |
Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration
|
text
|
January 2018 |
A Real-Time, All-Sky, High Time Resolution, Direct Imager for the Long Wavelength Array
|
text
|
January 2019 |
Performance optimization and modeling of fine-grained irregular communication in UPC
|
text
|
January 2019 |
In situ and in-transit analysis of cosmological simulations
|
journal
|
August 2016 |
Characterizing Task-Based OpenMP Programs
|
journal
|
April 2015 |