OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A survey of CPU-GPU heterogeneous computing techniques

Journal Article · ACM Computing Surveys
DOI: https://doi.org/10.1145/2788396 · OSTI ID: 1265534
 [1];  [2]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Georgia Inst. of Technology, Atlanta, GA (United States)

As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that each of these processing units (PUs) has its own unique features and strengths, and hence CPU-GPU collaboration is inevitable for achieving high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, that enable utilizing both the CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler, and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe that this paper will provide researchers with insights into the working and scope of applications of HCTs and motivate them to further harness the computational power of CPUs and GPUs to achieve the goal of exascale performance.

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1265534
Journal Information:
ACM Computing Surveys, Vol. 47, Issue 4; ISSN 0360-0300
Publisher:
Association for Computing Machinery (ACM)
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 221 works
Citation information provided by
Web of Science

