DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A survey of CPU-GPU heterogeneous computing techniques

Abstract

As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that each of these processing units (PUs) has unique features and strengths; hence, CPU-GPU collaboration is inevitable for achieving high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, which enable utilizing both the CPU and the GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler, and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe that this paper will provide researchers with insights into the working and scope of applications of HCTs and motivate them to further harness the computational power of CPUs and GPUs to achieve the goal of exascale performance.

Authors:
 Mittal, Sparsh [1]; Vetter, Jeffrey S. [2]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Georgia Inst. of Technology, Atlanta, GA (United States)
Publication Date:
July 2015
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1265534
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
ACM Computing Surveys
Additional Journal Information:
Journal Volume: 47; Journal Issue: 4; Journal ID: ISSN 0360-0300
Publisher:
Association for Computing Machinery (ACM)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; experimentation; management; measurement; performance; analysis; CPU-GPU heterogeneous/hybrid/collaborative computing; workload division/partitioning; dynamic/static load-balancing; pipelining; programming frameworks; fused CPU-GPU chip

Citation Formats

Mittal, Sparsh, and Vetter, Jeffrey S. A survey of CPU-GPU heterogeneous computing techniques. United States: N. p., 2015. Web. doi:10.1145/2788396.
Mittal, Sparsh, & Vetter, Jeffrey S. A survey of CPU-GPU heterogeneous computing techniques. United States. doi:10.1145/2788396.
Mittal, Sparsh, and Vetter, Jeffrey S. 2015. "A survey of CPU-GPU heterogeneous computing techniques". United States. doi:10.1145/2788396. https://www.osti.gov/servlets/purl/1265534.
@article{osti_1265534,
title = {A survey of CPU-GPU heterogeneous computing techniques},
author = {Mittal, Sparsh and Vetter, Jeffrey S.},
doi = {10.1145/2788396},
journal = {ACM Computing Surveys},
number = 4,
volume = 47,
place = {United States},
year = {2015},
month = {7}
}

Citation Metrics:
Cited by: 9 works
Citation information provided by
Web of Science

Works referenced in this record:

Dynamic load balancing on heterogeneous multicore/multiGPU systems
conference, June 2010

  • Acosta, Alejandro; Corujo, Robert; Blanco, Vicente
  • Simulation (HPCS), 2010 International Conference on High Performance Computing & Simulation
  • DOI: 10.1109/HPCS.2010.5547097

Hybrid computing: CPU+GPU co-processing and its application to tomographic reconstruction
journal, April 2012


QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators
conference, May 2011

  • Agullo, Emmanuel; Augonnet, Cedric; Dongarra, Jack
  • Distributed Processing Symposium (IPDPS), 2011 IEEE International Parallel & Distributed Processing Symposium
  • DOI: 10.1109/IPDPS.2011.90

Effective Kernel Mapping for OpenCL Applications in Heterogeneous Platforms
conference, September 2012

  • Albayrak, Omer Erdil; Akturk, Ismail; Ozturk, Ozcan
  • 2012 41st International Conference on Parallel Processing Workshops (ICPPW)
  • DOI: 10.1109/ICPPW.2012.14

Hybrid-parallel Algorithms for 2D Green's Functions
journal, January 2013

  • Álvarez-Melcón, Alejandro; Giménez, Domingo; Quesada, Fernando D.
  • Procedia Computer Science, Vol. 18
  • DOI: 10.1016/j.procs.2013.05.218

Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms
conference, July 2011

  • Anzt, Hartwig; Heuveline, Vincent; Aliaga, Jose I.
  • 2011 International Green Computing Conference (IGCC), 2011 International Green Computing Conference and Workshops
  • DOI: 10.1109/IGCC.2011.6008594

StarPU: a unified platform for task scheduling on heterogeneous multicore architectures
journal, November 2010

  • Augonnet, Cédric; Thibault, Samuel; Namyst, Raymond
  • Concurrency and Computation: Practice and Experience, Vol. 23, Issue 2
  • DOI: 10.1002/cpe.1631

An Efficient Stream Buffer Mechanism for Dataflow Execution on Heterogeneous Platforms with GPUs
conference, October 2011

  • Balevic, Ana; Kienhuis, Bart
  • 2011 First Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM)
  • DOI: 10.1109/DFM.2011.10

An On-Demand Fast Parallel Pseudo Random Number Generator with Applications
conference, May 2012

  • Banerjee, Dip Sankar; Bahl, Aman Kumar; Kothapalli, Kishore
  • 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
  • DOI: 10.1109/IPDPSW.2012.212

Hybrid algorithms for list ranking and graph connected components
conference, December 2011

  • Banerjee, Dip Sankar; Kothapalli, Kishore
  • 2011 18th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2011.6152655

Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory
conference, January 2010

  • Becchi, Michela; Byna, Surendra; Cadambi, Srihari
  • Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures - SPAA '10
  • DOI: 10.1145/1810479.1810498

A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU–GPU platforms
journal, August 2011


Optimization Techniques for 3D-FWT on Systems with Manycore GPUs and Multicore CPUs
journal, January 2013


Noniterative Multireference Coupled Cluster Methods on Heterogeneous CPU–GPU Systems
journal, March 2013

  • Bhaskaran-Nair, Kiran; Ma, Wenjing; Krishnamoorthy, Sriram
  • Journal of Chemical Theory and Computation, Vol. 9, Issue 4
  • DOI: 10.1021/ct301130u

Iterative SLE Solvers over a CPU-GPU Platform
conference, September 2010

  • Binotto, Alécio P. D.; Daniel, Christian; Weber, Daniel
  • 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC 2010), 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC)
  • DOI: 10.1109/HPCC.2010.40

An Effective Dynamic Scheduling Runtime and Tuning System for Heterogeneous Multi and Many-Core Desktop Platforms
conference, September 2011

  • Binotto, Alecio P. D.; Pereira, Carlos E.; Kuijper, Arjan
  • Communication (HPCC), 2011 IEEE International Conference on High Performance Computing and Communications
  • DOI: 10.1109/HPCC.2011.20

Heterogeneous Computational Model for Landform Attributes Representation on Multicore and Multi-GPU Systems
journal, January 2012


Load balancing in a changing world: dealing with heterogeneity and performance variability
conference, January 2013

  • Boyer, Michael; Skadron, Kevin; Che, Shuai
  • Proceedings of the ACM International Conference on Computing Frontiers - CF '13
  • DOI: 10.1145/2482767.2482794

AMD Fusion APU: Llano
journal, March 2012

  • Branover, Alexander; Foley, Denis; Steinman, Maurice
  • IEEE Micro, Vol. 32, Issue 2
  • DOI: 10.1109/MM.2012.2

Efficient co-processor utilization in database query processing
journal, November 2013


Resource-efficient utilization of CPU/GPU-based heterogeneous supercomputers for Bayesian phylogenetic inference
journal, March 2013


Rodinia: A benchmark suite for heterogeneous computing
conference, October 2009

  • Che, Shuai; Boyer, Michael; Meng, Jiayuan
  • 2009 IEEE International Symposium on Workload Characterization (IISWC)
  • DOI: 10.1109/IISWC.2009.5306797

Accelerating MapReduce on a coupled CPU-GPU architecture
conference, November 2012

  • Chen, Linchuan; Huo, Xin; Agrawal, Gagan
  • 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2012.16

An efficient scheduling scheme using estimated execution time for heterogeneous computing systems
journal, January 2013

  • Choi, Hong Jun; Son, Dong Oh; Kang, Seung Gu
  • The Journal of Supercomputing, Vol. 65, Issue 2
  • DOI: 10.1007/s11227-013-0870-6

GPU and APU computations of Finite Time Lyapunov Exponent fields
journal, March 2012

  • Conti, Christian; Rossinelli, Diego; Koumoutsakos, Petros
  • Journal of Computational Physics, Vol. 231, Issue 5
  • DOI: 10.1016/j.jcp.2011.10.032

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
conference, July 2011

  • Daga, Mayank; Aji, Ashwin M.; Feng, Wu-chun
  • 2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC)
  • DOI: 10.1109/SAAHPC.2011.29

A 22nm IA multi-CPU and GPU System-on-Chip
conference, February 2012

  • Damaraju, Satish; George, Varghese; Jahagirdar, Sanjeev
  • 2012 IEEE International Solid- State Circuits Conference - (ISSCC), 2012 IEEE International Solid-State Circuits Conference
  • DOI: 10.1109/ISSCC.2012.6176876

The Scalable Heterogeneous Computing (SHOC) benchmark suite
conference, January 2010

  • Danalis, Anthony; Marin, Gabriel; McCurdy, Collin
  • Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units - GPGPU '10
  • DOI: 10.1145/1735688.1735702

Hybrid implementation of error diffusion dithering
conference, December 2011

  • Deshpande, Aditya; Misra, Ishan; Narayanan, P. J.
  • 2011 18th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2011.6152714

Harmony: an execution model and runtime for heterogeneous many core systems
conference, January 2008

  • Diamos, Gregory F.; Yalamanchili, Sudhakar
  • Proceedings of the 17th international symposium on High performance distributed computing - HPDC '08
  • DOI: 10.1145/1383422.1383447

Using graphics processors for high performance IR query processing
conference, January 2009

  • Ding, Shuai; He, Jinru; Yan, Hao
  • Proceedings of the 18th international conference on World wide web - WWW '09
  • DOI: 10.1145/1526709.1526766

Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations
journal, January 2011

  • Dziekonski, A.; Lamecki, A.; Mrozowski, M.
  • IEEE Antennas and Wireless Propagation Letters, Vol. 10
  • DOI: 10.1109/LAWP.2011.2159769

Linpack evaluation on a supercomputer with heterogeneous accelerators
conference, April 2010

  • Endo, Toshio; Matsuoka, Satoshi; Nukada, Akira
  • 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
  • DOI: 10.1109/IPDPS.2010.5470353

5.1 POWER8TM: A 12-core server-class processor in 22nm SOI with 7.6Tb/s off-chip bandwidth
conference, February 2014

  • Fluhr, Eric J.; Friedrich, Joshua; Dreps, Daniel
  • 2014 IEEE International Solid- State Circuits Conference (ISSCC), 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)
  • DOI: 10.1109/ISSCC.2014.6757353

Mapping the sbr and Tw-Ildcs to Heterogeneous Cpu-Gpu Architecture for fast Computation of Electromagnetic Scattering
journal, January 2012

  • Gao, Peng Cheng; Tao, Yu Bo; Bai, Zhi Hui
  • Progress In Electromagnetics Research, Vol. 122
  • DOI: 10.2528/PIER11092303

Asymptotic peak Utilisation in Heterogeneous Parallel Cpu/Gpu Pipelines: a Decentralised Queue Monitoring Strategy
journal, May 2012

  • Garba, Michael T.; GonzÁLez–VÉLez, Horacio
  • Parallel Processing Letters, Vol. 22, Issue 02
  • DOI: 10.1142/S0129626412400087

An asymmetric distributed shared memory model for heterogeneous parallel systems
journal, March 2010

  • Gelado, Isaac; Cabezas, Javier; Navarro, Nacho
  • ACM SIGARCH Computer Architecture News, Vol. 38, Issue 1
  • DOI: 10.1145/1735970.1736059

A yoke of oxen and a thousand chickens for heavy lifting graph processing
conference, January 2012

  • Gharaibeh, Abdullah; Beltrão Costa, Lauro; Santos-Neto, Elizeu
  • Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12
  • DOI: 10.1145/2370816.2370866

Where is the data? Why you cannot debate CPU vs. GPU performance without the answer
conference, April 2011

  • Gregg, Chris; Hazelwood, Kim
  • Software (ISPASS), (IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE
  • DOI: 10.1109/ISPASS.2011.5762730

Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors
conference, January 2010

  • Gummaraju, Jayanth; Morichetti, Laurent; Houston, Michael
  • Proceedings of the 19th international conference on Parallel architectures and compilation techniques - PACT '10
  • DOI: 10.1145/1854273.1854302

Power-aware dynamic task scheduling for heterogeneous accelerated clusters
conference, May 2009

  • Hamano, Tomoaki; Endo, Toshio; Matsuoka, Satoshi
  • 2009 IEEE International Symposium on Parallel & Distributed Processing
  • DOI: 10.1109/IPDPS.2009.5160977

Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations
conference, November 2010

  • Hampton, Scott S.; Alam, Sadaf R.; Crozier, Paul S.
  • 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2010.37

Multilevel summation of electrostatic potentials using graphics processing units
journal, March 2009


Biomedical image analysis on a cooperative cluster of GPUs and multicores
conference, January 2008

  • Hartley, Timothy D. R.; Catalyurek, Umit; Ruiz, Antonio
  • Proceedings of the 22nd annual international conference on Supercomputing - ICS '08
  • DOI: 10.1145/1375527.1375533

Automatic dataflow application tuning for heterogeneous systems
conference, December 2010

  • Hartley, Timothy D. R.; Saule, Erik; Catalyurek, Umit V.
  • 2010 International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HIPC.2010.5713173

Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems
conference, April 2012

  • Hetherington, Tayler H.; Rogers, Timothy G.; Hsu, Lisa
  • 2012 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS)
  • DOI: 10.1109/ISPASS.2012.6189209

MapCG: writing parallel program portable between CPU and GPU
conference, January 2010

  • Hong, Chuntao; Chen, Dehao; Chen, Wenguang
  • Proceedings of the 19th international conference on Parallel architectures and compilation techniques - PACT '10
  • DOI: 10.1145/1854273.1854303

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU
conference, October 2011

  • Hong, Sungpack; Oguntebi, Tayo; Olukotun, Kunle
  • 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)
  • DOI: 10.1109/PACT.2011.14

A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures
conference, July 2011

  • Horton, Mitch; Tomov, Stanimire; Dongarra, Jack
  • 2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC)
  • DOI: 10.1109/SAAHPC.2011.18

Scalable fast multipole methods on distributed heterogeneous architectures
conference, January 2011

  • Hu, Qi; Gumerov, Nail A.; Duraiswami, Ramani
  • Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11
  • DOI: 10.1145/2063384.2063432

Radiation modeling using the Uintah heterogeneous CPU/GPU runtime system
conference, January 2012

  • Humphrey, Alan; Meng, Qingyu; Berzins, Martin
  • Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment on Bridging from the eXtreme to the campus and beyond - XSEDE '12
  • DOI: 10.1145/2335755.2335791

Porting irregular reductions on heterogeneous CPU-GPU configurations
conference, December 2011

  • Huo, Xin; Ravi, Vignesh T.; Agrawal, Gagan
  • 2011 18th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2011.6152715

Dynamically managed data for CPU-GPU architectures
conference, January 2012

  • Jablin, Thomas B.; Jablin, James A.; Prabhu, Prakash
  • Proceedings of the Tenth International Symposium on Code Generation and Optimization - CHO '12
  • DOI: 10.1145/2259016.2259038

Scaling Hierarchical N-body Simulations on GPU Clusters
conference, November 2010

  • Jetley, Pritish; Wesolowski, Lukasz; Gioachin, Filippo
  • 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2010.49

MATE-CG: A Map Reduce-Like Framework for Accelerating Data-Intensive Computations on Heterogeneous Clusters
conference, May 2012

  • Jiang, Wei; Agrawal, Gagan
  • 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2012 IEEE 26th International Parallel and Distributed Processing Symposium
  • DOI: 10.1109/IPDPS.2012.65

Automatic Dynamic Task Distribution between CPU and GPU for Real-Time Systems
conference, July 2008

  • Joselli, Mark; Zamith, Marcelo; Clua, Esteban
  • 2008 IEEE 11th International Conference on Computational Science and Engineering (CSE), 2008 11th IEEE International Conference on Computational Science and Engineering
  • DOI: 10.1109/CSE.2008.38

SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters
conference, January 2012

  • Kim, Jungwon; Seo, Sangmin; Lee, Jun
  • Proceedings of the 26th ACM international conference on Supercomputing - ICS '12
  • DOI: 10.1145/2304576.2304623

An automatic input-sensitive approach for heterogeneous task partitioning
conference, January 2013

  • Kofler, Klaus; Grasso, Ivan; Cosenza, Biagio
  • Proceedings of the 27th international ACM conference on International conference on supercomputing - ICS '13
  • DOI: 10.1145/2464996.2465007

GPU-enabled efficient executions of radiation calculations in climate modeling
conference, December 2013

  • Korwar, Sai Kiran; Vadhiyar, Sathish; Nanjundiah, Ravi S.
  • 2013 20th International Conference on High Performance Computing (HiPC), 20th Annual International Conference on High Performance Computing
  • DOI: 10.1109/HiPC.2013.6799141

Dynamic Distribution of Workload between CPU and GPU for a Parallel Conjugate Gradient Method in an Adaptive FEM
journal, January 2013


Heterogeneous Computing for Vertebra Detection and Segmentation in X-Ray Images
journal, January 2011

  • Lecron, Fabian; Mahmoudi, Sidi Ahmed; Benjelloun, Mohammed
  • International Journal of Biomedical Imaging, Vol. 2011
  • DOI: 10.1155/2011/640208

Cooperative heterogeneous computing for parallel processing on CPU/GPU hybrids
conference, February 2012

  • Lee, Changmin; Ro, Won W.; Gaudiot, Jean-Luc
  • 2012 16th Workshop on Interaction between Compilers and Computer Architectures (INTERACT)
  • DOI: 10.1109/INTERACT.2012.6339624

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
conference, January 2010

  • Lee, Victor W.; Hammarlund, Per; Singhal, Ronak
  • Proceedings of the 37th annual international symposium on Computer architecture - ISCA '10
  • DOI: 10.1145/1815961.1816021

A compound OpenMP/MPI program development toolkit for hybrid CPU/GPU clusters
journal, March 2013


An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs
conference, January 2012

  • Li, Jiajia; Li, Xingjian; Tan, Guangming
  • Proceedings of the 26th ACM international conference on Supercomputing - ICS '12
  • DOI: 10.1145/2304576.2304626

Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system
conference, January 2011

  • Li, Linchuan; Li, Xingjian; Tan, Guangming
  • Proceedings of the 20th international symposium on High performance distributed computing - HPDC '11
  • DOI: 10.1145/1996130.1996157

Power-efficient time-sensitive mapping in heterogeneous systems
conference, January 2012

  • Liu, Cong; Li, Jian; Huang, Wei
  • Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12
  • DOI: 10.1145/2370816.2370822

Fast Snippet Generation Based on CPU-GPU Hybrid System
conference, December 2011

  • Liu, Ding; Li, Ruixuan; Gu, Xiwu
  • 2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS)
  • DOI: 10.1109/ICPADS.2011.63

A Waterfall Model to Achieve Energy Efficient Tasks Mapping for Large Scale GPU Clusters
conference, May 2011

  • Liu, Wenjie; Du, Zhihui; Xiao, Yu
  • Distributed Processing, Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
  • DOI: 10.1109/IPDPS.2011.129

Real-Time Non-rigid Registration of Medical Images on a Cooperative Parallel Architecture
conference, November 2009

  • Liu, Yixun; Fedorov, Andriy; Kikinis, Ron
  • 2009 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
  • DOI: 10.1109/BIBM.2009.10

CPU/GPU computing for long-wave radiation physics on large GPU clusters
journal, April 2012


Performance evaluation of hybrid programming patterns for large CPU/GPU heterogeneous clusters
journal, June 2012

  • Lu, Fengshun; Song, Junqiang; Yin, Fukang
  • Computer Physics Communications, Vol. 183, Issue 6
  • DOI: 10.1016/j.cpc.2012.01.019

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
conference, January 2009

  • Luk, Chi-Keung; Hong, Sunpyo; Kim, Hyesoon
  • Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42
  • DOI: 10.1145/1669112.1669121

GreenGPU: A Holistic Approach to Energy Efficiency in GPU-CPU Heterogeneous Architectures
conference, September 2012

  • Ma, Kai; Li, Xue; Chen, Wei
  • 2012 41st International Conference on Parallel Processing (ICPP)
  • DOI: 10.1109/ICPP.2012.31

Optimizing tensor contraction expressions for hybrid CPU-GPU execution
journal, November 2011


Performance Implications of Nonuniform Device Topologies in Scalable Heterogeneous Architectures
journal, September 2011

  • Meredith, Jeremy; Roth, Philip; Spafford, Kyle
  • IEEE Micro, Vol. 31, Issue 5
  • DOI: 10.1109/MM.2011.79

Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems
conference, January 2013

  • Mistry, Perhaad; Ukidave, Yash; Schaa, Dana
  • Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units - GPGPU-6
  • DOI: 10.1145/2458523.2458529

A survey of architectural techniques for DRAM power management
journal, January 2012


A SURVEY OF TECHNIQUES FOR MANAGING AND LEVERAGING CACHES IN GPUs
journal, June 2014


A survey of techniques for improving energy efficiency in embedded computing systems
journal, January 2014

  • Mittal, Sparsh
  • International Journal of Computer Aided Engineering and Technology, Vol. 6, Issue 4
  • DOI: 10.1504/IJCAET.2014.065419

A Survey of Methods for Analyzing and Improving GPU Energy Efficiency
journal, August 2014

  • Mittal, Sparsh; Vetter, Jeffrey S.
  • ACM Computing Surveys, Vol. 47, Issue 2
  • DOI: 10.1145/2636342

Task-based parallel breadth-first search in heterogeneous environments
conference, December 2012

  • Munguia, Lluis-Miquel; Bader, David A.; Ayguade, Eduard
  • 2012 19th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2012.6507474

Acceleration of Hessenberg Reduction for Nonsymmetric Eigenvalue Problems in a Hybrid CPU-GPU Computing Environment
journal, January 2011

  • Muramatsu, Jun-ichi; Fukaya, Takeshi; Zhang, Shao-Liang
  • International Journal of Networking and Computing, Vol. 1, Issue 2
  • DOI: 10.15803/ijnc.1.2_132

Hybrid ray tracing and path tracing of Bezier surfaces using a mixed hierarchy
conference, January 2012

  • Nigam, Rohit; Narayanan, P. J.
  • Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing - ICVGIP '12
  • DOI: 10.1145/2425333.2425368

GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing
conference, September 2012

  • Odajima, Tetsuya; Boku, Taisuke; Hanawa, Toshihiro
  • 2012 41st International Conference on Parallel Processing Workshops (ICPPW)
  • DOI: 10.1109/ICPPW.2012.16

Evaluating application performance and energy consumption on hybrid CPU+GPU architecture
journal, June 2012

  • Padoin, Edson Luiz; Pilla, Laércio Lima; Boito, Francieli Zanon
  • Cluster Computing, Vol. 16, Issue 3
  • DOI: 10.1007/s10586-012-0219-6

Combinatorial Bidirectional Path-Tracing for Efficient Hybrid CPU/GPU Rendering
journal, April 2011


Accelerating Kirchhoff Migration by CPU and GPU Cooperation
conference, October 2009

  • Panetta, J.; Teixeira, T.; de Souza Filho, P. R. P.
  • 2009 21st International Symposium on Computer Architecture and High Performance Computing. SBAC-PAD 2009
  • DOI: 10.1109/SBAC-PAD.2009.29

A new era in scientific computing: Domain decomposition methods in hybrid CPU–GPU architectures
journal, March 2011

  • Papadrakakis, M.; Stavroulakis, G.; Karatarakis, A.
  • Computer Methods in Applied Mechanics and Engineering, Vol. 200, Issue 13-16
  • DOI: 10.1016/j.cma.2011.01.013

Hybrid Core Acceleration of UWB SIRE Radar Signal Processing
journal, January 2011

  • Park, Song Jun; Ross, James; Shires, Dale
  • IEEE Transactions on Parallel and Distributed Systems, Vol. 22, Issue 1
  • DOI: 10.1109/TPDS.2010.117

Portable performance on heterogeneous architectures
conference, January 2013

  • Phothilimthana, Phitchaya Mangpo; Ansel, Jason; Ragan-Kelley, Jonathan
  • Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems - ASPLOS '13
  • DOI: 10.1145/2451116.2451162

Automatic generation of software pipelines for heterogeneous parallel systems
conference, November 2012

  • Pienaar, Jacques A.; Chakradhar, Srimat; Raghunathan, Anand
  • 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2012.22

MDR: performance model driven runtime for heterogeneous parallel platforms
conference, January 2011

  • Pienaar, Jacques A.; Raghunathan, Anand; Chakradhar, Srimat
  • Proceedings of the international conference on Supercomputing - ICS '11
  • DOI: 10.1145/1995896.1995933

X-device query processing by bitwise distribution
conference, January 2012

  • Pirk, Holger; Sellam, Thibault; Manegold, Stefan
  • Proceedings of the Eighth International Workshop on Data Management on New Hardware - DaMoN '12
  • DOI: 10.1145/2236584.2236591

DESTINY: A Tool for Modeling Emerging 3D NVM and eDRAM caches
conference, January 2015

  • Poremba, Matt; Mittal, Sparsh; Li, Dong
  • Design, Automation and Test in Europe, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015
  • DOI: 10.7873/DATE.2015.0733

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
journal, June 2011

  • Prasad, Ashwin; Anantpur, Jayvant; Govindarajan, R.
  • ACM SIGPLAN Notices, Vol. 46, Issue 6
  • DOI: 10.1145/1993316.1993517

Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures
conference, November 2010

  • Rahimian, Abtin; Lashuk, Ilya; Veerapaneni, Shravan
  • 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2010.42

A dynamic scheduling framework for emerging heterogeneous systems
conference, December 2011

  • Ravi, Vignesh T.; Agrawal, Gagan
  • 2011 18th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2011.6152724

Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations
conference, January 2010

  • Ravi, Vignesh T.; Ma, Wenjing; Chiu, David
  • Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10
  • DOI: 10.1145/1810085.1810106

Compiler and runtime support for enabling reduction computations on heterogeneous systems: REDUCTION COMPUTATIONS ON HETEROGENEOUS SYSTEMS
journal, October 2011

  • Ravi, Vignesh T.; Ma, Wenjing; Chiu, David
  • Concurrency and Computation: Practice and Experience, Vol. 24, Issue 5
  • DOI: 10.1002/cpe.1848

Programming model for a heterogeneous x86 platform
journal, May 2009


Heterogeneous Task Scheduling for Accelerated OpenMP
conference, May 2012

  • Scogland, Thomas R. W.; Rountree, Barry; Feng, Wu-chun
  • 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2012 IEEE 26th International Parallel and Distributed Processing Symposium
  • DOI: 10.1109/IPDPS.2012.23

Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms
conference, January 2013

  • Shen, Jie; Varbanescu, Ana Lucia; Sips, Henk
  • Proceedings of the ACM International Conference on Computing Frontiers - CF '13
  • DOI: 10.1145/2482767.2482785

Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU
journal, October 2010

  • Shen, Wenfeng; Wei, Daming; Xu, Weimin
  • Computer Methods and Programs in Biomedicine, Vol. 100, Issue 1
  • DOI: 10.1016/j.cmpb.2010.06.015

Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer
conference, January 2011

  • Shimokawabe, Takashi; Aoki, Takayuki; Takaki, Tomohiro
  • Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11
  • DOI: 10.1145/2063384.2063388

Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters
conference, November 2010

  • Shirahata, Koichi; Sato, Hitoshi; Matsuoka, Satoshi
  • 2010 IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom)
  • DOI: 10.1109/CloudCom.2010.55

A hybrid shared memory heterogeneous execution platform for PCIe-based GPGPUs
conference, December 2013

  • Shukla, Sambit K.; Bhuyan, Laxmi N.
  • 2013 20th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2013.6799140

Efficient sparse matrix-matrix multiplication on heterogeneous high performance systems
conference, September 2010

  • Siegel, Jakob; Villa, Oreste; Krishnamoorthy, Sriram
  • 2010 IEEE International Conference On Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS)
  • DOI: 10.1109/CLUSTERWKSP.2010.5613109

An exact algorithm for energy-efficient acceleration of task trees on CPU/GPU architectures
conference, January 2011

  • Silberstein, Mark; Maruyama, Naoya
  • Proceedings of the 4th Annual International Conference on Systems and Storage - SYSTOR '11
  • DOI: 10.1145/1987816.1987826

Accelerating Smith-Waterman on Heterogeneous CPU-GPU Systems
conference, May 2011

  • Singh, Jaideep; Aruni, Ipseeta
  • 2011 5th International Conference on Bioinformatics and Biomedical Engineering (iCBBE)
  • DOI: 10.1109/icbbe.2011.5780005

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures
conference, January 2012

  • Spafford, Kyle L.; Meredith, Jeremy S.; Lee, Seyong
  • Proceedings of the 9th conference on Computing Frontiers - CF '12
  • DOI: 10.1145/2212908.2212924

Implementation of FDTD-Compatible Green's Function on Heterogeneous CPU-GPU Parallel Processing System
journal, January 2013


OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
journal, May 2010

  • Stone, John E.; Gohara, David; Shi, Guochun
  • Computing in Science & Engineering, Vol. 12, Issue 3, p. 66-73
  • DOI: 10.1109/MCSE.2010.69

Solving a kind of BVP for ODEs on heterogeneous CPU + CUDA-enabled GPU systems
conference, October 2010

  • Stpiczynski, Przemyslaw; Potiopa, Joanna
  • Proceedings of the 2010 International Multiconference on Computer Science and Information Technology (IMCSIT 2010)
  • DOI: 10.1109/IMCSIT.2010.5680041

Accelerating inclusion-based pointer analysis on heterogeneous CPU-GPU systems
conference, December 2013

  • Su, Yu; Ye, Ding; Xue, Jingling
  • 2013 20th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2013.6799110

Enabling task-level scheduling on heterogeneous platforms
conference, January 2012

  • Sun, Enqiang; Schaa, Dana; Bagley, Richard
  • Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units - GPGPU-5
  • DOI: 10.1145/2159430.2159440

SPRAT: Runtime processor selection for energy-aware computing
conference, September 2008

  • Takizawa, Hiroyuki; Sato, Katuto; Kobayashi, Hiroaki
  • 2008 IEEE International Conference on Cluster Computing (CLUSTER)
  • DOI: 10.1109/CLUSTR.2008.4663799

A Map-Reduce Based Framework for Heterogeneous Processing Element Cluster Environments
conference, May 2012

  • Tan, Yu Shyang; Lee, Bu-Sung; He, Bingsheng
  • 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
  • DOI: 10.1109/CCGrid.2012.35

Coordinating the use of GPU and CPU for improving performance of compute intensive applications
conference, August 2009

  • Teodoro, George; Sachetto, Rafael; Sertel, Olcay
  • 2009 IEEE International Conference on Cluster Computing and Workshops
  • DOI: 10.1109/CLUSTR.2009.5289193

Shot boundary detection using Zernike moments in multi-GPU multi-CPU architectures
journal, September 2012

  • Toharia, Pablo; Robles, Oscar D.; Suárez, Ricardo
  • Journal of Parallel and Distributed Computing, Vol. 72, Issue 9
  • DOI: 10.1016/j.jpdc.2011.10.011

Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing
journal, December 2010


Axel: a heterogeneous cluster with FPGAs and GPUs
conference, January 2010

  • Tsoi, Kuen Hung; Luk, Wayne
  • Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '10
  • DOI: 10.1145/1723112.1723134

A Technique for Collision Detection and 3D Interaction Based on Parallel GPU and CPU Processing
conference, November 2011

  • Tsuda, Fernando; Nakamura, Ricardo
  • 2011 Brazilian Symposium on Games and Digital Entertainment (SBGAMES)
  • DOI: 10.1109/SBGAMES.2011.20

Quantifying the energy efficiency of FFT on heterogeneous platforms
conference, April 2013

  • Ukidave, Yash; Ziabari, Amir Kavyan; Mistry, Perhaad
  • 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
  • DOI: 10.1109/ISPASS.2013.6557174

Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
conference, January 2009

  • Venkatasubramanian, Sundaresan; Vuduc, Richard W.
  • Proceedings of the 23rd international conference on Supercomputing - ICS '09
  • DOI: 10.1145/1542275.1542312

Processing data streams with hard real-time constraints on heterogeneous systems
conference, January 2011

  • Verner, Uri; Schuster, Assaf; Silberstein, Mark
  • Proceedings of the international conference on Supercomputing - ICS '11
  • DOI: 10.1145/1995896.1995915

Opportunities for Nonvolatile Memory Systems in Extreme-Scale High-Performance Computing
journal, March 2015

  • Vetter, Jeffrey S.; Mittal, Sparsh
  • Computing in Science & Engineering, Vol. 17, Issue 2
  • DOI: 10.1109/MCSE.2015.4

Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems
journal, January 2012

  • Vömel, Christof; Tomov, Stanimire; Dongarra, Jack
  • SIAM Journal on Scientific Computing, Vol. 34, Issue 2
  • DOI: 10.1137/100806783

Communication-Aware Task Partition and Voltage Scaling for Energy Minimization on Heterogeneous Parallel Systems
conference, October 2011

  • Wang, Guibin; Song, Wei
  • 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)
  • DOI: 10.1109/PDCAT.2011.28

CPU-GPU hybrid parallel strategy for cosmological simulations
journal, May 2013

  • Wang, Yueqing; Dou, Yong; Guo, Song
  • Concurrency and Computation: Practice and Experience, Vol. 26, Issue 3
  • DOI: 10.1002/cpe.3046

CAP: co-scheduling based on asymptotic profiling in CPU+GPU hybrid systems
conference, January 2013

  • Wang, Zhenning; Zheng, Long; Chen, Quan
  • Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '13
  • DOI: 10.1145/2442992.2443004

Using 1000+ GPUs and 10000+ CPUs for Sedimentary Basin Simulations
conference, September 2012

  • Wen, Mei; Su, Huayou; Wei, Wenjie
  • 2012 IEEE International Conference on Cluster Computing (CLUSTER)
  • DOI: 10.1109/CLUSTER.2012.37

A Fast Parallel Implementation of Molecular Dynamics with the Morse Potential on a Heterogeneous Petascale Supercomputer
conference, May 2012

  • Wu, Qiang; Yang, Canqun; Wang, Feng
  • 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW)
  • DOI: 10.1109/IPDPSW.2012.13

Accelerating Protein Sequence Search in a Heterogeneous Computing System
conference, May 2011

  • Xiao, Shucai; Lin, Heshan; Feng, Wu-chun
  • 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2011.115

Discrete particle simulation of gas–solid two-phase flows with multi-scale CPU–GPU hybrid computation
journal, October 2012


Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing
conference, September 2010

  • Yang, Canqun; Wang, Feng; Du, Yunfei
  • 2010 IEEE International Conference on Cluster Computing (CLUSTER)
  • DOI: 10.1109/CLUSTER.2010.12

A peta-scalable CPU-GPU algorithm for global atmospheric simulations
conference, January 2013

  • Yang, Chao; Zheng, Weimin; Xue, Wei
  • Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '13
  • DOI: 10.1145/2442516.2442518

A fully integrated multi-CPU, GPU and memory controller 32nm processor
conference, February 2011

  • Yuffe, Marcelo; Knoll, Ernest; Mehalel, Moty
  • 2011 IEEE International Solid-State Circuits Conference (ISSCC)
  • DOI: 10.1109/ISSCC.2011.5746311

Data Partitioning on Heterogeneous Multicore and Multi-GPU Systems Using Functional Performance Models of Data-Parallel Applications
conference, September 2012

  • Zhong, Ziming; Rychkov, Vladimir; Lastovetsky, Alexey
  • 2012 IEEE International Conference on Cluster Computing (CLUSTER)
  • DOI: 10.1109/CLUSTER.2012.34

    Works referencing / citing this record:

    Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review
    journal, April 2018


    Optimizing parameter sensitivity analysis of large‐scale microscopy image analysis workflows with multilevel computation reuse
    journal, June 2019

    • Barreiros, Willian; Moreira, Jeremias; Kurc, Tahsin
    • Concurrency and Computation: Practice and Experience, Vol. 32, Issue 2
    • DOI: 10.1002/cpe.5403