DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A survey of CPU-GPU heterogeneous computing techniques

Abstract

As both CPUs and GPUs are employed in a wide range of applications, it has been acknowledged that each of these processing units (PUs) has unique features and strengths; hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of fused CPU-GPU chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, that enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler, and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe that this paper will provide researchers with insights into the working and scope of applications of HCTs and motivate them to further harness the computational power of CPUs and GPUs to achieve the goal of exascale performance.
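Workload partitioning, one of the HCTs named in the abstract, can be illustrated with a simple static split. The sketch below is an illustration under stated assumptions, not the authors' method or any surveyed system: it assumes each device's throughput (items per second) is already known, for example from profiling a small sample run, and that work items are independent and equally costly. The function names `partition_workload` and `estimated_makespan` are hypothetical.

```python
def partition_workload(n_items: int, cpu_rate: float, gpu_rate: float) -> tuple[int, int]:
    """Statically split n_items between CPU and GPU in proportion to throughput.

    cpu_rate and gpu_rate are assumed, pre-measured rates in items/second.
    Giving each device a share proportional to its rate lets both devices
    finish at roughly the same time, so neither sits idle waiting.
    """
    if n_items < 0 or cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("n_items must be >= 0 and rates must be positive")
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    gpu_items = round(n_items * gpu_share)
    cpu_items = n_items - gpu_items  # remainder goes to the CPU
    return cpu_items, gpu_items


def estimated_makespan(cpu_items: int, gpu_items: int,
                       cpu_rate: float, gpu_rate: float) -> float:
    """Estimated wall-clock time in seconds when both devices run concurrently."""
    # The slower of the two concurrent portions determines the total time.
    return max(cpu_items / cpu_rate, gpu_items / gpu_rate)


if __name__ == "__main__":
    cpu_n, gpu_n = partition_workload(1000, cpu_rate=1.0, gpu_rate=4.0)
    print(cpu_n, gpu_n, estimated_makespan(cpu_n, gpu_n, 1.0, 4.0))
```

For example, with a GPU assumed to be four times faster than the CPU, `partition_workload(1000, 1.0, 4.0)` assigns 200 items to the CPU and 800 to the GPU, for an estimated makespan of 200 s versus 250 s if the GPU processed all 1000 items alone. A static split like this relies on the rates being accurate; dynamic load-balancing schemes instead adjust the split at run time, which matters when per-item cost varies.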

Authors:
Mittal, Sparsh [1]; Vetter, Jeffrey S. [2]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Georgia Inst. of Technology, Atlanta, GA (United States)
Publication Date:
July 4, 2015
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1265534
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
ACM Computing Surveys
Additional Journal Information:
Journal Volume: 47; Journal Issue: 4; Journal ID: ISSN 0360-0300
Publisher:
Association for Computing Machinery (ACM)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; experimentation; management; measurement; performance; analysis; CPU-GPU heterogeneous/hybrid/collaborative computing; workload division/partitioning; dynamic/static load-balancing; pipelining; programming frameworks; fused CPU-GPU chip

Citation Formats

Mittal, Sparsh, and Vetter, Jeffrey S. A survey of CPU-GPU heterogeneous computing techniques. United States: N. p., 2015. Web. doi:10.1145/2788396.
Mittal, Sparsh, & Vetter, Jeffrey S. A survey of CPU-GPU heterogeneous computing techniques. United States. https://doi.org/10.1145/2788396
Mittal, Sparsh, and Vetter, Jeffrey S. 2015. "A survey of CPU-GPU heterogeneous computing techniques". United States. https://doi.org/10.1145/2788396. https://www.osti.gov/servlets/purl/1265534.
@article{osti_1265534,
title = {A survey of CPU-GPU heterogeneous computing techniques},
author = {Mittal, Sparsh and Vetter, Jeffrey S.},
abstractNote = {As both CPUs and GPUs are employed in a wide range of applications, it has been acknowledged that each of these processing units (PUs) has unique features and strengths; hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of fused CPU-GPU chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, that enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler, and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe that this paper will provide researchers with insights into the working and scope of applications of HCTs and motivate them to further harness the computational power of CPUs and GPUs to achieve the goal of exascale performance.},
doi = {10.1145/2788396},
journal = {ACM Computing Surveys},
number = 4,
volume = 47,
place = {United States},
year = {2015},
month = jul
}


Citation Metrics:
Cited by: 221 works
Citation information provided by Web of Science


Works referenced in this record:

Hybrid-parallel Algorithms for 2D Green's Functions
journal, January 2013

  • Álvarez-Melcón, Alejandro; Giménez, Domingo; Quesada, Fernando D.
  • Procedia Computer Science, Vol. 18
  • DOI: 10.1016/j.procs.2013.05.218

Programming model for a heterogeneous x86 platform
conference, January 2009

  • Saha, Bratin; Mendelson, Avi; Zhou, Xiaocheng
  • Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation - PLDI '09
  • DOI: 10.1145/1542476.1542525

Twin Peaks
journal, January 2017


Porting irregular reductions on heterogeneous CPU-GPU configurations
conference, December 2011

  • Huo, Xin; Ravi, Vignesh T.; Agrawal, Gagan
  • 2011 18th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2011.6152715

Hybrid implementation of error diffusion dithering
conference, December 2011

  • Deshpande, Aditya; Misra, Ishan; Narayanan, P. J.
  • 2011 18th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2011.6152714

Programming model for a heterogeneous x86 platform
journal, May 2009


Cooperative heterogeneous computing for parallel processing on CPU/GPU hybrids
conference, February 2012

  • Lee, Changmin; Ro, Won W.; Gaudiot, Jean-Luc
  • 2012 16th Workshop on Interaction between Compilers and Computer Architectures (INTERACT)
  • DOI: 10.1109/INTERACT.2012.6339624

Discrete particle simulation of gas–solid two-phase flows with multi-scale CPU–GPU hybrid computation
journal, October 2012


A new era in scientific computing: Domain decomposition methods in hybrid CPU–GPU architectures
journal, March 2011

  • Papadrakakis, M.; Stavroulakis, G.; Karatarakis, A.
  • Computer Methods in Applied Mechanics and Engineering, Vol. 200, Issue 13-16
  • DOI: 10.1016/j.cma.2011.01.013

Processing data streams with hard real-time constraints on heterogeneous systems
conference, January 2011

  • Verner, Uri; Schuster, Assaf; Silberstein, Mark
  • Proceedings of the international conference on Supercomputing - ICS '11
  • DOI: 10.1145/1995896.1995915

Axel: a heterogeneous cluster with FPGAs and GPUs
conference, January 2010

  • Tsoi, Kuen Hung; Luk, Wayne
  • Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '10
  • DOI: 10.1145/1723112.1723134

Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system
conference, January 2011

  • Li, Linchuan; Li, Xingjian; Tan, Guangming
  • Proceedings of the 20th international symposium on High performance distributed computing - HPDC '11
  • DOI: 10.1145/1996130.1996157

GreenGPU: A Holistic Approach to Energy Efficiency in GPU-CPU Heterogeneous Architectures
conference, September 2012

  • Ma, Kai; Li, Xue; Chen, Wei
  • 2012 41st International Conference on Parallel Processing (ICPP)
  • DOI: 10.1109/ICPP.2012.31

Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
conference, January 2009

  • Venkatasubramanian, Sundaresan; Vuduc, Richard W.; none, none
  • Proceedings of the 23rd international conference on Conference on Supercomputing - ICS '09
  • DOI: 10.1145/1542275.1542312

MapCG: writing parallel program portable between CPU and GPU
conference, January 2010

  • Hong, Chuntao; Chen, Dehao; Chen, Wenguang
  • Proceedings of the 19th international conference on Parallel architectures and compilation techniques - PACT '10
  • DOI: 10.1145/1854273.1854303

Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors
conference, January 2010

  • Gummaraju, Jayanth; Morichetti, Laurent; Houston, Michael
  • Proceedings of the 19th international conference on Parallel architectures and compilation techniques - PACT '10
  • DOI: 10.1145/1854273.1854302

Efficient co-processor utilization in database query processing
journal, November 2013


A yoke of oxen and a thousand chickens for heavy lifting graph processing
conference, January 2012

  • Gharaibeh, Abdullah; Beltrão Costa, Lauro; Santos-Neto, Elizeu
  • Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12
  • DOI: 10.1145/2370816.2370866

Heterogeneous Computational Model for Landform Attributes Representation on Multicore and Multi-GPU Systems
journal, January 2012


A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL
book, January 2011


Harmony: an execution model and runtime for heterogeneous many core systems
conference, January 2008

  • Diamos, Gregory F.; Yalamanchili, Sudhakar
  • Proceedings of the 17th international symposium on High performance distributed computing - HPDC '08
  • DOI: 10.1145/1383422.1383447

Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing
conference, September 2010

  • Yang, Canqun; Wang, Feng; Du, Yunfei
  • 2010 IEEE International Conference on Cluster Computing (CLUSTER)
  • DOI: 10.1109/CLUSTER.2010.12

A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU–GPU platforms
journal, August 2011


Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
journal, August 2012

  • Prasad, Ashwin; Anantpur, Jayvant; Govindarajan, R.
  • ACM SIGPLAN Notices, Vol. 47, Issue 6
  • DOI: 10.1145/2345156.1993517

5.1 POWER8TM: A 12-core server-class processor in 22nm SOI with 7.6Tb/s off-chip bandwidth
conference, February 2014

  • Fluhr, Eric J.; Friedrich, Joshua; Dreps, Daniel
  • 2014 IEEE International Solid- State Circuits Conference (ISSCC), 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)
  • DOI: 10.1109/ISSCC.2014.6757353

Accelerating Protein Sequence Search in a Heterogeneous Computing System
conference, May 2011

  • Xiao, Shucai; Lin, Heshan; Feng, Wu-chun
  • Distributed Processing Symposium (IPDPS), 2011 IEEE International Parallel & Distributed Processing Symposium
  • DOI: 10.1109/IPDPS.2011.115

Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-Hybrid platforms
conference, April 2010

  • He, Zhengyu; Hong, Bo
  • 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
  • DOI: 10.1109/IPDPS.2010.5470401

An efficient, model-based CPU-GPU heterogeneous FFT library
conference, April 2008

  • Ogata, Yasuhito; Endo, Toshio; Maruyama, Naoya
  • 2008 IEEE International Symposium on Parallel and Distributed Processing
  • DOI: 10.1109/IPDPS.2008.4536163

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
conference, January 2010

  • Lee, Victor W.; Hammarlund, Per; Singhal, Ronak
  • Proceedings of the 37th annual international symposium on Computer architecture - ISCA '10
  • DOI: 10.1145/1815961.1816021

Parallel Processing of Matrix Multiplication in a CPU and GPU Heterogeneous Environment
book, January 2007

  • Ohshima, Satoshi; Kise, Kenji; Katagiri, Takahiro
  • High Performance Computing for Computational Science - VECPAR 2006. Lecture Notes in Computer Science
  • DOI: 10.1007/978-3-540-71351-7_24

Scalable fast multipole methods on distributed heterogeneous architectures
conference, January 2011

  • Hu, Qi; Gumerov, Nail A.; Duraiswami, Ramani
  • Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11
  • DOI: 10.1145/2063384.2063432

MDR: performance model driven runtime for heterogeneous parallel platforms
conference, January 2011

  • Pienaar, Jacques A.; Raghunathan, Anand; Chakradhar, Srimat
  • Proceedings of the international conference on Supercomputing - ICS '11
  • DOI: 10.1145/1995896.1995933

Performance evaluation of hybrid programming patterns for large CPU/GPU heterogeneous clusters
journal, June 2012

  • Lu, Fengshun; Song, Junqiang; Yin, Fukang
  • Computer Physics Communications, Vol. 183, Issue 6
  • DOI: 10.1016/j.cpc.2012.01.019

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
journal, June 2010

  • Lee, Victor W.; Hammarlund, Per; Singhal, Ronak
  • ACM SIGARCH Computer Architecture News, Vol. 38, Issue 3
  • DOI: 10.1145/1816038.1816021

Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations
conference, November 2010

  • Hampton, Scott S.; Alam, Sadaf R.; Crozier, Paul S.
  • 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2010.37

A dynamic scheduling framework for emerging heterogeneous systems
conference, December 2011

  • Ravi, Vignesh T.; Agrawal, Gagan
  • 2011 18th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2011.6152724

Implementation of Fdtd-Compatible Green'S Function on Heterogeneous Cpu-Gpu Parallel Processing System
journal, January 2013


An asymmetric distributed shared memory model for heterogeneous parallel systems
journal, March 2010

  • Gelado, Isaac; Stone, John E.; Cabezas, Javier
  • ACM SIGPLAN Notices, Vol. 45, Issue 3
  • DOI: 10.1145/1735971.1736059

A New Parallel Method of Smith-Waterman Algorithm on a Heterogeneous Platform
book, January 2010


Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters
conference, November 2010

  • Shirahata, Koichi; Sato, Hitoshi; Matsuoka, Satoshi
  • 2010 IEEE 2nd International Conference on Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on Cloud Computing Technology and Science
  • DOI: 10.1109/CloudCom.2010.55

An efficient scheduling scheme using estimated execution time for heterogeneous computing systems
journal, January 2013

  • Choi, Hong Jun; Son, Dong Oh; Kang, Seung Gu
  • The Journal of Supercomputing, Vol. 65, Issue 2
  • DOI: 10.1007/s11227-013-0870-6

Performance Implications of Nonuniform Device Topologies in Scalable Heterogeneous Architectures
journal, September 2011

  • Meredith, Jeremy; Roth, Philip; Spafford, Kyle
  • IEEE Micro, Vol. 31, Issue 5
  • DOI: 10.1109/MM.2011.79

An exact algorithm for energy-efficient acceleration of task trees on CPU/GPU architectures
conference, January 2011

  • Silberstein, Mark; Maruyama, Naoya
  • Proceedings of the 4th Annual International Conference on Systems and Storage - SYSTOR '11
  • DOI: 10.1145/1987816.1987826

Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures
conference, November 2010

  • Rahimian, Abtin; Lashuk, Ilya; Veerapaneni, Shravan
  • 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2010.42

An Effective Dynamic Scheduling Runtime and Tuning System for Heterogeneous Multi and Many-Core Desktop Platforms
conference, September 2011

  • Binotto, Alecio P. D.; Pereira, Carlos E.; Kuijper, Arjan
  • Communication (HPCC), 2011 IEEE International Conference on High Performance Computing and Communications
  • DOI: 10.1109/HPCC.2011.20

A compound OpenMP/MPI program development toolkit for hybrid CPU/GPU clusters
journal, March 2013


An Efficient Stream Buffer Mechanism for Dataflow Execution on Heterogeneous Platforms with GPUs
conference, October 2011

  • Balevic, Ana; Kienhuis, Bart
  • 2011 First Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM)
  • DOI: 10.1109/DFM.2011.10

Heterogeneous Systems for Energy Efficient Scientific Computing
book, January 2012

  • Liu, Qiang; Luk, Wayne
  • Reconfigurable Computing: Architectures, Tools and Applications. Lecture Notes in Computer Science
  • DOI: 10.1007/978-3-642-28365-9_6

Fluid Simulation with Two-Way Interaction Rigid Body Using a Heterogeneous GPU and CPU Environment
conference, November 2010

  • Junior, José Ricardo da S.; Clua, Esteban W.; Montenegro, Anselmo
  • 2010 Brazilian Symposium on Games and Digital Entertainment (SBGAMES)
  • DOI: 10.1109/SBGAMES.2010.25

Task-based parallel breadth-first search in heterogeneous environments
conference, December 2012

  • Munguia, Lluis-Miquel; Bader, David A.; Ayguade, Eduard
  • 2012 19th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2012.6507474

Power-aware dynamic task scheduling for heterogeneous accelerated clusters
conference, May 2009

  • Hamano, Tomoaki; Endo, Toshio; Matsuoka, Satoshi
  • 2009 IEEE International Symposium on Parallel & Distributed Processing
  • DOI: 10.1109/IPDPS.2009.5160977

Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms
conference, July 2011

  • Anzt, Hartwig; Heuveline, Vincent; Aliaga, Jose I.
  • 2011 International Green Computing Conference (IGCC), 2011 International Green Computing Conference and Workshops
  • DOI: 10.1109/IGCC.2011.6008594

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures
conference, January 2012

  • Spafford, Kyle L.; Meredith, Jeremy S.; Lee, Seyong
  • Proceedings of the 9th conference on Computing Frontiers - CF '12
  • DOI: 10.1145/2212908.2212924

A Waterfall Model to Achieve Energy Efficient Tasks Mapping for Large Scale GPU Clusters
conference, May 2011

  • Liu, Wenjie; Du, Zhihui; Xiao, Yu
  • Distributed Processing, Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
  • DOI: 10.1109/IPDPS.2011.129

A Map-Reduce Based Framework for Heterogeneous Processing Element Cluster Environments
conference, May 2012

  • Tan, Yu Shyang; Lee, Bu-Sung; He, Bingsheng
  • 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
  • DOI: 10.1109/CCGrid.2012.35

GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing
conference, September 2012

  • Odajima, Tetsuya; Boku, Taisuke; Hanawa, Toshihiro
  • 2012 41st International Conference on Parallel Processing Workshops (ICPPW)
  • DOI: 10.1109/ICPPW.2012.16

Effective Kernel Mapping for OpenCL Applications in Heterogeneous Platforms
conference, September 2012

  • Albayrak, Omer Erdil; Akturk, Ismail; Ozturk, Ozcan
  • 2012 41st International Conference on Parallel Processing Workshops (ICPPW)
  • DOI: 10.1109/ICPPW.2012.14

Iterative SLE Solvers over a CPU-GPU Platform
conference, September 2010

  • Binotto, Alécio P. D.; Daniel, Christian; Weber, Daniel
  • 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC 2010), 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC)
  • DOI: 10.1109/HPCC.2010.40

Power-efficient time-sensitive mapping in heterogeneous systems
conference, January 2012

  • Liu, Cong; Li, Jian; Huang, Wei
  • Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12
  • DOI: 10.1145/2370816.2370822

Predictive Runtime Code Scheduling for Heterogeneous Architectures
book, January 2009

  • Jiménez, Víctor J.; Vilanova, Lluís; Gelado, Isaac
  • High Performance Embedded Architectures and Compilers
  • DOI: 10.1007/978-3-540-92990-1_4

Fast Snippet Generation Based on CPU-GPU Hybrid System
conference, December 2011

  • Liu, Ding; Li, Ruixuan; Gu, Xiwu
  • 2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS)
  • DOI: 10.1109/ICPADS.2011.63

AMD Fusion APU: Llano
journal, March 2012

  • Branover, Alexander; Foley, Denis; Steinman, Maurice
  • IEEE Micro, Vol. 32, Issue 2
  • DOI: 10.1109/MM.2012.2

Enabling task-level scheduling on heterogeneous platforms
conference, January 2012

  • Sun, Enqiang; Schaa, Dana; Bagley, Richard
  • Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units - GPGPU-5
  • DOI: 10.1145/2159430.2159440

Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems
conference, January 2013

  • Mistry, Perhaad; Ukidave, Yash; Schaa, Dana
  • Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units - GPGPU-6
  • DOI: 10.1145/2458523.2458529

Biomedical image analysis on a cooperative cluster of GPUs and multicores
conference, January 2008

  • Hartley, Timothy D. R.; Catalyurek, Umit; Ruiz, Antonio
  • Proceedings of the 22nd annual international conference on Supercomputing - ICS '08
  • DOI: 10.1145/1375527.1375533

Fluidic Kernels: Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices
conference, January 2014

  • Pandit, Prasanna; Govindarajan, R.
  • Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization - CGO '14
  • DOI: 10.1145/2581122.2544163

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU
conference, October 2011

  • Hong, Sungpack; Oguntebi, Tayo; Olukotun, Kunle
  • 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)
  • DOI: 10.1109/PACT.2011.14

An automatic input-sensitive approach for heterogeneous task partitioning
conference, January 2013

  • Kofler, Klaus; Grasso, Ivan; Cosenza, Biagio
  • Proceedings of the 27th international ACM conference on International conference on supercomputing - ICS '13
  • DOI: 10.1145/2464996.2465007

Optimizing tensor contraction expressions for hybrid CPU-GPU execution
journal, November 2011


A fully integrated multi-CPU, GPU and memory controller 32nm processor
conference, February 2011

  • Yuffe, Marcelo; Knoll, Ernest; Mehalel, Moty
  • 2011 IEEE International Solid- State Circuits Conference - (ISSCC), 2011 IEEE International Solid-State Circuits Conference
  • DOI: 10.1109/ISSCC.2011.5746311

MATE-CG: A Map Reduce-Like Framework for Accelerating Data-Intensive Computations on Heterogeneous Clusters
conference, May 2012

  • Jiang, Wei; Agrawal, Gagan
  • 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2012 IEEE 26th International Parallel and Distributed Processing Symposium
  • DOI: 10.1109/IPDPS.2012.65

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators
conference, May 2011

  • Agullo, Emmanuel; Augonnet, Cedric; Dongarra, Jack
  • Distributed Processing Symposium (IPDPS), 2011 IEEE International Parallel & Distributed Processing Symposium
  • DOI: 10.1109/IPDPS.2011.90

Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations
journal, January 2011

  • Dziekonski, A.; Lamecki, A.; Mrozowski, M.
  • IEEE Antennas and Wireless Propagation Letters, Vol. 10
  • DOI: 10.1109/LAWP.2011.2159769

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
conference, July 2011

  • Daga, Mayank; Aji, Ashwin M.; Feng, Wu-chun
  • 2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC)
  • DOI: 10.1109/SAAHPC.2011.29

Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems
conference, September 2013

  • Lee, Janghaeng; Samadi, Mehrzad; Park, Yongjun
  • Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques
  • DOI: 10.1109/PACT.2013.6618821

Mapping the sbr and Tw-Ildcs to Heterogeneous Cpu-Gpu Architecture for fast Computation of Electromagnetic Scattering
journal, January 2012

  • Gao, Peng Cheng; Tao, Yu Bo; Bai, Zhi Hui
  • Progress In Electromagnetics Research, Vol. 122
  • DOI: 10.2528/PIER11092303

CPU-GPU hybrid parallel strategy for cosmological simulations: CPU-GPU HBRID PARALLEL STRATEGY FOR COSMOLOGICAL SIMULATION
journal, May 2013

  • Wang, Yueqing; Dou, Yong; Guo, Song
  • Concurrency and Computation: Practice and Experience, Vol. 26, Issue 3
  • DOI: 10.1002/cpe.3046

An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs
conference, January 2012

  • Li, Jiajia; Li, Xingjian; Tan, Guangming
  • Proceedings of the 26th ACM international conference on Supercomputing - ICS '12
  • DOI: 10.1145/2304576.2304626

A peta-scalable CPU-GPU algorithm for global atmospheric simulations
journal, August 2013


Communication-Aware Task Partition and Voltage Scaling for Energy Minimization on Heterogeneous Parallel Systems
conference, October 2011

  • Wang, Guibin; Song, Wei
  • 2011 12th International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT), 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies
  • DOI: 10.1109/PDCAT.2011.28

A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures
conference, July 2011

  • Horton, Mitch; Tomov, Stanimire; Dongarra, Jack
  • 2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC)
  • DOI: 10.1109/SAAHPC.2011.18

SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters
conference, January 2012

  • Kim, Jungwon; Seo, Sangmin; Lee, Jun
  • Proceedings of the 26th ACM international conference on Supercomputing - ICS '12
  • DOI: 10.1145/2304576.2304623

A survey of techniques for improving energy efficiency in embedded computing systems
journal, January 2014

  • Mittal, Sparsh
  • International Journal of Computer Aided Engineering and Technology, Vol. 6, Issue 4
  • DOI: 10.1504/IJCAET.2014.065419

A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
book, January 2009


Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations
conference, January 2010

  • Ravi, Vignesh T.; Ma, Wenjing; Chiu, David
  • Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10
  • DOI: 10.1145/1810085.1810106

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
conference, January 2011

  • Prasad, Ashwin; Anantpur, Jayvant; Govindarajan, R.
  • Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation - PLDI '11
  • DOI: 10.1145/1993498.1993517

Automatic Dynamic Task Distribution between CPU and GPU for Real-Time Systems
conference, July 2008

  • Joselli, Mark; Zamith, Marcelo; Clua, Esteban
  • 2008 IEEE 11th International Conference on Computational Science and Engineering (CSE), 2008 11th IEEE International Conference on Computational Science and Engineering
  • DOI: 10.1109/CSE.2008.38

A Survey of Methods for Analyzing and Improving GPU Energy Efficiency
journal, August 2014

  • Mittal, Sparsh; Vetter, Jeffrey S.
  • ACM Computing Surveys, Vol. 47, Issue 2
  • DOI: 10.1145/2636342

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
journal, June 2011

  • Prasad, Ashwin; Anantpur, Jayvant; Govindarajan, R.
  • ACM SIGPLAN Notices, Vol. 46, Issue 6
  • DOI: 10.1145/1993316.1993517

Analysis and Optimization of Power Consumption in the Iterative Solution of Sparse Linear Systems on Multi-core and Many-core Platforms
text, January 2011

  • Anzt, Hartwig; Heuveline, Vincent; Aliaga, José I.
  • Karlsruher Institut für Technologie (KIT)
  • DOI: 10.5445/ir/1000023438

Accelerating Smith-Waterman on Heterogeneous CPU-GPU Systems
conference, May 2011

  • Singh, Jaideep; Aruni, Ipseeta
  • 2011 5th International Conference on Bioinformatics and Biomedical Engineering (iCBBE)
  • DOI: 10.1109/icbbe.2011.5780005

SPRAT: Runtime processor selection for energy-aware computing
conference, September 2008

  • Takizawa, Hiroyuki; Sato, Katuto; Kobayashi, Hiroaki
  • 2008 IEEE International Conference on Cluster Computing (CLUSTER)
  • DOI: 10.1109/CLUSTR.2008.4663799

Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory
conference, January 2010

  • Becchi, Michela; Byna, Surendra; Cadambi, Srihari
  • Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures - SPAA '10
  • DOI: 10.1145/1810479.1810498

Efficient sparse matrix-matrix multiplication on heterogeneous high performance systems
conference, September 2010

  • Siegel, Jakob; Villa, Oreste; Krishnamoorthy, Sriram
  • 2010 IEEE International Conference On Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS)
  • DOI: 10.1109/CLUSTERWKSP.2010.5613109

Hybrid ray tracing and path tracing of Bezier surfaces using a mixed hierarchy
conference, January 2012

  • Nigam, Rohit; Narayanan, P. J.
  • Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing - ICVGIP '12
  • DOI: 10.1145/2425333.2425368

Linpack evaluation on a supercomputer with heterogeneous accelerators
conference, April 2010

  • Endo, Toshio; Matsuoka, Satoshi; Nukada, Akira
  • 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
  • DOI: 10.1109/IPDPS.2010.5470353

A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
book, January 2011

  • Ltaief, Hatem; Tomov, Stanimire; Nath, Rajib
  • High Performance Computing for Computational Science – VECPAR 2010. Lecture Notes in Computer Science
  • DOI: 10.1007/978-3-642-19328-6_11

CoreTSAR: Adaptive Worksharing for Heterogeneous Systems
book, January 2014

  • Scogland, Thomas R. W.; Feng, Wu-chun; Rountree, Barry
  • Supercomputing. Lecture Notes in Computer Science
  • DOI: 10.1007/978-3-319-07518-1_11

Using 1000+ GPUs and 10000+ CPUs for Sedimentary Basin Simulations
conference, September 2012

  • Wen, Mei; Su, Huayou; Wei, Wenjie
  • 2012 IEEE International Conference on Cluster Computing (CLUSTER)
  • DOI: 10.1109/CLUSTER.2012.37

Shot boundary detection using Zernike moments in multi-GPU multi-CPU architectures
journal, September 2012

  • Toharia, Pablo; Robles, Oscar D.; Suárez, Ricardo
  • Journal of Parallel and Distributed Computing, Vol. 72, Issue 9
  • DOI: 10.1016/j.jpdc.2011.10.011

Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing
journal, December 2010


Data Partitioning on Heterogeneous Multicore and Multi-GPU Systems Using Functional Performance Models of Data-Parallel Applications
conference, September 2012

  • Zhong, Ziming; Rychkov, Vladimir; Lastovetsky, Alexey
  • 2012 IEEE International Conference on Cluster Computing (CLUSTER)
  • DOI: 10.1109/CLUSTER.2012.34

StarPU: a unified platform for task scheduling on heterogeneous multicore architectures
journal, November 2010

  • Augonnet, Cédric; Thibault, Samuel; Namyst, Raymond
  • Concurrency and Computation: Practice and Experience, Vol. 23, Issue 2
  • DOI: 10.1002/cpe.1631

An asymmetric distributed shared memory model for heterogeneous parallel systems
journal, March 2010

  • Gelado, Isaac; Cabezas, Javier; Navarro, Nacho
  • ACM SIGARCH Computer Architecture News, Vol. 38, Issue 1
  • DOI: 10.1145/1735970.1736059

Coordinating the use of GPU and CPU for improving performance of compute intensive applications
conference, August 2009

  • Teodoro, George; Sachetto, Rafael; Sertel, Olcay
  • 2009 IEEE International Conference on Cluster Computing and Workshops
  • DOI: 10.1109/CLUSTR.2009.5289193

An On-Demand Fast Parallel Pseudo Random Number Generator with Applications
conference, May 2012

  • Banerjee, Dip Sankar; Bahl, Aman Kumar; Kothapalli, Kishore
  • 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW)
  • DOI: 10.1109/IPDPSW.2012.212

Opportunities for Nonvolatile Memory Systems in Extreme-Scale High-Performance Computing
journal, March 2015

  • Vetter, Jeffrey S.; Mittal, Sparsh
  • Computing in Science & Engineering, Vol. 17, Issue 2
  • DOI: 10.1109/MCSE.2015.4

Multilevel summation of electrostatic potentials using graphics processing units
journal, March 2009


A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures
journal, January 2013

  • Belviranli, Mehmet E.; Bhuyan, Laxmi N.; Gupta, Rajiv
  • ACM Transactions on Architecture and Code Optimization, Vol. 9, Issue 4
  • DOI: 10.1145/2400682.2400716

Resource-efficient utilization of CPU/GPU-based heterogeneous supercomputers for Bayesian phylogenetic inference
journal, March 2013


Heterogeneous Task Scheduling for Accelerated OpenMP
conference, May 2012

  • Scogland, Thomas R. W.; Rountree, Barry; Feng, Wu-chun
  • 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2012.23

Evaluating application performance and energy consumption on hybrid CPU+GPU architecture
journal, June 2012

  • Padoin, Edson Luiz; Pilla, Laércio Lima; Boito, Francieli Zanon
  • Cluster Computing, Vol. 16, Issue 3
  • DOI: 10.1007/s10586-012-0219-6

GPU and APU computations of Finite Time Lyapunov Exponent fields
journal, March 2012

  • Conti, Christian; Rossinelli, Diego; Koumoutsakos, Petros
  • Journal of Computational Physics, Vol. 231, Issue 5
  • DOI: 10.1016/j.jcp.2011.10.032

Portable performance on heterogeneous architectures
journal, April 2013

  • Phothilimthana, Phitchaya Mangpo; Ansel, Jason; Ragan-Kelley, Jonathan
  • ACM SIGPLAN Notices, Vol. 48, Issue 4
  • DOI: 10.1145/2499368.2451162

Heterogeneous Computing for Vertebra Detection and Segmentation in X-Ray Images
journal, January 2011

  • Lecron, Fabian; Mahmoudi, Sidi Ahmed; Benjelloun, Mohammed
  • International Journal of Biomedical Imaging, Vol. 2011
  • DOI: 10.1155/2011/640208

IBM POWER7+ design for higher frequency at fixed power
journal, November 2013

  • Zyuban, V.; Taylor, S. A.; Christensen, B.
  • IBM Journal of Research and Development, Vol. 57, Issue 6
  • DOI: 10.1147/JRD.2013.2279597

Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer
conference, January 2011

  • Shimokawabe, Takashi; Aoki, Takayuki; Takaki, Tomohiro
  • Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis - SC '11
  • DOI: 10.1145/2063384.2063388

Dynamic load balancing on heterogeneous multicore/multiGPU systems
conference, June 2010

  • Acosta, Alejandro; Corujo, Robert; Blanco, Vicente
  • 2010 International Conference on High Performance Computing & Simulation (HPCS)
  • DOI: 10.1109/HPCS.2010.5547097

A Technique for Collision Detection and 3D Interaction Based on Parallel GPU and CPU Processing
conference, November 2011

  • Tsuda, Fernando; Nakamura, Ricardo
  • 2011 Brazilian Symposium on Games and Digital Entertainment (SBGAMES)
  • DOI: 10.1109/SBGAMES.2011.20

GPU-enabled efficient executions of radiation calculations in climate modeling
conference, December 2013

  • Korwar, Sai Kiran; Vadhiyar, Sathish; Nanjundiah, Ravi S.
  • 2013 20th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2013.6799141

Dynamic Distribution of Workload between CPU and GPU for a Parallel Conjugate Gradient Method in an Adaptive FEM
journal, January 2013


Combinatorial Bidirectional Path-Tracing for Efficient Hybrid CPU/GPU Rendering
journal, April 2011


A hybrid shared memory heterogeneous execution platform for PCIe-based GPGPUs
conference, December 2013

  • Shukla, Sambit K.; Bhuyan, Laxmi N.
  • 2013 20th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2013.6799140

Optimization Techniques for 3D-FWT on Systems with Manycore GPUs and Multicore CPUs
journal, January 2013


OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
journal, May 2010

  • Stone, John E.; Gohara, David; Shi, Guochun
  • Computing in Science & Engineering, Vol. 12, Issue 3, p. 66-73
  • DOI: 10.1109/MCSE.2010.69

A 22nm IA multi-CPU and GPU System-on-Chip
conference, February 2012

  • Damaraju, Satish; George, Varghese; Jahagirdar, Sanjeev
  • 2012 IEEE International Solid-State Circuits Conference (ISSCC)
  • DOI: 10.1109/ISSCC.2012.6176876

Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems
conference, April 2012

  • Hetherington, Tayler H.; Rogers, Timothy G.; Hsu, Lisa
  • 2012 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS)
  • DOI: 10.1109/ISPASS.2012.6189209

Scaling Hierarchical N-body Simulations on GPU Clusters
conference, November 2010

  • Jetley, Pritish; Wesolowski, Lukasz; Gioachin, Filippo
  • 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
  • DOI: 10.1109/SC.2010.49

Where is the data? Why you cannot debate CPU vs. GPU performance without the answer
conference, April 2011

  • Gregg, Chris; Hazelwood, Kim
  • 2011 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
  • DOI: 10.1109/ISPASS.2011.5762730

CAP: co-scheduling based on asymptotic profiling in CPU+GPU hybrid systems
conference, January 2013

  • Wang, Zhenning; Zheng, Long; Chen, Quan
  • Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '13
  • DOI: 10.1145/2442992.2443004

Compiler and runtime support for enabling reduction computations on heterogeneous systems
journal, October 2011

  • Ravi, Vignesh T.; Ma, Wenjing; Chiu, David
  • Concurrency and Computation: Practice and Experience, Vol. 24, Issue 5
  • DOI: 10.1002/cpe.1848

Automatic dataflow application tuning for heterogeneous systems
conference, December 2010

  • Hartley, Timothy D. R.; Saule, Erik; Catalyurek, Umit V.
  • 2010 International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HIPC.2010.5713173

A Fast Parallel Implementation of Molecular Dynamics with the Morse Potential on a Heterogeneous Petascale Supercomputer
conference, May 2012

  • Wu, Qiang; Yang, Canqun; Wang, Feng
  • 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW)
  • DOI: 10.1109/IPDPSW.2012.13

Synergistic execution of stream programs on multicores with accelerators
journal, June 2009

  • Udupa, Abhishek; Govindarajan, R.; Thazhuthaveetil, Matthew J.
  • ACM SIGPLAN Notices, Vol. 44, Issue 7
  • DOI: 10.1145/1543136.1542466

X-device query processing by bitwise distribution
conference, January 2012

  • Pirk, Holger; Sellam, Thibault; Manegold, Stefan
  • Proceedings of the Eighth International Workshop on Data Management on New Hardware - DaMoN '12
  • DOI: 10.1145/2236584.2236591

CPU/GPU computing for long-wave radiation physics on large GPU clusters
journal, April 2012


Workload Balancing on Heterogeneous Systems: A Case Study of Sparse Grid Interpolation
book, January 2012

  • Muraraşu, Alin; Weidendorfer, Josef; Bode, Arndt
  • Euro-Par 2011: Parallel Processing Workshops. Lecture Notes in Computer Science
  • DOI: 10.1007/978-3-642-29740-3_39

Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems
conference, May 2012

  • Teodoro, George; Kurc, Tahsin M.; Pan, Tony
  • 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2012.101

Rodinia: A benchmark suite for heterogeneous computing
conference, October 2009

  • Che, Shuai; Boyer, Michael; Meng, Jiayuan
  • 2009 IEEE International Symposium on Workload Characterization (IISWC)
  • DOI: 10.1109/IISWC.2009.5306797

Asymptotic peak Utilisation in Heterogeneous Parallel Cpu/Gpu Pipelines: a Decentralised Queue Monitoring Strategy
journal, May 2012

  • Garba, Michael T.; González-Vélez, Horacio
  • Parallel Processing Letters, Vol. 22, Issue 02
  • DOI: 10.1142/S0129626412400087

Dynamically managed data for CPU-GPU architectures
conference, January 2012

  • Jablin, Thomas B.; Jablin, James A.; Prabhu, Prakash
  • Proceedings of the Tenth International Symposium on Code Generation and Optimization - CGO '12
  • DOI: 10.1145/2259016.2259038

Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms
conference, January 2013

  • Shen, Jie; Varbanescu, Ana Lucia; Sips, Henk
  • Proceedings of the ACM International Conference on Computing Frontiers - CF '13
  • DOI: 10.1145/2482767.2482785

Medical Ultrasound Imaging: To GPU or Not to GPU?
journal, September 2011

  • So, Hayden; Chen, Junying; Yiu, Billy
  • IEEE Micro, Vol. 31, Issue 5
  • DOI: 10.1109/MM.2011.65

Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters
book, January 2012


Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems
journal, January 2012

  • Vömel, Christof; Tomov, Stanimire; Dongarra, Jack
  • SIAM Journal on Scientific Computing, Vol. 34, Issue 2
  • DOI: 10.1137/100806783

The Scalable Heterogeneous Computing (SHOC) benchmark suite
conference, January 2010

  • Danalis, Anthony; Marin, Gabriel; McCurdy, Collin
  • Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units - GPGPU '10
  • DOI: 10.1145/1735688.1735702

A survey of architectural techniques for DRAM power management
journal, January 2012


Using Hybrid CPU-GPU Platforms to Accelerate the Computation of the Matrix Sign Function
book, January 2010


Noniterative Multireference Coupled Cluster Methods on Heterogeneous CPU–GPU Systems
journal, March 2013

  • Bhaskaran-Nair, Kiran; Ma, Wenjing; Krishnamoorthy, Sriram
  • Journal of Chemical Theory and Computation, Vol. 9, Issue 4
  • DOI: 10.1021/ct301130u

Practical Time Bundle Adjustment for 3D Reconstruction on the GPU
book, January 2012


Solving a kind of BVP for ODEs on heterogeneous CPU + CUDA-enabled GPU systems
conference, October 2010

  • Stpiczynski, Przemyslaw; Potiopa, Joanna
  • 2010 International Multiconference on Computer Science and Information Technology (IMCSIT 2010)
  • DOI: 10.1109/IMCSIT.2010.5680041

Load balancing in a changing world: dealing with heterogeneity and performance variability
conference, January 2013

  • Boyer, Michael; Skadron, Kevin; Che, Shuai
  • Proceedings of the ACM International Conference on Computing Frontiers - CF '13
  • DOI: 10.1145/2482767.2482794

Efficient irregular wavefront propagation algorithms on hybrid CPU–GPU machines
journal, April 2013


A Hybrid CPU-GPU Accelerated Framework for Fast Mapping of High-Resolution Human Brain Connectome
journal, May 2013


Hybrid algorithms for list ranking and graph connected components
conference, December 2011

  • Banerjee, Dip Sankar; Kothapalli, Kishore
  • 2011 18th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2011.6152655

Maestro: Data Orchestration and Tuning for OpenCL Devices
book, January 2010


Using graphics processors for high performance IR query processing
conference, January 2009

  • Ding, Shuai; He, Jinru; Yan, Hao
  • Proceedings of the 18th international conference on World wide web - WWW '09
  • DOI: 10.1145/1526709.1526766

Enhancing Cloud-Based Servers by GPU/CPU Virtualization Management
book, January 2013

  • Wu, Tin-Yu; Lee, Wei-Tsong; Duan, Chien-Yu
  • Advances in Intelligent Systems and Applications. Smart Innovation, Systems and Technologies
  • DOI: 10.1007/978-3-642-35473-1_20

A Survey of Techniques for Managing and Leveraging Caches in GPUs
journal, June 2014


Accelerating Kirchhoff Migration by CPU and GPU Cooperation
conference, October 2009

  • Panetta, J.; Teixeira, T.; de Souza Filho, P. R. P.
  • 2009 21st International Symposium on Computer Architecture and High Performance Computing. SBAC-PAD 2009
  • DOI: 10.1109/SBAC-PAD.2009.29

DESTINY: A Tool for Modeling Emerging 3D NVM and eDRAM caches
conference, January 2015

  • Poremba, Matt; Mittal, Sparsh; Li, Dong
  • Design, Automation and Test in Europe, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015
  • DOI: 10.7873/DATE.2015.0733

Accelerating inclusion-based pointer analysis on heterogeneous CPU-GPU systems
conference, December 2013

  • Su, Yu; Ye, Ding; Xue, Jingling
  • 2013 20th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2013.6799110

Real-Time Non-rigid Registration of Medical Images on a Cooperative Parallel Architecture
conference, November 2009

  • Liu, Yixun; Fedorov, Andriy; Kikinis, Ron
  • 2009 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
  • DOI: 10.1109/BIBM.2009.10

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
conference, January 2009

  • Luk, Chi-Keung; Hong, Sunpyo; Kim, Hyesoon
  • Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42
  • DOI: 10.1145/1669112.1669121

A peta-scalable CPU-GPU algorithm for global atmospheric simulations
conference, January 2013

  • Yang, Chao; Zheng, Weimin; Xue, Wei
  • Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '13
  • DOI: 10.1145/2442516.2442518

Accelerating MapReduce on a coupled CPU-GPU architecture
conference, November 2012

  • Chen, Linchuan; Huo, Xin; Agrawal, Gagan
  • 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
  • DOI: 10.1109/SC.2012.16

Portable performance on heterogeneous architectures
conference, January 2013

  • Phothilimthana, Phitchaya Mangpo; Ansel, Jason; Ragan-Kelley, Jonathan
  • Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems - ASPLOS '13
  • DOI: 10.1145/2451116.2451162

Quantifying the energy efficiency of FFT on heterogeneous platforms
conference, April 2013

  • Ukidave, Yash; Ziabari, Amir Kavyan; Mistry, Perhaad
  • 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
  • DOI: 10.1109/ISPASS.2013.6557174

Radiation modeling using the Uintah heterogeneous CPU/GPU runtime system
conference, January 2012

  • Humphrey, Alan; Meng, Qingyu; Berzins, Martin
  • Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment on Bridging from the eXtreme to the campus and beyond - XSEDE '12
  • DOI: 10.1145/2335755.2335791

Biomedical image analysis on a cooperative cluster of GPUs and multicores
conference, January 2014

  • Hartley, Timothy D. R.; Catalyurek, Umit; Ruiz, Antonio
  • 25th Anniversary International Conference on Supercomputing Anniversary Volume
  • DOI: 10.1145/2591635.2667189

Acceleration of Hessenberg Reduction for Nonsymmetric Eigenvalue Problems in a Hybrid CPU-GPU Computing Environment
journal, January 2011

  • Muramatsu, Jun-ichi; Fukaya, Takeshi; Zhang, Shao-Liang
  • International Journal of Networking and Computing, Vol. 1, Issue 2
  • DOI: 10.15803/ijnc.1.2_132

Hybrid computing: CPU+GPU co-processing and its application to tomographic reconstruction
journal, April 2012


A survey of architectural techniques for improving cache power efficiency
journal, March 2014


Arab financial regulatory systems and their restructuring according to the Twin Peaks model
journal, January 2017


Hybrid Core Acceleration of UWB SIRE Radar Signal Processing
journal, January 2011

  • Park, Song Jun; Ross, James; Shires, Dale
  • IEEE Transactions on Parallel and Distributed Systems, Vol. 22, Issue 1
  • DOI: 10.1109/TPDS.2010.117

Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU
journal, October 2010

  • Shen, Wenfeng; Wei, Daming; Xu, Weimin
  • Computer Methods and Programs in Biomedicine, Vol. 100, Issue 1
  • DOI: 10.1016/j.cmpb.2010.06.015

Automatic generation of software pipelines for heterogeneous parallel systems
conference, November 2012

  • Pienaar, Jacques A.; Chakradhar, Srimat; Raghunathan, Anand
  • 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
  • DOI: 10.1109/SC.2012.22

Performance characterization of data-intensive kernels on AMD Fusion architectures
journal, May 2012

  • Lee, Kenneth; Lin, Heshan; Feng, Wu-chun
  • Computer Science - Research and Development, Vol. 28, Issue 2-3
  • DOI: 10.1007/s00450-012-0209-1

Works referencing / citing this record:

Artificial intelligence: a survey on evolution, models, applications and future trends
journal, January 2019


Crossing the chasm: how to develop weather and climate models for next generation computers?
journal, January 2018

  • Lawrence, Bryan N.; Rezny, Michael; Budich, Reinhard
  • Geoscientific Model Development, Vol. 11, Issue 5
  • DOI: 10.5194/gmd-11-1799-2018

Task management on fully heterogeneous micro-server system: Modeling and resolution strategies
journal, September 2018

  • Zaourar, Lilia; Ait Aba, Massinissa; Briand, David
  • Concurrency and Computation: Practice and Experience, Vol. 30, Issue 23
  • DOI: 10.1002/cpe.4798

Optimizing parameter sensitivity analysis of large‐scale microscopy image analysis workflows with multilevel computation reuse
journal, June 2019

  • Barreiros, Willian; Moreira, Jeremias; Kurc, Tahsin
  • Concurrency and Computation: Practice and Experience, Vol. 32, Issue 2
  • DOI: 10.1002/cpe.5403

Energy‐aware task scheduling with time constraint for heterogeneous cloud datacenters
journal, July 2019

  • Liu, Xing; Liu, Panwen; Hu, Lun
  • Concurrency and Computation: Practice and Experience, Vol. 32, Issue 18
  • DOI: 10.1002/cpe.5437

FAST-FUSION: An Improved Accuracy Omnidirectional Visual Odometry System with Sensor Fusion and GPU Optimization for Embedded Low Cost Hardware
journal, December 2019

  • Aguiar, André; Santos, Filipe; Sousa, Armando Jorge
  • Applied Sciences, Vol. 9, Issue 24
  • DOI: 10.3390/app9245516

Dynamic Load Balancing Algorithm for Heterogeneous Clusters
book, March 2018

  • do Nascimento, Tiago Marques; dos Santos, Rodrigo Weber; Lobosco, Marcelo
  • Parallel Processing and Applied Mathematics
  • DOI: 10.1007/978-3-319-78054-2_16

Implementation of a non-linear solver on heterogeneous architectures
journal, August 2018

  • Carracciuolo, Luisa; Lapegna, Marco
  • Concurrency and Computation: Practice and Experience, Vol. 30, Issue 24
  • DOI: 10.1002/cpe.4903

Translating Timing into an Architecture: The Synergy of COTSon and HLS (Domain Expertise—Designing a Computer Architecture via HLS)
journal, November 2019

  • Giorgi, Roberto; Khalili, Farnam; Procaccini, Marco
  • International Journal of Reconfigurable Computing, Vol. 2019
  • DOI: 10.1155/2019/2624938

Aspect-Oriented Set@l Language for Architecture-Independent Programming of High-Performance Computer Systems
book, January 2019

  • Levin, Ilya I.; Dordopulo, Alexey I.; Pisarenko, Ivan V.
  • Supercomputing: 5th Russian Supercomputing Days, RuSCDays 2019, Moscow, Russia, September 23–24, 2019, Revised Selected Papers, p. 517-528
  • DOI: 10.1007/978-3-030-36592-9_42

Efficient Execution of Smart City’s Assets Through a Massive Parallel Computational Model
book, July 2018

  • Ashraf, Muhammad Usman; Eassa, Fathy Alboraei; Albeshri, Aiiad Ahmad
  • Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
  • DOI: 10.1007/978-3-319-94180-6_6

A Heterogeneous Parallel LU Factorization Algorithm Based on a Basic Column Block Uniform Allocation Strategy
journal, February 2019

  • Wu, Rongteng; Xie, Xiaohong
  • Mathematical Problems in Engineering, Vol. 2019
  • DOI: 10.1155/2019/3720450

Parallel SIMD CPU and GPU Implementations of Berlekamp–Massey Algorithm and Its Error Correction Application
journal, May 2018


A survey of techniques for improving efficiency of mobile web browsing
journal, July 2018

  • Mittal, Sparsh; Mattela, Venkat
  • Concurrency and Computation: Practice and Experience, Vol. 31, Issue 15
  • DOI: 10.1002/cpe.5126

A Deep Pipelined Implementation of Hyperspectral Target Detection Algorithm on FPGA Using HLS
journal, March 2018

  • Lei, Jie; Li, Yunsong; Zhao, Dongsheng
  • Remote Sensing, Vol. 10, Issue 4
  • DOI: 10.3390/rs10040516

A Survey of Medical Imaging, Storage and Transfer Techniques
book, January 2019

  • Meenatchi Aparna, R. R.; Shanmugavadivu, P.
  • Proceedings of the International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB)
  • DOI: 10.1007/978-3-030-00665-5_3

Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL
journal, February 2019

  • Dávila Guzmán, María Angélica; Nozal, Raúl; Gran Tejero, Rubén
  • The Journal of Supercomputing, Vol. 75, Issue 3
  • DOI: 10.1007/s11227-019-02768-y

A survey of techniques for architecting TLBs
journal, December 2016

  • Mittal, Sparsh
  • Concurrency and Computation: Practice and Experience, Vol. 29, Issue 10
  • DOI: 10.1002/cpe.4061

A Survey of ReRAM-Based Architectures for Processing-In-Memory and Neural Networks
journal, April 2018

  • Mittal, Sparsh
  • Machine Learning and Knowledge Extraction, Vol. 1, Issue 1
  • DOI: 10.3390/make1010005

Page Locked GPGPU Rotational Visual Secret Sharing
book, January 2020

  • Raviraja Holla, M.; Suma, D.; Smys, S.
  • Second International Conference on Computer Networks and Communication Technologies: ICCNCT 2019, p. 349-359
  • DOI: 10.1007/978-3-030-37051-0_41

High-performance low-power approximate Wallace tree multiplier
journal, July 2018

  • Abed, Sa'ed; Khalil, Yasser; Modhaffar, Mahdi
  • International Journal of Circuit Theory and Applications, Vol. 46, Issue 12
  • DOI: 10.1002/cta.2540

A survey of FPGA-based accelerators for convolutional neural networks
journal, October 2018


Crossing the chasm: how to develop weather and climate models for next generation computers?
text, January 2018


Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review
journal, April 2018


GPU processing of theta-joins
journal, June 2017

  • Bellas, Christos; Gounaris, Anastasios
  • Concurrency and Computation: Practice and Experience, Vol. 29, Issue 18
  • DOI: 10.1002/cpe.4194

A survey of techniques for architecting SLC/MLC/TLC hybrid Flash memory-based SSDs
journal, January 2018

  • Alsalibi, Ahmed Izzat; Mittal, Sparsh; Al-Betar, Mohammed Azmi
  • Concurrency and Computation: Practice and Experience, Vol. 30, Issue 13
  • DOI: 10.1002/cpe.4420

The Set@l Programming Language and Its Application for Coding Gaussian Elimination
book, August 2019

  • Levin, Ilya I.; Dordopulo, Aleksey I.; Pisarenko, Ivan V.
  • Parallel Computational Technologies: 13th International Conference, PCT 2019, Kaliningrad, Russia, April 2–4, 2019, Revised Selected Papers, p. 45-57
  • DOI: 10.1007/978-3-030-28163-2_4