A survey of CPU-GPU heterogeneous computing techniques

Mittal, Sparsh; Vetter, Jeffrey S.

doi:10.1145/2788396

Abstract

As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, that enable utilizing both the CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler, and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe that this paper will provide researchers with insights into the working and scope of applications of HCTs and motivate them to further harness the computational power of CPUs and GPUs to achieve the goal of exascale performance.

Authors:

Mittal, Sparsh [1]; Vetter, Jeffrey S. [2]

[1] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
[2] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Georgia Inst. of Technology, Atlanta, GA (United States)

Publication Date: July 4, 2015

Research Org.: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Org.: USDOE Office of Science (SC)

OSTI Identifier: 1265534

Grant/Contract Number: AC05-00OR22725

Resource Type: Accepted Manuscript

Journal Name: ACM Computing Surveys

Additional Journal Information: Journal Volume: 47; Journal Issue: 4; Journal ID: ISSN 0360-0300

Publisher: Association for Computing Machinery (ACM)

Country of Publication: United States

Language: English

Subject: 97 MATHEMATICS AND COMPUTING; experimentation; management; measurement; performance; analysis; CPU-GPU heterogeneous/hybrid/collaborative computing; workload division/partitioning; dynamic/static load-balancing; pipelining; programming frameworks; fused CPU-GPU chip

Citation Formats


Mittal, Sparsh, and Vetter, Jeffrey S. A survey of CPU-GPU heterogeneous computing techniques. United States: N. p., 2015. Web. doi:10.1145/2788396.

Mittal, Sparsh, & Vetter, Jeffrey S. A survey of CPU-GPU heterogeneous computing techniques. United States. https://doi.org/10.1145/2788396

Mittal, Sparsh, and Vetter, Jeffrey S. 2015. "A survey of CPU-GPU heterogeneous computing techniques". United States. https://doi.org/10.1145/2788396. https://www.osti.gov/servlets/purl/1265534.


                    
@article{osti_1265534,
  title        = {A survey of CPU-GPU heterogeneous computing techniques},
  author       = {Mittal, Sparsh and Vetter, Jeffrey S.},
  abstractNote = {As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, that enable utilizing both the CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler, and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe that this paper will provide researchers with insights into the working and scope of applications of HCTs and motivate them to further harness the computational power of CPUs and GPUs to achieve the goal of exascale performance.},
  doi          = {10.1145/2788396},
  journal      = {ACM Computing Surveys},
  number       = 4,
  volume       = 47,
  place        = {United States},
  year         = {2015},
  month        = {7}
}

Citation Metrics:

Cited by: 221 works

Citation information provided by Web of Science

Works referenced in this record:

Hybrid-parallel Algorithms for 2D Green's Functions
journal, January 2013

Álvarez-Melcón, Alejandro; Giménez, Domingo; Quesada, Fernando D.
Procedia Computer Science, Vol. 18
DOI: 10.1016/j.procs.2013.05.218

Programming model for a heterogeneous x86 platform
conference, January 2009

Saha, Bratin; Mendelson, Avi; Zhou, Xiaocheng
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation - PLDI '09
DOI: 10.1145/1542476.1542525

Twin Peaks
journal, January 2017

Defever, Fabrice; Riano, Alejandro
SSRN Electronic Journal
DOI: 10.2139/ssrn.3099336

Porting irregular reductions on heterogeneous CPU-GPU configurations
conference, December 2011

Huo, Xin; Ravi, Vignesh T.; Agrawal, Gagan
2011 18th International Conference on High Performance Computing (HiPC)
DOI: 10.1109/HiPC.2011.6152715

Hybrid implementation of error diffusion dithering
conference, December 2011

Deshpande, Aditya; Misra, Ishan; Narayanan, P. J.
2011 18th International Conference on High Performance Computing (HiPC)
DOI: 10.1109/HiPC.2011.6152714

Programming model for a heterogeneous x86 platform
journal, May 2009

Saha, Bratin; Mendelson, Avi; Zhou, Xiaocheng
ACM SIGPLAN Notices, Vol. 44, Issue 6
DOI: 10.1145/1543135.1542525

Cooperative heterogeneous computing for parallel processing on CPU/GPU hybrids
conference, February 2012

Lee, Changmin; Ro, Won W.; Gaudiot, Jean-Luc
2012 16th Workshop on Interaction between Compilers and Computer Architectures (INTERACT)
DOI: 10.1109/INTERACT.2012.6339624

Discrete particle simulation of gas–solid two-phase flows with multi-scale CPU–GPU hybrid computation
journal, October 2012

Xu, Ming; Chen, Feiguo; Liu, Xinhua
Chemical Engineering Journal, Vol. 207-208
DOI: 10.1016/j.cej.2012.07.049

A new era in scientific computing: Domain decomposition methods in hybrid CPU–GPU architectures
journal, March 2011

Papadrakakis, M.; Stavroulakis, G.; Karatarakis, A.
Computer Methods in Applied Mechanics and Engineering, Vol. 200, Issue 13-16
DOI: 10.1016/j.cma.2011.01.013

Processing data streams with hard real-time constraints on heterogeneous systems
conference, January 2011

Verner, Uri; Schuster, Assaf; Silberstein, Mark
Proceedings of the international conference on Supercomputing - ICS '11
DOI: 10.1145/1995896.1995915

Axel: a heterogeneous cluster with FPGAs and GPUs
conference, January 2010

Tsoi, Kuen Hung; Luk, Wayne
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '10
DOI: 10.1145/1723112.1723134

Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system
conference, January 2011

Li, Linchuan; Li, Xingjian; Tan, Guangming
Proceedings of the 20th international symposium on High performance distributed computing - HPDC '11
DOI: 10.1145/1996130.1996157

GreenGPU: A Holistic Approach to Energy Efficiency in GPU-CPU Heterogeneous Architectures
conference, September 2012

Ma, Kai; Li, Xue; Chen, Wei
2012 41st International Conference on Parallel Processing (ICPP)
DOI: 10.1109/ICPP.2012.31

Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
conference, January 2009

Venkatasubramanian, Sundaresan; Vuduc, Richard W.; none, none
Proceedings of the 23rd international conference on Conference on Supercomputing - ICS '09
DOI: 10.1145/1542275.1542312

MapCG: writing parallel program portable between CPU and GPU
conference, January 2010

Hong, Chuntao; Chen, Dehao; Chen, Wenguang
Proceedings of the 19th international conference on Parallel architectures and compilation techniques - PACT '10
DOI: 10.1145/1854273.1854303

Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors
conference, January 2010

Gummaraju, Jayanth; Morichetti, Laurent; Houston, Michael
Proceedings of the 19th international conference on Parallel architectures and compilation techniques - PACT '10
DOI: 10.1145/1854273.1854302

Efficient co-processor utilization in database query processing
journal, November 2013

Breß, Sebastian; Beier, Felix; Rauhe, Hannes
Information Systems, Vol. 38, Issue 8
DOI: 10.1016/j.is.2013.05.004

A yoke of oxen and a thousand chickens for heavy lifting graph processing
conference, January 2012

Gharaibeh, Abdullah; Beltrão Costa, Lauro; Santos-Neto, Elizeu
Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12
DOI: 10.1145/2370816.2370866

Heterogeneous Computational Model for Landform Attributes Representation on Multicore and Multi-GPU Systems
journal, January 2012

Boratto, Murilo; Alonso, Pedro; Ramiro, Carla
Procedia Computer Science, Vol. 9
DOI: 10.1016/j.procs.2012.04.006

A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL
book, January 2011

Grewe, Dominik; O’Boyle, Michael F. P.
Compiler Construction. Lecture Notes in Computer Science
DOI: 10.1007/978-3-642-19861-8_16

Harmony: an execution model and runtime for heterogeneous many core systems
conference, January 2008

Diamos, Gregory F.; Yalamanchili, Sudhakar
Proceedings of the 17th international symposium on High performance distributed computing - HPDC '08
DOI: 10.1145/1383422.1383447

Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing
conference, September 2010

Yang, Canqun; Wang, Feng; Du, Yunfei
2010 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/CLUSTER.2010.12

A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU–GPU platforms
journal, August 2011

Benner, Peter; Ezzatti, Pablo; Kressner, Daniel
Parallel Computing, Vol. 37, Issue 8
DOI: 10.1016/j.parco.2010.12.002

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
journal, August 2012

Prasad, Ashwin; Anantpur, Jayvant; Govindarajan, R.
ACM SIGPLAN Notices, Vol. 47, Issue 6
DOI: 10.1145/2345156.1993517

5.1 POWER8^TM: A 12-core server-class processor in 22nm SOI with 7.6Tb/s off-chip bandwidth
conference, February 2014

Fluhr, Eric J.; Friedrich, Joshua; Dreps, Daniel
2014 IEEE International Solid- State Circuits Conference (ISSCC), 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)
DOI: 10.1109/ISSCC.2014.6757353

Accelerating Protein Sequence Search in a Heterogeneous Computing System
conference, May 2011

Xiao, Shucai; Lin, Heshan; Feng, Wu-chun
Distributed Processing Symposium (IPDPS), 2011 IEEE International Parallel & Distributed Processing Symposium
DOI: 10.1109/IPDPS.2011.115

Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-Hybrid platforms
conference, April 2010

He, Zhengyu; Hong, Bo
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
DOI: 10.1109/IPDPS.2010.5470401

An efficient, model-based CPU-GPU heterogeneous FFT library
conference, April 2008

Ogata, Yasuhito; Endo, Toshio; Maruyama, Naoya
2008 IEEE International Symposium on Parallel and Distributed Processing
DOI: 10.1109/IPDPS.2008.4536163

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
conference, January 2010

Lee, Victor W.; Hammarlund, Per; Singhal, Ronak
Proceedings of the 37th annual international symposium on Computer architecture - ISCA '10
DOI: 10.1145/1815961.1816021

Parallel Processing of Matrix Multiplication in a CPU and GPU Heterogeneous Environment
book, January 2007

Ohshima, Satoshi; Kise, Kenji; Katagiri, Takahiro
High Performance Computing for Computational Science - VECPAR 2006. Lecture Notes in Computer Science
DOI: 10.1007/978-3-540-71351-7_24

Scalable fast multipole methods on distributed heterogeneous architectures
conference, January 2011

Hu, Qi; Gumerov, Nail A.; Duraiswami, Ramani
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11
DOI: 10.1145/2063384.2063432

MDR: performance model driven runtime for heterogeneous parallel platforms
conference, January 2011

Pienaar, Jacques A.; Raghunathan, Anand; Chakradhar, Srimat
Proceedings of the international conference on Supercomputing - ICS '11
DOI: 10.1145/1995896.1995933

Performance evaluation of hybrid programming patterns for large CPU/GPU heterogeneous clusters
journal, June 2012

Lu, Fengshun; Song, Junqiang; Yin, Fukang
Computer Physics Communications, Vol. 183, Issue 6
DOI: 10.1016/j.cpc.2012.01.019

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
journal, June 2010

Lee, Victor W.; Hammarlund, Per; Singhal, Ronak
ACM SIGARCH Computer Architecture News, Vol. 38, Issue 3
DOI: 10.1145/1816038.1816021

Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations
conference, November 2010

Hampton, Scott S.; Alam, Sadaf R.; Crozier, Paul S.
2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: 10.1109/SC.2010.37

A dynamic scheduling framework for emerging heterogeneous systems
conference, December 2011

Ravi, Vignesh T.; Agrawal, Gagan
2011 18th International Conference on High Performance Computing (HiPC)
DOI: 10.1109/HiPC.2011.6152724

Implementation of Fdtd-Compatible Green'S Function on Heterogeneous Cpu-Gpu Parallel Processing System
journal, January 2013

Stefanski, Tomasz P.
Progress In Electromagnetics Research, Vol. 135
DOI: 10.2528/PIER12111702

An asymmetric distributed shared memory model for heterogeneous parallel systems
journal, March 2010

Gelado, Isaac; Stone, John E.; Cabezas, Javier
ACM SIGPLAN Notices, Vol. 45, Issue 3
DOI: 10.1145/1735971.1736059

A New Parallel Method of Smith-Waterman Algorithm on a Heterogeneous Platform
book, January 2010

Chen, Bo; Xu, Yun; Yang, Jiaoyun
Algorithms and Architectures for Parallel Processing
DOI: 10.1007/978-3-642-13119-6_7

Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters
conference, November 2010

Shirahata, Koichi; Sato, Hitoshi; Matsuoka, Satoshi
2010 IEEE 2nd International Conference on Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on Cloud Computing Technology and Science
DOI: 10.1109/CloudCom.2010.55

An efficient scheduling scheme using estimated execution time for heterogeneous computing systems
journal, January 2013

Choi, Hong Jun; Son, Dong Oh; Kang, Seung Gu
The Journal of Supercomputing, Vol. 65, Issue 2
DOI: 10.1007/s11227-013-0870-6

Performance Implications of Nonuniform Device Topologies in Scalable Heterogeneous Architectures
journal, September 2011

Meredith, Jeremy; Roth, Philip; Spafford, Kyle
IEEE Micro, Vol. 31, Issue 5
DOI: 10.1109/MM.2011.79

An exact algorithm for energy-efficient acceleration of task trees on CPU/GPU architectures
conference, January 2011

Silberstein, Mark; Maruyama, Naoya
Proceedings of the 4th Annual International Conference on Systems and Storage - SYSTOR '11
DOI: 10.1145/1987816.1987826

Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures
conference, November 2010

Rahimian, Abtin; Lashuk, Ilya; Veerapaneni, Shravan
2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: 10.1109/SC.2010.42

An Effective Dynamic Scheduling Runtime and Tuning System for Heterogeneous Multi and Many-Core Desktop Platforms
conference, September 2011

Binotto, Alecio P. D.; Pereira, Carlos E.; Kuijper, Arjan
Communication (HPCC), 2011 IEEE International Conference on High Performance Computing and Communications
DOI: 10.1109/HPCC.2011.20

A compound OpenMP/MPI program development toolkit for hybrid CPU/GPU clusters
journal, March 2013

Li, Hung-Fu; Liang, Tyng-Yeu; Chiu, Jun-Yao
The Journal of Supercomputing, Vol. 66, Issue 1
DOI: 10.1007/s11227-013-0912-0

An Efficient Stream Buffer Mechanism for Dataflow Execution on Heterogeneous Platforms with GPUs
conference, October 2011

Balevic, Ana; Kienhuis, Bart
2011 First Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM)
DOI: 10.1109/DFM.2011.10

Heterogeneous Systems for Energy Efficient Scientific Computing
book, January 2012

Liu, Qiang; Luk, Wayne
Reconfigurable Computing: Architectures, Tools and Applications. Lecture Notes in Computer Science
DOI: 10.1007/978-3-642-28365-9_6

Fluid Simulation with Two-Way Interaction Rigid Body Using a Heterogeneous GPU and CPU Environment
conference, November 2010

Junior, José Ricardo da S.; Clua, Esteban W.; Montenegro, Anselmo
2010 Brazilian Symposium on Games and Digital Entertainment (SBGAMES)
DOI: 10.1109/SBGAMES.2010.25

Task-based parallel breadth-first search in heterogeneous environments
conference, December 2012

Munguia, Lluis-Miquel; Bader, David A.; Ayguade, Eduard
2012 19th International Conference on High Performance Computing (HiPC)
DOI: 10.1109/HiPC.2012.6507474

Power-aware dynamic task scheduling for heterogeneous accelerated clusters
conference, May 2009

Hamano, Tomoaki; Endo, Toshio; Matsuoka, Satoshi
2009 IEEE International Symposium on Parallel & Distributed Processing
DOI: 10.1109/IPDPS.2009.5160977

Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms
conference, July 2011

Anzt, Hartwig; Heuveline, Vincent; Aliaga, Jose I.
2011 International Green Computing Conference (IGCC), 2011 International Green Computing Conference and Workshops
DOI: 10.1109/IGCC.2011.6008594

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures
conference, January 2012

Spafford, Kyle L.; Meredith, Jeremy S.; Lee, Seyong
Proceedings of the 9th conference on Computing Frontiers - CF '12
DOI: 10.1145/2212908.2212924

A Waterfall Model to Achieve Energy Efficient Tasks Mapping for Large Scale GPU Clusters
conference, May 2011

Liu, Wenjie; Du, Zhihui; Xiao, Yu
Distributed Processing, Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
DOI: 10.1109/IPDPS.2011.129

A Map-Reduce Based Framework for Heterogeneous Processing Element Cluster Environments
conference, May 2012

Tan, Yu Shyang; Lee, Bu-Sung; He, Bingsheng
2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
DOI: 10.1109/CCGrid.2012.35

GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing
conference, September 2012

Odajima, Tetsuya; Boku, Taisuke; Hanawa, Toshihiro
2012 41st International Conference on Parallel Processing Workshops (ICPPW)
DOI: 10.1109/ICPPW.2012.16

Effective Kernel Mapping for OpenCL Applications in Heterogeneous Platforms
conference, September 2012

Albayrak, Omer Erdil; Akturk, Ismail; Ozturk, Ozcan
2012 41st International Conference on Parallel Processing Workshops (ICPPW)
DOI: 10.1109/ICPPW.2012.14

Iterative SLE Solvers over a CPU-GPU Platform
conference, September 2010

Binotto, Alécio P. D.; Daniel, Christian; Weber, Daniel
2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC 2010), 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC)
DOI: 10.1109/HPCC.2010.40

Power-efficient time-sensitive mapping in heterogeneous systems
conference, January 2012

Liu, Cong; Li, Jian; Huang, Wei
Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12
DOI: 10.1145/2370816.2370822

Predictive Runtime Code Scheduling for Heterogeneous Architectures
book, January 2009

Jiménez, Víctor J.; Vilanova, Lluís; Gelado, Isaac
High Performance Embedded Architectures and Compilers
DOI: 10.1007/978-3-540-92990-1_4

Fast Snippet Generation Based on CPU-GPU Hybrid System
conference, December 2011

Liu, Ding; Li, Ruixuan; Gu, Xiwu
2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS)
DOI: 10.1109/ICPADS.2011.63

AMD Fusion APU: Llano
journal, March 2012

Branover, Alexander; Foley, Denis; Steinman, Maurice
IEEE Micro, Vol. 32, Issue 2
DOI: 10.1109/MM.2012.2

Enabling task-level scheduling on heterogeneous platforms
conference, January 2012

Sun, Enqiang; Schaa, Dana; Bagley, Richard
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units - GPGPU-5
DOI: 10.1145/2159430.2159440

Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems
conference, January 2013

Mistry, Perhaad; Ukidave, Yash; Schaa, Dana
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units - GPGPU-6
DOI: 10.1145/2458523.2458529

Biomedical image analysis on a cooperative cluster of GPUs and multicores
conference, January 2008

Hartley, Timothy D. R.; Catalyurek, Umit; Ruiz, Antonio
Proceedings of the 22nd annual international conference on Supercomputing - ICS '08
DOI: 10.1145/1375527.1375533

Fluidic Kernels: Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices
conference, January 2014

Pandit, Prasanna; Govindarajan, R.
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization - CGO '14
DOI: 10.1145/2581122.2544163

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU
conference, October 2011

Hong, Sungpack; Oguntebi, Tayo; Olukotun, Kunle
2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)
DOI: 10.1109/PACT.2011.14

An automatic input-sensitive approach for heterogeneous task partitioning
conference, January 2013

Kofler, Klaus; Grasso, Ivan; Cosenza, Biagio
Proceedings of the 27th international ACM conference on International conference on supercomputing - ICS '13
DOI: 10.1145/2464996.2465007

Optimizing tensor contraction expressions for hybrid CPU-GPU execution
journal, November 2011

Ma, Wenjing; Krishnamoorthy, Sriram; Villa, Oreste
Cluster Computing, Vol. 16, Issue 1
DOI: 10.1007/s10586-011-0179-2

A fully integrated multi-CPU, GPU and memory controller 32nm processor
conference, February 2011

Yuffe, Marcelo; Knoll, Ernest; Mehalel, Moty
2011 IEEE International Solid- State Circuits Conference - (ISSCC), 2011 IEEE International Solid-State Circuits Conference
DOI: 10.1109/ISSCC.2011.5746311

MATE-CG: A Map Reduce-Like Framework for Accelerating Data-Intensive Computations on Heterogeneous Clusters
conference, May 2012

Jiang, Wei; Agrawal, Gagan
2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2012 IEEE 26th International Parallel and Distributed Processing Symposium
DOI: 10.1109/IPDPS.2012.65

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators
conference, May 2011

Agullo, Emmanuel; Augonnet, Cedric; Dongarra, Jack
Distributed Processing Symposium (IPDPS), 2011 IEEE International Parallel & Distributed Processing Symposium
DOI: 10.1109/IPDPS.2011.90

Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations
journal, January 2011

Dziekonski, A.; Lamecki, A.; Mrozowski, M.
IEEE Antennas and Wireless Propagation Letters, Vol. 10
DOI: 10.1109/LAWP.2011.2159769

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
conference, July 2011

Daga, Mayank; Aji, Ashwin M.; Feng, Wu-chun
2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC)
DOI: 10.1109/SAAHPC.2011.29

Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems
conference, September 2013

Lee, Janghaeng; Samadi, Mehrzad; Park, Yongjun
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques
DOI: 10.1109/PACT.2013.6618821

Mapping the sbr and Tw-Ildcs to Heterogeneous Cpu-Gpu Architecture for fast Computation of Electromagnetic Scattering
journal, January 2012

Gao, Peng Cheng; Tao, Yu Bo; Bai, Zhi Hui
Progress In Electromagnetics Research, Vol. 122
DOI: 10.2528/PIER11092303

CPU-GPU hybrid parallel strategy for cosmological simulations: CPU-GPU HBRID PARALLEL STRATEGY FOR COSMOLOGICAL SIMULATION
journal, May 2013

Wang, Yueqing; Dou, Yong; Guo, Song
Concurrency and Computation: Practice and Experience, Vol. 26, Issue 3
DOI: 10.1002/cpe.3046

An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs
conference, January 2012

Li, Jiajia; Li, Xingjian; Tan, Guangming
Proceedings of the 26th ACM international conference on Supercomputing - ICS '12
DOI: 10.1145/2304576.2304626

A peta-scalable CPU-GPU algorithm for global atmospheric simulations
journal, August 2013

Yang, Chao; Zheng, Weimin; Xue, Wei
ACM SIGPLAN Notices, Vol. 48, Issue 8
DOI: 10.1145/2517327.2442518

Communication-Aware Task Partition and Voltage Scaling for Energy Minimization on Heterogeneous Parallel Systems
conference, October 2011

Wang, Guibin; Song, Wei
2011 12th International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT), 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies
DOI: 10.1109/PDCAT.2011.28

A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures
conference, July 2011

Horton, Mitch; Tomov, Stanimire; Dongarra, Jack
2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC)
DOI: 10.1109/SAAHPC.2011.18

SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters
conference, January 2012

Kim, Jungwon; Seo, Sangmin; Lee, Jun
Proceedings of the 26th ACM international conference on Supercomputing - ICS '12
DOI: 10.1145/2304576.2304623

A survey of techniques for improving energy efficiency in embedded computing systems
journal, January 2014

Mittal, Sparsh
International Journal of Computer Aided Engineering and Technology, Vol. 6, Issue 4
DOI: 10.1504/IJCAET.2014.065419

A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
book, January 2009

Ayguade, Eduard; Badia, Rosa M.; Cabrera, Daniel
Evolving OpenMP in an Age of Extreme Parallelism
DOI: 10.1007/978-3-642-02303-3_13

Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations
conference, January 2010

Ravi, Vignesh T.; Ma, Wenjing; Chiu, David
Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10
DOI: 10.1145/1810085.1810106

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
conference, January 2011

Prasad, Ashwin; Anantpur, Jayvant; Govindarajan, R.
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation - PLDI '11
DOI: 10.1145/1993498.1993517

Automatic Dynamic Task Distribution between CPU and GPU for Real-Time Systems
conference, July 2008

Joselli, Mark; Zamith, Marcelo; Clua, Esteban
2008 IEEE 11th International Conference on Computational Science and Engineering (CSE), 2008 11th IEEE International Conference on Computational Science and Engineering
DOI: 10.1109/CSE.2008.38

A Survey of Methods for Analyzing and Improving GPU Energy Efficiency
journal, August 2014

Mittal, Sparsh; Vetter, Jeffrey S.
ACM Computing Surveys, Vol. 47, Issue 2
DOI: 10.1145/2636342

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
journal, June 2011

Prasad, Ashwin; Anantpur, Jayvant; Govindarajan, R.
ACM SIGPLAN Notices, Vol. 46, Issue 6
DOI: 10.1145/1993316.1993517

Analysis and Optimization of Power Consumption in the Iterative Solution of Sparse Linear Systems on Multi-core and Many-core Platforms
text, January 2011

Anzt, Hartwig; Heuveline, Vincent; Aliaga, José I.
Karlsruher Institut für Technologie (KIT)
DOI: 10.5445/ir/1000023438

Accelerating Smith-Waterman on Heterogeneous CPU-GPU Systems
conference, May 2011

Singh, Jaideep; Aruni, Ipseeta
2011 5th International Conference on Bioinformatics and Biomedical Engineering (iCBBE)
DOI: 10.1109/icbbe.2011.5780005

SPRAT: Runtime processor selection for energy-aware computing
conference, September 2008

Takizawa, Hiroyuki; Sato, Katuto; Kobayashi, Hiroaki
2008 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/CLUSTR.2008.4663799

Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory
conference, January 2010

Becchi, Michela; Byna, Surendra; Cadambi, Srihari
Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures - SPAA '10
DOI: 10.1145/1810479.1810498

Efficient sparse matrix-matrix multiplication on heterogeneous high performance systems
conference, September 2010

Siegel, Jakob; Villa, Oreste; Krishnamoorthy, Sriram
2010 IEEE International Conference On Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS)
DOI: 10.1109/CLUSTERWKSP.2010.5613109

Hybrid ray tracing and path tracing of Bezier surfaces using a mixed hierarchy
conference, January 2012

Nigam, Rohit; Narayanan, P. J.
Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing - ICVGIP '12
DOI: 10.1145/2425333.2425368

Linpack evaluation on a supercomputer with heterogeneous accelerators
conference, April 2010

Endo, Toshio; Matsuoka, Satoshi; Nukada, Akira
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
DOI: 10.1109/IPDPS.2010.5470353

A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
book, January 2011

Ltaief, Hatem; Tomov, Stanimire; Nath, Rajib
High Performance Computing for Computational Science – VECPAR 2010. Lecture Notes in Computer Science
DOI: 10.1007/978-3-642-19328-6_11

CoreTSAR: Adaptive Worksharing for Heterogeneous Systems
book, January 2014

Scogland, Thomas R. W.; Feng, Wu-chun; Rountree, Barry
Supercomputing. Lecture Notes in Computer Science
DOI: 10.1007/978-3-319-07518-1_11

Using 1000+ GPUs and 10000+ CPUs for Sedimentary Basin Simulations
conference, September 2012

Wen, Mei; Su, Huayou; Wei, Wenjie
2012 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/CLUSTER.2012.37

Shot boundary detection using Zernike moments in multi-GPU multi-CPU architectures
journal, September 2012

Toharia, Pablo; Robles, Oscar D.; Suárez, Ricardo
Journal of Parallel and Distributed Computing, Vol. 72, Issue 9
DOI: 10.1016/j.jpdc.2011.10.011

Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing
journal, December 2010

Tomov, Stanimire; Nath, Rajib; Dongarra, Jack
Parallel Computing, Vol. 36, Issue 12
DOI: 10.1016/j.parco.2010.06.001

Data Partitioning on Heterogeneous Multicore and Multi-GPU Systems Using Functional Performance Models of Data-Parallel Applications
conference, September 2012

Zhong, Ziming; Rychkov, Vladimir; Lastovetsky, Alexey
2012 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/CLUSTER.2012.34

StarPU: a unified platform for task scheduling on heterogeneous multicore architectures
journal, November 2010

Augonnet, Cédric; Thibault, Samuel; Namyst, Raymond
Concurrency and Computation: Practice and Experience, Vol. 23, Issue 2
DOI: 10.1002/cpe.1631

An asymmetric distributed shared memory model for heterogeneous parallel systems
journal, March 2010

Gelado, Isaac; Cabezas, Javier; Navarro, Nacho
ACM SIGARCH Computer Architecture News, Vol. 38, Issue 1
DOI: 10.1145/1735970.1736059

Coordinating the use of GPU and CPU for improving performance of compute intensive applications
conference, August 2009

Teodoro, George; Sachetto, Rafael; Sertel, Olcay
2009 IEEE International Conference on Cluster Computing and Workshops
DOI: 10.1109/CLUSTR.2009.5289193

An On-Demand Fast Parallel Pseudo Random Number Generator with Applications
conference, May 2012

Banerjee, Dip Sankar; Bahl, Aman Kumar; Kothapalli, Kishore
2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW)
DOI: 10.1109/IPDPSW.2012.212

Opportunities for Nonvolatile Memory Systems in Extreme-Scale High-Performance Computing
journal, March 2015

Vetter, Jeffrey S.; Mittal, Sparsh
Computing in Science & Engineering, Vol. 17, Issue 2
DOI: 10.1109/MCSE.2015.4

Multilevel summation of electrostatic potentials using graphics processing units
journal, March 2009

Hardy, David J.; Stone, John E.; Schulten, Klaus
Parallel Computing, Vol. 35, Issue 3
DOI: 10.1016/j.parco.2008.12.005

A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures
journal, January 2013

Belviranli, Mehmet E.; Bhuyan, Laxmi N.; Gupta, Rajiv
ACM Transactions on Architecture and Code Optimization, Vol. 9, Issue 4
DOI: 10.1145/2400682.2400716

Resource-efficient utilization of CPU/GPU-based heterogeneous supercomputers for Bayesian phylogenetic inference
journal, March 2013

Chai, Jun; Su, Huayou; Wen, Mei
The Journal of Supercomputing, Vol. 66, Issue 1
DOI: 10.1007/s11227-013-0911-1

Heterogeneous Task Scheduling for Accelerated OpenMP
conference, May 2012

Scogland, Thomas R. W.; Rountree, Barry; Feng, Wu-chun
2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/IPDPS.2012.23

Evaluating application performance and energy consumption on hybrid CPU+GPU architecture
journal, June 2012

Padoin, Edson Luiz; Pilla, Laércio Lima; Boito, Francieli Zanon
Cluster Computing, Vol. 16, Issue 3
DOI: 10.1007/s10586-012-0219-6

GPU and APU computations of Finite Time Lyapunov Exponent fields
journal, March 2012

Conti, Christian; Rossinelli, Diego; Koumoutsakos, Petros
Journal of Computational Physics, Vol. 231, Issue 5
DOI: 10.1016/j.jcp.2011.10.032

Portable performance on heterogeneous architectures
journal, April 2013

Phothilimthana, Phitchaya Mangpo; Ansel, Jason; Ragan-Kelley, Jonathan
ACM SIGPLAN Notices, Vol. 48, Issue 4
DOI: 10.1145/2499368.2451162

Heterogeneous Computing for Vertebra Detection and Segmentation in X-Ray Images
journal, January 2011

Lecron, Fabian; Mahmoudi, Sidi Ahmed; Benjelloun, Mohammed
International Journal of Biomedical Imaging, Vol. 2011
DOI: 10.1155/2011/640208

IBM POWER7+ design for higher frequency at fixed power
journal, November 2013

Zyuban, V.; Taylor, S. A.; Christensen, B.
IBM Journal of Research and Development, Vol. 57, Issue 6
DOI: 10.1147/JRD.2013.2279597

Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer
conference, January 2011

Shimokawabe, Takashi; Aoki, Takayuki; Takaki, Tomohiro
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis - SC '11
DOI: 10.1145/2063384.2063388

Dynamic load balancing on heterogeneous multicore/multiGPU systems
conference, June 2010

Acosta, Alejandro; Corujo, Robert; Blanco, Vicente
2010 International Conference on High Performance Computing & Simulation (HPCS)
DOI: 10.1109/HPCS.2010.5547097

A Technique for Collision Detection and 3D Interaction Based on Parallel GPU and CPU Processing
conference, November 2011

Tsuda, Fernando; Nakamura, Ricardo
2011 Brazilian Symposium on Games and Digital Entertainment (SBGAMES)
DOI: 10.1109/SBGAMES.2011.20

GPU-enabled efficient executions of radiation calculations in climate modeling
conference, December 2013

Korwar, Sai Kiran; Vadhiyar, Sathish; Nanjundiah, Ravi S.
2013 20th International Conference on High Performance Computing (HiPC)
DOI: 10.1109/HiPC.2013.6799141

Dynamic Distribution of Workload between CPU and GPU for a Parallel Conjugate Gradient Method in an Adaptive FEM
journal, January 2013

Lang, Jens; Rünger, Gudula
Procedia Computer Science, Vol. 18
DOI: 10.1016/j.procs.2013.05.193

Combinatorial Bidirectional Path-Tracing for Efficient Hybrid CPU/GPU Rendering
journal, April 2011

Pajot, Anthony; Barthe, Loïc; Paulin, Mathias
Computer Graphics Forum, Vol. 30, Issue 2
DOI: 10.1111/j.1467-8659.2011.01863.x

A hybrid shared memory heterogeneous execution platform for PCIe-based GPGPUs
conference, December 2013

Shukla, Sambit K.; Bhuyan, Laxmi N.
2013 20th International Conference on High Performance Computing (HiPC)
DOI: 10.1109/HiPC.2013.6799140

Optimization Techniques for 3D-FWT on Systems with Manycore GPUs and Multicore CPUs
journal, January 2013

Bernabé, Gregorio; Cuenca, Javier; Giménez, Domingo
Procedia Computer Science, Vol. 18
DOI: 10.1016/j.procs.2013.05.195

OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
journal, May 2010

Stone, John E.; Gohara, David; Shi, Guochun
Computing in Science & Engineering, Vol. 12, Issue 3, p. 66-73
DOI: 10.1109/MCSE.2010.69

A 22nm IA multi-CPU and GPU System-on-Chip
conference, February 2012

Damaraju, Satish; George, Varghese; Jahagirdar, Sanjeev
2012 IEEE International Solid-State Circuits Conference (ISSCC)
DOI: 10.1109/ISSCC.2012.6176876

Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems
conference, April 2012

Hetherington, Tayler H.; Rogers, Timothy G.; Hsu, Lisa
2012 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS)
DOI: 10.1109/ISPASS.2012.6189209

Scaling Hierarchical N-body Simulations on GPU Clusters
conference, November 2010

Jetley, Pritish; Wesolowski, Lukasz; Gioachin, Filippo
2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
DOI: 10.1109/SC.2010.49

Where is the data? Why you cannot debate CPU vs. GPU performance without the answer
conference, April 2011

Gregg, Chris; Hazelwood, Kim
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
DOI: 10.1109/ISPASS.2011.5762730

CAP: co-scheduling based on asymptotic profiling in CPU+GPU hybrid systems
conference, January 2013

Wang, Zhenning; Zheng, Long; Chen, Quan
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '13
DOI: 10.1145/2442992.2443004

Compiler and runtime support for enabling reduction computations on heterogeneous systems
journal, October 2011

Ravi, Vignesh T.; Ma, Wenjing; Chiu, David
Concurrency and Computation: Practice and Experience, Vol. 24, Issue 5
DOI: 10.1002/cpe.1848

Automatic dataflow application tuning for heterogeneous systems
conference, December 2010

Hartley, Timothy D. R.; Saule, Erik; Catalyurek, Umit V.
2010 International Conference on High Performance Computing (HiPC)
DOI: 10.1109/HIPC.2010.5713173

A Fast Parallel Implementation of Molecular Dynamics with the Morse Potential on a Heterogeneous Petascale Supercomputer
conference, May 2012

Wu, Qiang; Yang, Canqun; Wang, Feng
2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW)
DOI: 10.1109/IPDPSW.2012.13

Synergistic execution of stream programs on multicores with accelerators
journal, June 2009

Udupa, Abhishek; Govindarajan, R.; Thazhuthaveetil, Matthew J.
ACM SIGPLAN Notices, Vol. 44, Issue 7
DOI: 10.1145/1543136.1542466

X-device query processing by bitwise distribution
conference, January 2012

Pirk, Holger; Sellam, Thibault; Manegold, Stefan
Proceedings of the Eighth International Workshop on Data Management on New Hardware - DaMoN '12
DOI: 10.1145/2236584.2236591

CPU/GPU computing for long-wave radiation physics on large GPU clusters
journal, April 2012

Lu, Fengshun; Song, Junqiang; Cao, Xiaoqun
Computers & Geosciences, Vol. 41
DOI: 10.1016/j.cageo.2011.08.007

Workload Balancing on Heterogeneous Systems: A Case Study of Sparse Grid Interpolation
book, January 2012

Muraraşu, Alin; Weidendorfer, Josef; Bode, Arndt
Euro-Par 2011: Parallel Processing Workshops. Lecture Notes in Computer Science
DOI: 10.1007/978-3-642-29740-3_39

Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems
conference, May 2012

Teodoro, George; Kurc, Tahsin M.; Pan, Tony
2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/IPDPS.2012.101

Rodinia: A benchmark suite for heterogeneous computing
conference, October 2009

Che, Shuai; Boyer, Michael; Meng, Jiayuan
2009 IEEE International Symposium on Workload Characterization (IISWC)
DOI: 10.1109/IISWC.2009.5306797

Asymptotic peak Utilisation in Heterogeneous Parallel CPU/GPU Pipelines: a Decentralised Queue Monitoring Strategy
journal, May 2012

Garba, Michael T.; González-Vélez, Horacio
Parallel Processing Letters, Vol. 22, Issue 02
DOI: 10.1142/S0129626412400087

Dynamically managed data for CPU-GPU architectures
conference, January 2012

Jablin, Thomas B.; Jablin, James A.; Prabhu, Prakash
Proceedings of the Tenth International Symposium on Code Generation and Optimization - CGO '12
DOI: 10.1145/2259016.2259038

Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms
conference, January 2013

Shen, Jie; Varbanescu, Ana Lucia; Sips, Henk
Proceedings of the ACM International Conference on Computing Frontiers - CF '13
DOI: 10.1145/2482767.2482785

Medical Ultrasound Imaging: To GPU or Not to GPU?
journal, September 2011

So, Hayden; Chen, Junying; Yiu, Billy
IEEE Micro, Vol. 31, Issue 5
DOI: 10.1109/MM.2011.65

Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters
book, January 2012

Clarke, David; Ilic, Aleksandar; Lastovetsky, Alexey
Euro-Par 2012 Parallel Processing
DOI: 10.1007/978-3-642-32820-6_49

Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems
journal, January 2012

Vömel, Christof; Tomov, Stanimire; Dongarra, Jack
SIAM Journal on Scientific Computing, Vol. 34, Issue 2
DOI: 10.1137/100806783

The Scalable Heterogeneous Computing (SHOC) benchmark suite
conference, January 2010

Danalis, Anthony; Marin, Gabriel; McCurdy, Collin
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units - GPGPU '10
DOI: 10.1145/1735688.1735702

A survey of architectural techniques for DRAM power management
journal, January 2012

Mittal, Sparsh
International Journal of High Performance Systems Architecture, Vol. 4, Issue 2
DOI: 10.1504/IJHPSA.2012.050990

Using Hybrid CPU-GPU Platforms to Accelerate the Computation of the Matrix Sign Function
book, January 2010

Benner, Peter; Ezzatti, Pablo; Quintana-Ortí, Enrique S.
Lecture Notes in Computer Science
DOI: 10.1007/978-3-642-14122-5_17

Noniterative Multireference Coupled Cluster Methods on Heterogeneous CPU–GPU Systems
journal, March 2013

Bhaskaran-Nair, Kiran; Ma, Wenjing; Krishnamoorthy, Sriram
Journal of Chemical Theory and Computation, Vol. 9, Issue 4
DOI: 10.1021/ct301130u

Practical Time Bundle Adjustment for 3D Reconstruction on the GPU
book, January 2012

Choudhary, Siddharth; Gupta, Shubham; Narayanan, P. J.
Trends and Topics in Computer Vision
DOI: 10.1007/978-3-642-35740-4_33

Solving a kind of BVP for ODEs on heterogeneous CPU + CUDA-enabled GPU systems
conference, October 2010

Stpiczynski, Przemyslaw; Potiopa, Joanna
Proceedings of the International Multiconference on Computer Science and Information Technology (IMCSIT 2010)
DOI: 10.1109/IMCSIT.2010.5680041

Load balancing in a changing world: dealing with heterogeneity and performance variability
conference, January 2013

Boyer, Michael; Skadron, Kevin; Che, Shuai
Proceedings of the ACM International Conference on Computing Frontiers - CF '13
DOI: 10.1145/2482767.2482794

Efficient irregular wavefront propagation algorithms on hybrid CPU–GPU machines
journal, April 2013

Teodoro, George; Pan, Tony; Kurc, Tahsin M.
Parallel Computing, Vol. 39, Issue 4-5
DOI: 10.1016/j.parco.2013.03.001

A Hybrid CPU-GPU Accelerated Framework for Fast Mapping of High-Resolution Human Brain Connectome
journal, May 2013

Wang, Yu; Du, Haixiao; Xia, Mingrui
PLoS ONE, Vol. 8, Issue 5
DOI: 10.1371/journal.pone.0062789

Hybrid algorithms for list ranking and graph connected components
conference, December 2011

Banerjee, Dip Sankar; Kothapalli, Kishore
2011 18th International Conference on High Performance Computing (HiPC)
DOI: 10.1109/HiPC.2011.6152655

Maestro: Data Orchestration and Tuning for OpenCL Devices
book, January 2010

Spafford, Kyle; Meredith, Jeremy; Vetter, Jeffrey
Euro-Par 2010 - Parallel Processing
DOI: 10.1007/978-3-642-15291-7_26

Using graphics processors for high performance IR query processing
conference, January 2009

Ding, Shuai; He, Jinru; Yan, Hao
Proceedings of the 18th international conference on World wide web - WWW '09
DOI: 10.1145/1526709.1526766

Enhancing Cloud-Based Servers by GPU/CPU Virtualization Management
book, January 2013

Wu, Tin-Yu; Lee, Wei-Tsong; Duan, Chien-Yu
Advances in Intelligent Systems and Applications. Smart Innovation, Systems and Technologies
DOI: 10.1007/978-3-642-35473-1_20

A survey of techniques for managing and leveraging caches in GPUs
journal, June 2014

Mittal, Sparsh
Journal of Circuits, Systems and Computers, Vol. 23, Issue 08
DOI: 10.1142/S0218126614300025

Accelerating Kirchhoff Migration by CPU and GPU Cooperation
conference, October 2009

Panetta, J.; Teixeira, T.; de Souza Filho, P. R. P.
2009 21st International Symposium on Computer Architecture and High Performance Computing. SBAC-PAD 2009
DOI: 10.1109/SBAC-PAD.2009.29

DESTINY: A Tool for Modeling Emerging 3D NVM and eDRAM caches
conference, January 2015

Poremba, Matt; Mittal, Sparsh; Li, Dong
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015
DOI: 10.7873/DATE.2015.0733

Accelerating inclusion-based pointer analysis on heterogeneous CPU-GPU systems
conference, December 2013

Su, Yu; Ye, Ding; Xue, Jingling
2013 20th International Conference on High Performance Computing (HiPC)
DOI: 10.1109/HiPC.2013.6799110

Real-Time Non-rigid Registration of Medical Images on a Cooperative Parallel Architecture
conference, November 2009

Liu, Yixun; Fedorov, Andriy; Kikinis, Ron
2009 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
DOI: 10.1109/BIBM.2009.10

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
conference, January 2009

Luk, Chi-Keung; Hong, Sunpyo; Kim, Hyesoon
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42
DOI: 10.1145/1669112.1669121

A peta-scalable CPU-GPU algorithm for global atmospheric simulations
conference, January 2013

Yang, Chao; Zheng, Weimin; Xue, Wei
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '13
DOI: 10.1145/2442516.2442518

Accelerating MapReduce on a coupled CPU-GPU architecture
conference, November 2012

Chen, Linchuan; Huo, Xin; Agrawal, Gagan
2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
DOI: 10.1109/SC.2012.16

Portable performance on heterogeneous architectures
conference, January 2013

Phothilimthana, Phitchaya Mangpo; Ansel, Jason; Ragan-Kelley, Jonathan
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems - ASPLOS '13
DOI: 10.1145/2451116.2451162

Quantifying the energy efficiency of FFT on heterogeneous platforms
conference, April 2013

Ukidave, Yash; Ziabari, Amir Kavyan; Mistry, Perhaad
2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
DOI: 10.1109/ISPASS.2013.6557174

Radiation modeling using the Uintah heterogeneous CPU/GPU runtime system
conference, January 2012

Humphrey, Alan; Meng, Qingyu; Berzins, Martin
Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment on Bridging from the eXtreme to the campus and beyond - XSEDE '12
DOI: 10.1145/2335755.2335791

Biomedical image analysis on a cooperative cluster of GPUs and multicores
conference, January 2014

Hartley, Timothy D. R.; Catalyurek, Umit; Ruiz, Antonio
25th Anniversary International Conference on Supercomputing Anniversary Volume
DOI: 10.1145/2591635.2667189

Acceleration of Hessenberg Reduction for Nonsymmetric Eigenvalue Problems in a Hybrid CPU-GPU Computing Environment
journal, January 2011

Muramatsu, Jun-ichi; Fukaya, Takeshi; Zhang, Shao-Liang
International Journal of Networking and Computing, Vol. 1, Issue 2
DOI: 10.15803/ijnc.1.2_132

Hybrid computing: CPU+GPU co-processing and its application to tomographic reconstruction
journal, April 2012

Agulleiro, J. I.; Vázquez, F.; Garzón, E. M.
Ultramicroscopy, Vol. 115
DOI: 10.1016/j.ultramic.2012.02.003

A survey of architectural techniques for improving cache power efficiency
journal, March 2014

Mittal, Sparsh
Sustainable Computing: Informatics and Systems, Vol. 4, Issue 1
DOI: 10.1016/j.suscom.2013.11.001

Arab financial supervisory systems and their restructuring according to the Twin Peaks system
journal, January 2017

Ahmed, Madani
Journal of North African Economies
DOI: 10.33858/0470-000-017-017

Hybrid Core Acceleration of UWB SIRE Radar Signal Processing
journal, January 2011

Park, Song Jun; Ross, James; Shires, Dale
IEEE Transactions on Parallel and Distributed Systems, Vol. 22, Issue 1
DOI: 10.1109/TPDS.2010.117

Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU
journal, October 2010

Shen, Wenfeng; Wei, Daming; Xu, Weimin
Computer Methods and Programs in Biomedicine, Vol. 100, Issue 1
DOI: 10.1016/j.cmpb.2010.06.015

Automatic generation of software pipelines for heterogeneous parallel systems
conference, November 2012

Pienaar, Jacques A.; Chakradhar, Srimat; Raghunathan, Anand
2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
DOI: 10.1109/SC.2012.22

Performance characterization of data-intensive kernels on AMD Fusion architectures
journal, May 2012

Lee, Kenneth; Lin, Heshan; Feng, Wu-chun
Computer Science - Research and Development, Vol. 28, Issue 2-3
DOI: 10.1007/s00450-012-0209-1

Works referencing / citing this record:

Artificial intelligence: a survey on evolution, models, applications and future trends
journal, January 2019

Lu, Yang
Journal of Management Analytics, Vol. 6, Issue 1
DOI: 10.1080/23270012.2019.1570365

Crossing the chasm: how to develop weather and climate models for next generation computers?
journal, January 2018

Lawrence, Bryan N.; Rezny, Michael; Budich, Reinhard
Geoscientific Model Development, Vol. 11, Issue 5
DOI: 10.5194/gmd-11-1799-2018

Task management on fully heterogeneous micro-server system: Modeling and resolution strategies
journal, September 2018

Zaourar, Lilia; Ait Aba, Massinissa; Briand, David
Concurrency and Computation: Practice and Experience, Vol. 30, Issue 23
DOI: 10.1002/cpe.4798

Optimizing parameter sensitivity analysis of large‐scale microscopy image analysis workflows with multilevel computation reuse
journal, June 2019

Barreiros, Willian; Moreira, Jeremias; Kurc, Tahsin
Concurrency and Computation: Practice and Experience, Vol. 32, Issue 2
DOI: 10.1002/cpe.5403

Energy‐aware task scheduling with time constraint for heterogeneous cloud datacenters
journal, July 2019

Liu, Xing; Liu, Panwen; Hu, Lun
Concurrency and Computation: Practice and Experience, Vol. 32, Issue 18
DOI: 10.1002/cpe.5437

FAST-FUSION: An Improved Accuracy Omnidirectional Visual Odometry System with Sensor Fusion and GPU Optimization for Embedded Low Cost Hardware
journal, December 2019

Aguiar, André; Santos, Filipe; Sousa, Armando Jorge
Applied Sciences, Vol. 9, Issue 24
DOI: 10.3390/app9245516

Dynamic Load Balancing Algorithm for Heterogeneous Clusters
book, March 2018

do Nascimento, Tiago Marques; dos Santos, Rodrigo Weber; Lobosco, Marcelo
Parallel Processing and Applied Mathematics
DOI: 10.1007/978-3-319-78054-2_16

Implementation of a non-linear solver on heterogeneous architectures
journal, August 2018

Carracciuolo, Luisa; Lapegna, Marco
Concurrency and Computation: Practice and Experience, Vol. 30, Issue 24
DOI: 10.1002/cpe.4903

Translating Timing into an Architecture: The Synergy of COTSon and HLS (Domain Expertise—Designing a Computer Architecture via HLS)
journal, November 2019

Giorgi, Roberto; Khalili, Farnam; Procaccini, Marco
International Journal of Reconfigurable Computing, Vol. 2019
DOI: 10.1155/2019/2624938

Aspect-Oriented Set@l Language for Architecture-Independent Programming of High-Performance Computer Systems
book, January 2019

Levin, Ilya I.; Dordopulo, Alexey I.; Pisarenko, Ivan V.
Supercomputing: 5th Russian Supercomputing Days, RuSCDays 2019, Moscow, Russia, September 23–24, 2019, Revised Selected Papers, p. 517-528
DOI: 10.1007/978-3-030-36592-9_42

Efficient Execution of Smart City’s Assets Through a Massive Parallel Computational Model
book, July 2018

Ashraf, Muhammad Usman; Eassa, Fathy Alboraei; Albeshri, Aiiad Ahmad
Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
DOI: 10.1007/978-3-319-94180-6_6

A Heterogeneous Parallel LU Factorization Algorithm Based on a Basic Column Block Uniform Allocation Strategy
journal, February 2019

Wu, Rongteng; Xie, Xiaohong
Mathematical Problems in Engineering, Vol. 2019
DOI: 10.1155/2019/3720450

Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles
journal, February 2019

Dendaluce Jahnke, Martin; Cosco, Francesco; Novickis, Rihards
Electronics, Vol. 8, Issue 2
DOI: 10.3390/electronics8020250

Parallel SIMD CPU and GPU Implementations of Berlekamp–Massey Algorithm and Its Error Correction Application
journal, May 2018

Mohebbi, Hamidreza
International Journal of Parallel Programming, Vol. 47, Issue 1
DOI: 10.1007/s10766-018-0574-x

A survey of techniques for improving efficiency of mobile web browsing
journal, July 2018

Mittal, Sparsh; Mattela, Venkat
Concurrency and Computation: Practice and Experience, Vol. 31, Issue 15
DOI: 10.1002/cpe.5126

A Deep Pipelined Implementation of Hyperspectral Target Detection Algorithm on FPGA Using HLS
journal, March 2018

Lei, Jie; Li, Yunsong; Zhao, Dongsheng
Remote Sensing, Vol. 10, Issue 4
DOI: 10.3390/rs10040516

A Survey of Medical Imaging, Storage and Transfer Techniques
book, January 2019

Meenatchi Aparna, R. R.; Shanmugavadivu, P.
Proceedings of the International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB)
DOI: 10.1007/978-3-030-00665-5_3

Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL
journal, February 2019

Dávila Guzmán, María Angélica; Nozal, Raúl; Gran Tejero, Rubén
The Journal of Supercomputing, Vol. 75, Issue 3
DOI: 10.1007/s11227-019-02768-y

A survey of techniques for architecting TLBs: A survey of techniques for architecting translation lookaside buffers
journal, December 2016

Mittal, Sparsh
Concurrency and Computation: Practice and Experience, Vol. 29, Issue 10
DOI: 10.1002/cpe.4061

A Survey of ReRAM-Based Architectures for Processing-In-Memory and Neural Networks
journal, April 2018

Mittal, Sparsh
Machine Learning and Knowledge Extraction, Vol. 1, Issue 1
DOI: 10.3390/make1010005

Page Locked GPGPU Rotational Visual Secret Sharing
book, January 2020

Raviraja Holla, M.; Suma, D.; Smys, S.
Second International Conference on Computer Networks and Communication Technologies: ICCNCT 2019, p. 349-359
DOI: 10.1007/978-3-030-37051-0_41

High-performance low-power approximate Wallace tree multiplier
journal, July 2018

Abed, Sa'ed; Khalil, Yasser; Modhaffar, Mahdi
International Journal of Circuit Theory and Applications, Vol. 46, Issue 12
DOI: 10.1002/cta.2540

A survey of FPGA-based accelerators for convolutional neural networks
journal, October 2018

Mittal, Sparsh
Neural Computing and Applications, Vol. 32, Issue 4
DOI: 10.1007/s00521-018-3761-1

Crossing the chasm: how to develop weather and climate models for next generation computers?
text, January 2018

Lawrence, Bryan N.; Rezny, Michael; Budich, Reinhard
ETH Zurich
DOI: 10.3929/ethz-b-000265172

Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review
journal, April 2018

Memeti, Suejb; Pllana, Sabri; Binotto, Alécio
Computing, Vol. 101, Issue 8
DOI: 10.1007/s00607-018-0614-9

GPU processing of theta-joins
journal, June 2017

Bellas, Christos; Gounaris, Anastasios
Concurrency and Computation: Practice and Experience, Vol. 29, Issue 18
DOI: 10.1002/cpe.4194

A survey of techniques for architecting SLC/MLC/TLC hybrid Flash memory-based SSDs: A survey of techniques for architecting hybrid flash memory based SSDs
journal, January 2018

Alsalibi, Ahmed Izzat; Mittal, Sparsh; Al-Betar, Mohammed Azmi
Concurrency and Computation: Practice and Experience, Vol. 30, Issue 13
DOI: 10.1002/cpe.4420

The Set@l Programming Language and Its Application for Coding Gaussian Elimination
book, August 2019

Levin, Ilya I.; Dordopulo, Aleksey I.; Pisarenko, Ivan V.
Parallel Computational Technologies: 13th International Conference, PCT 2019, Kaliningrad, Russia, April 2–4, 2019, Revised Selected Papers, p. 45-57
DOI: 10.1007/978-3-030-28163-2_4

Similar Records in DOE PAGES and OSTI.GOV collections:

Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

Journal Article: Sourouri, Mohammed; Baden, Scott B.; Cai, Xing - International Journal of Parallel Programming

We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of directives, sequential stencil C codes can be automatically parallelized for large-scale GPU clusters. The most distinctive feature of the compiler is its capability to generate hybrid MPI ...
Cited by 14
https://doi.org/10.1007/s10766-016-0454-1

Population Count on Intel® CPU, GPU, and FPGA

Conference: Jin, Zheming; Finkel, Hal

Population count is a primitive used in many applications. Commodity processors have dedicated instructions for achieving high-performance population count. Motivated by the productivity of high-level synthesis and the importance of population count, in this paper we investigated the OpenCL implementations of population count algorithms, and evaluated their performance and resource utilizations on an FPGA. Based on the results, we select the most efficient implementation. Then we derived a reduction pattern from a representative application of population count. We parallelized the reduction with atomic functions, and optimized it with vectorized memory accesses, tree reduction, and compute-unit duplication. We evaluated the performance ...
https://doi.org/10.1109/IPDPSW50202.2020.00081

Scalable molecular dynamics on CPU and GPU architectures with NAMD

Journal Article: Phillips, James C; Hardy, David J; Maia, Julio D.C.; ... - Journal of Chemical Physics

NAMD is a molecular dynamics program designed for high-performance simulations of very large biological objects on CPU- and GPU-based architectures. NAMD offers scalable performance on petascale parallel supercomputers consisting of hundreds of thousands of cores, as well as on inexpensive commodity clusters commonly found in academic environments. It is written in C++ and leans on Charm++ parallel objects for optimal performance on low-latency architectures. NAMD is a versatile, multipurpose code that gathers state-of-the-art algorithms to carry out simulations in apt thermodynamic ensembles, using the widely popular CHARMM, AMBER, OPLS, and GROMOS biomolecular force fields. Here, we review the main features ...
https://doi.org/10.1063/5.0014475

Indicator-directed Dynamic Power Management for Iterative Workloads on GPU-Accelerated Systems

Conference: Zou, Pengfei; Li, Ang; Barker, Kevin J.; ...

Modern high-performance and warehouse computing centers show strong interest in minimizing system power consumption while satisfying customers’ quality of service (QoS). Dynamic voltage and frequency scaling (DVFS) is effective for achieving this goal. Nevertheless, automating the process online and making it transparent to users must address three major challenges: (1) Complexity: today’s hardware components (e.g., CPUs, GPUs, memory, network, etc.) can be configured in several or dozens of frequency/voltage states for satisfying divergent system demands. Given their combination and the emergence of heterogeneity, searching the optimal configuration in the design space online can be time consuming. (2) QoS guarantee ...
https://doi.org/10.1109/CCGrid49817.2020.00-37

Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

Journal Article: Xu, Chuanfu; Deng, Xiaogang; Zhang, Lilun; ... - Journal of Computational Physics

Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact definite difference schemes WCNS and HDCS that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for ...
https://doi.org/10.1016/J.JCP.2014.08.024

Hybrid-parallel Algorithms for 2D Green's Functions
journal, January 2013

Programming model for a heterogeneous x86 platform
conference, January 2009

Twin Peaks
journal, January 2017

Porting irregular reductions on heterogeneous CPU-GPU configurations
conference, December 2011

Hybrid implementation of error diffusion dithering
conference, December 2011

Programming model for a heterogeneous x86 platform
journal, May 2009

Cooperative heterogeneous computing for parallel processing on CPU/GPU hybrids
conference, February 2012

Discrete particle simulation of gas–solid two-phase flows with multi-scale CPU–GPU hybrid computation
journal, October 2012

A new era in scientific computing: Domain decomposition methods in hybrid CPU–GPU architectures
journal, March 2011

Processing data streams with hard real-time constraints on heterogeneous systems
conference, January 2011

Axel: a heterogeneous cluster with FPGAs and GPUs
conference, January 2010

Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system
conference, January 2011

GreenGPU: A Holistic Approach to Energy Efficiency in GPU-CPU Heterogeneous Architectures
conference, September 2012

Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
conference, January 2009

MapCG: writing parallel program portable between CPU and GPU
conference, January 2010

Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors
conference, January 2010

Efficient co-processor utilization in database query processing
journal, November 2013

A yoke of oxen and a thousand chickens for heavy lifting graph processing
conference, January 2012

Heterogeneous Computational Model for Landform Attributes Representation on Multicore and Multi-GPU Systems
journal, January 2012

A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL
book, January 2011

Harmony: an execution model and runtime for heterogeneous many core systems
conference, January 2008

Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing
conference, September 2010

A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU–GPU platforms
journal, August 2011

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
journal, August 2012

5.1 POWER8^TM: A 12-core server-class processor in 22nm SOI with 7.6Tb/s off-chip bandwidth
conference, February 2014

Accelerating Protein Sequence Search in a Heterogeneous Computing System
conference, May 2011

Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-Hybrid platforms
conference, April 2010

An efficient, model-based CPU-GPU heterogeneous FFT library
conference, April 2008

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
conference, January 2010

Parallel Processing of Matrix Multiplication in a CPU and GPU Heterogeneous Environment
book, January 2007

Scalable fast multipole methods on distributed heterogeneous architectures
conference, January 2011

MDR: performance model driven runtime for heterogeneous parallel platforms
conference, January 2011

Performance evaluation of hybrid programming patterns for large CPU/GPU heterogeneous clusters
journal, June 2012

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
journal, June 2010

Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations
conference, November 2010

A dynamic scheduling framework for emerging heterogeneous systems
conference, December 2011

Implementation of FDTD-Compatible Green's Function on Heterogeneous CPU-GPU Parallel Processing System
journal, January 2013

An asymmetric distributed shared memory model for heterogeneous parallel systems
journal, March 2010

A New Parallel Method of Smith-Waterman Algorithm on a Heterogeneous Platform
book, January 2010

Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters
conference, November 2010

An efficient scheduling scheme using estimated execution time for heterogeneous computing systems
journal, January 2013

Performance Implications of Nonuniform Device Topologies in Scalable Heterogeneous Architectures
journal, September 2011

An exact algorithm for energy-efficient acceleration of task trees on CPU/GPU architectures
conference, January 2011

Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures
conference, November 2010

An Effective Dynamic Scheduling Runtime and Tuning System for Heterogeneous Multi and Many-Core Desktop Platforms
conference, September 2011

A compound OpenMP/MPI program development toolkit for hybrid CPU/GPU clusters
journal, March 2013

An Efficient Stream Buffer Mechanism for Dataflow Execution on Heterogeneous Platforms with GPUs
conference, October 2011

Heterogeneous Systems for Energy Efficient Scientific Computing
book, January 2012

Fluid Simulation with Two-Way Interaction Rigid Body Using a Heterogeneous GPU and CPU Environment
conference, November 2010

Task-based parallel breadth-first search in heterogeneous environments
conference, December 2012

Power-aware dynamic task scheduling for heterogeneous accelerated clusters
conference, May 2009

Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms
conference, July 2011

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures
conference, January 2012

A Waterfall Model to Achieve Energy Efficient Tasks Mapping for Large Scale GPU Clusters
conference, May 2011

A Map-Reduce Based Framework for Heterogeneous Processing Element Cluster Environments
conference, May 2012

GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing
conference, September 2012

Effective Kernel Mapping for OpenCL Applications in Heterogeneous Platforms
conference, September 2012

Iterative SLE Solvers over a CPU-GPU Platform
conference, September 2010

Power-efficient time-sensitive mapping in heterogeneous systems
conference, January 2012

Predictive Runtime Code Scheduling for Heterogeneous Architectures
book, January 2009

Fast Snippet Generation Based on CPU-GPU Hybrid System
conference, December 2011

AMD Fusion APU: Llano
journal, March 2012

Enabling task-level scheduling on heterogeneous platforms
conference, January 2012

Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems
conference, January 2013

Biomedical image analysis on a cooperative cluster of GPUs and multicores
conference, January 2008

Fluidic Kernels: Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices
conference, January 2014

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU
conference, October 2011

An automatic input-sensitive approach for heterogeneous task partitioning
conference, January 2013

Optimizing tensor contraction expressions for hybrid CPU-GPU execution
journal, November 2011

A fully integrated multi-CPU, GPU and memory controller 32nm processor
conference, February 2011

MATE-CG: A Map Reduce-Like Framework for Accelerating Data-Intensive Computations on Heterogeneous Clusters
conference, May 2012

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators
conference, May 2011

Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations
journal, January 2011

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
conference, July 2011

Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems
conference, September 2013

Mapping the SBR and TW-ILDCs to Heterogeneous CPU-GPU Architecture for Fast Computation of Electromagnetic Scattering
journal, January 2012

CPU-GPU hybrid parallel strategy for cosmological simulations
journal, May 2013

An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs
conference, January 2012

A peta-scalable CPU-GPU algorithm for global atmospheric simulations
journal, August 2013

Communication-Aware Task Partition and Voltage Scaling for Energy Minimization on Heterogeneous Parallel Systems
conference, October 2011