A survey of CPU-GPU heterogeneous computing techniques
Abstract
As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, that enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler, and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe that this paper will provide researchers with insights into the working and scope of applications of HCTs and motivate them to further harness the computational power of CPUs and GPUs to achieve the goal of exascale performance.
- Authors:
- Mittal, Sparsh; Vetter, Jeffrey S.
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Georgia Inst. of Technology, Atlanta, GA (United States)
- Publication Date:
- 2015-07-04
- Research Org.:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1265534
- Grant/Contract Number:
- AC05-00OR22725
- Resource Type:
- Accepted Manuscript
- Journal Name:
- ACM Computing Surveys
- Additional Journal Information:
- Journal Volume: 47; Journal Issue: 4; Journal ID: ISSN 0360-0300
- Publisher:
- Association for Computing Machinery (ACM)
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; experimentation; management; measurement; performance; analysis; CPU-GPU heterogeneous/hybrid/collaborative computing; workload division/partitioning; dynamic/static load-balancing; pipelining; programming frameworks; fused CPU-GPU chip
Citation Formats
Mittal, Sparsh, and Vetter, Jeffrey S. A survey of CPU-GPU heterogeneous computing techniques. United States: N. p., 2015. Web. doi:10.1145/2788396.
Mittal, Sparsh, & Vetter, Jeffrey S. A survey of CPU-GPU heterogeneous computing techniques. United States. https://doi.org/10.1145/2788396
Mittal, Sparsh, and Vetter, Jeffrey S. 2015. "A survey of CPU-GPU heterogeneous computing techniques". United States. https://doi.org/10.1145/2788396. https://www.osti.gov/servlets/purl/1265534.
@article{osti_1265534,
title = {A survey of CPU-GPU heterogeneous computing techniques},
author = {Mittal, Sparsh and Vetter, Jeffrey S.},
doi = {10.1145/2788396},
journal = {ACM Computing Surveys},
number = 4,
volume = 47,
place = {United States},
year = {2015},
month = {7}
}
Web of Science
Works referenced in this record:
Hybrid-parallel Algorithms for 2D Green's Functions
journal, January 2013
- Álvarez-Melcón, Alejandro; Giménez, Domingo; Quesada, Fernando D.
- Procedia Computer Science, Vol. 18
Programming model for a heterogeneous x86 platform
conference, January 2009
- Saha, Bratin; Mendelson, Avi; Zhou, Xiaocheng
- Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation - PLDI '09
Porting irregular reductions on heterogeneous CPU-GPU configurations
conference, December 2011
- Huo, Xin; Ravi, Vignesh T.; Agrawal, Gagan
- 2011 18th International Conference on High Performance Computing (HiPC)
Hybrid implementation of error diffusion dithering
conference, December 2011
- Deshpande, Aditya; Misra, Ishan; Narayanan, P. J.
- 2011 18th International Conference on High Performance Computing (HiPC)
Programming model for a heterogeneous x86 platform
journal, May 2009
- Saha, Bratin; Mendelson, Avi; Zhou, Xiaocheng
- ACM SIGPLAN Notices, Vol. 44, Issue 6
Cooperative heterogeneous computing for parallel processing on CPU/GPU hybrids
conference, February 2012
- Lee, Changmin; Ro, Won W.; Gaudiot, Jean-Luc
- 2012 16th Workshop on Interaction between Compilers and Computer Architectures (INTERACT)
Discrete particle simulation of gas–solid two-phase flows with multi-scale CPU–GPU hybrid computation
journal, October 2012
- Xu, Ming; Chen, Feiguo; Liu, Xinhua
- Chemical Engineering Journal, Vol. 207-208
A new era in scientific computing: Domain decomposition methods in hybrid CPU–GPU architectures
journal, March 2011
- Papadrakakis, M.; Stavroulakis, G.; Karatarakis, A.
- Computer Methods in Applied Mechanics and Engineering, Vol. 200, Issue 13-16
Processing data streams with hard real-time constraints on heterogeneous systems
conference, January 2011
- Verner, Uri; Schuster, Assaf; Silberstein, Mark
- Proceedings of the international conference on Supercomputing - ICS '11
Axel: a heterogeneous cluster with FPGAs and GPUs
conference, January 2010
- Tsoi, Kuen Hung; Luk, Wayne
- Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '10
Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system
conference, January 2011
- Li, Linchuan; Li, Xingjian; Tan, Guangming
- Proceedings of the 20th international symposium on High performance distributed computing - HPDC '11
GreenGPU: A Holistic Approach to Energy Efficiency in GPU-CPU Heterogeneous Architectures
conference, September 2012
- Ma, Kai; Li, Xue; Chen, Wei
- 2012 41st International Conference on Parallel Processing (ICPP)
Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
conference, January 2009
- Venkatasubramanian, Sundaresan; Vuduc, Richard W.; none, none
- Proceedings of the 23rd international conference on Conference on Supercomputing - ICS '09
MapCG: writing parallel program portable between CPU and GPU
conference, January 2010
- Hong, Chuntao; Chen, Dehao; Chen, Wenguang
- Proceedings of the 19th international conference on Parallel architectures and compilation techniques - PACT '10
Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors
conference, January 2010
- Gummaraju, Jayanth; Morichetti, Laurent; Houston, Michael
- Proceedings of the 19th international conference on Parallel architectures and compilation techniques - PACT '10
Efficient co-processor utilization in database query processing
journal, November 2013
- Breß, Sebastian; Beier, Felix; Rauhe, Hannes
- Information Systems, Vol. 38, Issue 8
A yoke of oxen and a thousand chickens for heavy lifting graph processing
conference, January 2012
- Gharaibeh, Abdullah; Beltrão Costa, Lauro; Santos-Neto, Elizeu
- Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12
Heterogeneous Computational Model for Landform Attributes Representation on Multicore and Multi-GPU Systems
journal, January 2012
- Boratto, Murilo; Alonso, Pedro; Ramiro, Carla
- Procedia Computer Science, Vol. 9
A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL
book, January 2011
- Grewe, Dominik; O’Boyle, Michael F. P.
- Compiler Construction. Lecture Notes in Computer Science
Harmony: an execution model and runtime for heterogeneous many core systems
conference, January 2008
- Diamos, Gregory F.; Yalamanchili, Sudhakar
- Proceedings of the 17th international symposium on High performance distributed computing - HPDC '08
Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing
conference, September 2010
- Yang, Canqun; Wang, Feng; Du, Yunfei
- 2010 IEEE International Conference on Cluster Computing (CLUSTER)
A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU–GPU platforms
journal, August 2011
- Benner, Peter; Ezzatti, Pablo; Kressner, Daniel
- Parallel Computing, Vol. 37, Issue 8
Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
journal, August 2012
- Prasad, Ashwin; Anantpur, Jayvant; Govindarajan, R.
- ACM SIGPLAN Notices, Vol. 47, Issue 6
5.1 POWER8TM: A 12-core server-class processor in 22nm SOI with 7.6Tb/s off-chip bandwidth
conference, February 2014
- Fluhr, Eric J.; Friedrich, Joshua; Dreps, Daniel
- 2014 IEEE International Solid- State Circuits Conference (ISSCC), 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)
Accelerating Protein Sequence Search in a Heterogeneous Computing System
conference, May 2011
- Xiao, Shucai; Lin, Heshan; Feng, Wu-chun
- Distributed Processing Symposium (IPDPS), 2011 IEEE International Parallel & Distributed Processing Symposium
Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-Hybrid platforms
conference, April 2010
- He, Zhengyu; Hong, Bo
- 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
An efficient, model-based CPU-GPU heterogeneous FFT library
conference, April 2008
- Ogata, Yasuhito; Endo, Toshio; Maruyama, Naoya
- 2008 IEEE International Symposium on Parallel and Distributed Processing
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
conference, January 2010
- Lee, Victor W.; Hammarlund, Per; Singhal, Ronak
- Proceedings of the 37th annual international symposium on Computer architecture - ISCA '10
Parallel Processing of Matrix Multiplication in a CPU and GPU Heterogeneous Environment
book, January 2007
- Ohshima, Satoshi; Kise, Kenji; Katagiri, Takahiro
- High Performance Computing for Computational Science - VECPAR 2006. Lecture Notes in Computer Science
Scalable fast multipole methods on distributed heterogeneous architectures
conference, January 2011
- Hu, Qi; Gumerov, Nail A.; Duraiswami, Ramani
- Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11
MDR: performance model driven runtime for heterogeneous parallel platforms
conference, January 2011
- Pienaar, Jacques A.; Raghunathan, Anand; Chakradhar, Srimat
- Proceedings of the international conference on Supercomputing - ICS '11
Performance evaluation of hybrid programming patterns for large CPU/GPU heterogeneous clusters
journal, June 2012
- Lu, Fengshun; Song, Junqiang; Yin, Fukang
- Computer Physics Communications, Vol. 183, Issue 6
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
journal, June 2010
- Lee, Victor W.; Hammarlund, Per; Singhal, Ronak
- ACM SIGARCH Computer Architecture News, Vol. 38, Issue 3
Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations
conference, November 2010
- Hampton, Scott S.; Alam, Sadaf R.; Crozier, Paul S.
- 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
A dynamic scheduling framework for emerging heterogeneous systems
conference, December 2011
- Ravi, Vignesh T.; Agrawal, Gagan
- 2011 18th International Conference on High Performance Computing (HiPC)
Implementation of Fdtd-Compatible Green'S Function on Heterogeneous Cpu-Gpu Parallel Processing System
journal, January 2013
- Stefanski, Tomasz P.
- Progress In Electromagnetics Research, Vol. 135
An asymmetric distributed shared memory model for heterogeneous parallel systems
journal, March 2010
- Gelado, Isaac; Stone, John E.; Cabezas, Javier
- ACM SIGPLAN Notices, Vol. 45, Issue 3
A New Parallel Method of Smith-Waterman Algorithm on a Heterogeneous Platform
book, January 2010
- Chen, Bo; Xu, Yun; Yang, Jiaoyun
- Algorithms and Architectures for Parallel Processing
Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters
conference, November 2010
- Shirahata, Koichi; Sato, Hitoshi; Matsuoka, Satoshi
- 2010 IEEE 2nd International Conference on Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on Cloud Computing Technology and Science
An efficient scheduling scheme using estimated execution time for heterogeneous computing systems
journal, January 2013
- Choi, Hong Jun; Son, Dong Oh; Kang, Seung Gu
- The Journal of Supercomputing, Vol. 65, Issue 2
Performance Implications of Nonuniform Device Topologies in Scalable Heterogeneous Architectures
journal, September 2011
- Meredith, Jeremy; Roth, Philip; Spafford, Kyle
- IEEE Micro, Vol. 31, Issue 5
An exact algorithm for energy-efficient acceleration of task trees on CPU/GPU architectures
conference, January 2011
- Silberstein, Mark; Maruyama, Naoya
- Proceedings of the 4th Annual International Conference on Systems and Storage - SYSTOR '11
Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures
conference, November 2010
- Rahimian, Abtin; Lashuk, Ilya; Veerapaneni, Shravan
- 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
An Effective Dynamic Scheduling Runtime and Tuning System for Heterogeneous Multi and Many-Core Desktop Platforms
conference, September 2011
- Binotto, Alecio P. D.; Pereira, Carlos E.; Kuijper, Arjan
- Communication (HPCC), 2011 IEEE International Conference on High Performance Computing and Communications
A compound OpenMP/MPI program development toolkit for hybrid CPU/GPU clusters
journal, March 2013
- Li, Hung-Fu; Liang, Tyng-Yeu; Chiu, Jun-Yao
- The Journal of Supercomputing, Vol. 66, Issue 1
An Efficient Stream Buffer Mechanism for Dataflow Execution on Heterogeneous Platforms with GPUs
conference, October 2011
- Balevic, Ana; Kienhuis, Bart
- 2011 First Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM)
Heterogeneous Systems for Energy Efficient Scientific Computing
book, January 2012
- Liu, Qiang; Luk, Wayne
- Reconfigurable Computing: Architectures, Tools and Applications. Lecture Notes in Computer Science
Fluid Simulation with Two-Way Interaction Rigid Body Using a Heterogeneous GPU and CPU Environment
conference, November 2010
- Junior, José Ricardo da S.; Clua, Esteban W.; Montenegro, Anselmo
- 2010 Brazilian Symposium on Games and Digital Entertainment (SBGAMES)
Task-based parallel breadth-first search in heterogeneous environments
conference, December 2012
- Munguia, Lluis-Miquel; Bader, David A.; Ayguade, Eduard
- 2012 19th International Conference on High Performance Computing (HiPC)
Power-aware dynamic task scheduling for heterogeneous accelerated clusters
conference, May 2009
- Hamano, Tomoaki; Endo, Toshio; Matsuoka, Satoshi
- 2009 IEEE International Symposium on Parallel & Distributed Processing
Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms
conference, July 2011
- Anzt, Hartwig; Heuveline, Vincent; Aliaga, Jose I.
- 2011 International Green Computing Conference (IGCC), 2011 International Green Computing Conference and Workshops
The tradeoffs of fused memory hierarchies in heterogeneous computing architectures
conference, January 2012
- Spafford, Kyle L.; Meredith, Jeremy S.; Lee, Seyong
- Proceedings of the 9th conference on Computing Frontiers - CF '12
A Waterfall Model to Achieve Energy Efficient Tasks Mapping for Large Scale GPU Clusters
conference, May 2011
- Liu, Wenjie; Du, Zhihui; Xiao, Yu
- Distributed Processing, Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
A Map-Reduce Based Framework for Heterogeneous Processing Element Cluster Environments
conference, May 2012
- Tan, Yu Shyang; Lee, Bu-Sung; He, Bingsheng
- 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing
conference, September 2012
- Odajima, Tetsuya; Boku, Taisuke; Hanawa, Toshihiro
- 2012 41st International Conference on Parallel Processing Workshops (ICPPW)
Effective Kernel Mapping for OpenCL Applications in Heterogeneous Platforms
conference, September 2012
- Albayrak, Omer Erdil; Akturk, Ismail; Ozturk, Ozcan
- 2012 41st International Conference on Parallel Processing Workshops (ICPPW)
Iterative SLE Solvers over a CPU-GPU Platform
conference, September 2010
- Binotto, Alécio P. D.; Daniel, Christian; Weber, Daniel
- 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC 2010), 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC)
Power-efficient time-sensitive mapping in heterogeneous systems
conference, January 2012
- Liu, Cong; Li, Jian; Huang, Wei
- Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12
Predictive Runtime Code Scheduling for Heterogeneous Architectures
book, January 2009
- Jiménez, Víctor J.; Vilanova, Lluís; Gelado, Isaac
- High Performance Embedded Architectures and Compilers
Fast Snippet Generation Based on CPU-GPU Hybrid System
conference, December 2011
- Liu, Ding; Li, Ruixuan; Gu, Xiwu
- 2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS)
AMD Fusion APU: Llano
journal, March 2012
- Branover, Alexander; Foley, Denis; Steinman, Maurice
- IEEE Micro, Vol. 32, Issue 2
Enabling task-level scheduling on heterogeneous platforms
conference, January 2012
- Sun, Enqiang; Schaa, Dana; Bagley, Richard
- Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units - GPGPU-5
Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems
conference, January 2013
- Mistry, Perhaad; Ukidave, Yash; Schaa, Dana
- Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units - GPGPU-6
Biomedical image analysis on a cooperative cluster of GPUs and multicores
conference, January 2008
- Hartley, Timothy D. R.; Catalyurek, Umit; Ruiz, Antonio
- Proceedings of the 22nd annual international conference on Supercomputing - ICS '08
Fluidic Kernels: Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices
conference, January 2014
- Pandit, Prasanna; Govindarajan, R.
- Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization - CGO '14
Efficient Parallel Graph Exploration on Multi-Core CPU and GPU
conference, October 2011
- Hong, Sungpack; Oguntebi, Tayo; Olukotun, Kunle
- 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)
An automatic input-sensitive approach for heterogeneous task partitioning
conference, January 2013
- Kofler, Klaus; Grasso, Ivan; Cosenza, Biagio
- Proceedings of the 27th international ACM conference on International conference on supercomputing - ICS '13
Optimizing tensor contraction expressions for hybrid CPU-GPU execution
journal, November 2011
- Ma, Wenjing; Krishnamoorthy, Sriram; Villa, Oreste
- Cluster Computing, Vol. 16, Issue 1
A fully integrated multi-CPU, GPU and memory controller 32nm processor
conference, February 2011
- Yuffe, Marcelo; Knoll, Ernest; Mehalel, Moty
- 2011 IEEE International Solid- State Circuits Conference - (ISSCC), 2011 IEEE International Solid-State Circuits Conference
MATE-CG: A Map Reduce-Like Framework for Accelerating Data-Intensive Computations on Heterogeneous Clusters
conference, May 2012
- Jiang, Wei; Agrawal, Gagan
- 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2012 IEEE 26th International Parallel and Distributed Processing Symposium
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators
conference, May 2011
- Agullo, Emmanuel; Augonnet, Cedric; Dongarra, Jack
- Distributed Processing Symposium (IPDPS), 2011 IEEE International Parallel & Distributed Processing Symposium
Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations
journal, January 2011
- Dziekonski, A.; Lamecki, A.; Mrozowski, M.
- IEEE Antennas and Wireless Propagation Letters, Vol. 10
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
conference, July 2011
- Daga, Mayank; Aji, Ashwin M.; Feng, Wu-chun
- 2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC)
Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems
conference, September 2013
- Lee, Janghaeng; Samadi, Mehrzad; Park, Yongjun
- Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques
Mapping the sbr and Tw-Ildcs to Heterogeneous Cpu-Gpu Architecture for fast Computation of Electromagnetic Scattering
journal, January 2012
- Gao, Peng Cheng; Tao, Yu Bo; Bai, Zhi Hui
- Progress In Electromagnetics Research, Vol. 122
CPU-GPU hybrid parallel strategy for cosmological simulations: CPU-GPU HBRID PARALLEL STRATEGY FOR COSMOLOGICAL SIMULATION
journal, May 2013
- Wang, Yueqing; Dou, Yong; Guo, Song
- Concurrency and Computation: Practice and Experience, Vol. 26, Issue 3
An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs
conference, January 2012
- Li, Jiajia; Li, Xingjian; Tan, Guangming
- Proceedings of the 26th ACM international conference on Supercomputing - ICS '12
A peta-scalable CPU-GPU algorithm for global atmospheric simulations
journal, August 2013
- Yang, Chao; Zheng, Weimin; Xue, Wei
- ACM SIGPLAN Notices, Vol. 48, Issue 8
Communication-Aware Task Partition and Voltage Scaling for Energy Minimization on Heterogeneous Parallel Systems
conference, October 2011
- Wang, Guibin; Song, Wei
- 2011 12th International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT), 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies
A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures
conference, July 2011
- Horton, Mitch; Tomov, Stanimire; Dongarra, Jack
- 2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC)
SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters
conference, January 2012
- Kim, Jungwon; Seo, Sangmin; Lee, Jun
- Proceedings of the 26th ACM international conference on Supercomputing - ICS '12
A survey of techniques for improving energy efficiency in embedded computing systems
journal, January 2014
- Mittal, Sparsh
- International Journal of Computer Aided Engineering and Technology, Vol. 6, Issue 4
A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
book, January 2009
- Ayguade, Eduard; Badia, Rosa M.; Cabrera, Daniel
- Evolving OpenMP in an Age of Extreme Parallelism
Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations
conference, January 2010
- Ravi, Vignesh T.; Ma, Wenjing; Chiu, David
- Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10
Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
conference, January 2011
- Prasad, Ashwin; Anantpur, Jayvant; Govindarajan, R.
- Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation - PLDI '11
Automatic Dynamic Task Distribution between CPU and GPU for Real-Time Systems
conference, July 2008
- Joselli, Mark; Zamith, Marcelo; Clua, Esteban
- 2008 IEEE 11th International Conference on Computational Science and Engineering (CSE), 2008 11th IEEE International Conference on Computational Science and Engineering
A Survey of Methods for Analyzing and Improving GPU Energy Efficiency
journal, August 2014
- Mittal, Sparsh; Vetter, Jeffrey S.
- ACM Computing Surveys, Vol. 47, Issue 2
Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
journal, June 2011
- Prasad, Ashwin; Anantpur, Jayvant; Govindarajan, R.
- ACM SIGPLAN Notices, Vol. 46, Issue 6
Analysis and Optimization of Power Consumption in the Iterative Solution of Sparse Linear Systems on Multi-core and Many-core Platforms
text, January 2011
- Anzt, Hartwig; Heuveline, Vincent; Aliaga, José I.
- Karlsruher Institut für Technologie (KIT)
Accelerating Smith-Waterman on Heterogeneous CPU-GPU Systems
conference, May 2011
- Singh, Jaideep; Aruni, Ipseeta
- 2011 5th International Conference on Bioinformatics and Biomedical Engineering (iCBBE)
SPRAT: Runtime processor selection for energy-aware computing
conference, September 2008
- Takizawa, Hiroyuki; Sato, Katuto; Kobayashi, Hiroaki
- 2008 IEEE International Conference on Cluster Computing (CLUSTER)
Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory
conference, January 2010
- Becchi, Michela; Byna, Surendra; Cadambi, Srihari
- Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures - SPAA '10
Efficient sparse matrix-matrix multiplication on heterogeneous high performance systems
conference, September 2010
- Siegel, Jakob; Villa, Oreste; Krishnamoorthy, Sriram
- 2010 IEEE International Conference On Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS)
Hybrid ray tracing and path tracing of Bezier surfaces using a mixed hierarchy
conference, January 2012
- Nigam, Rohit; Narayanan, P. J.
- Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing - ICVGIP '12
Linpack evaluation on a supercomputer with heterogeneous accelerators
conference, April 2010
- Endo, Toshio; Matsuoka, Satoshi; Nukada, Akira
- 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
book, January 2011
- Ltaief, Hatem; Tomov, Stanimire; Nath, Rajib
- High Performance Computing for Computational Science – VECPAR 2010. Lecture Notes in Computer Science
CoreTSAR: Adaptive Worksharing for Heterogeneous Systems
book, January 2014
- Scogland, Thomas R. W.; Feng, Wu-chun; Rountree, Barry
- Supercomputing. Lecture Notes in Computer Science
Using 1000+ GPUs and 10000+ CPUs for Sedimentary Basin Simulations
conference, September 2012
- Wen, Mei; Su, Huayou; Wei, Wenjie
- 2012 IEEE International Conference on Cluster Computing (CLUSTER)
Shot boundary detection using Zernike moments in multi-GPU multi-CPU architectures
journal, September 2012
- Toharia, Pablo; Robles, Oscar D.; Suárez, Ricardo
- Journal of Parallel and Distributed Computing, Vol. 72, Issue 9
Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing
journal, December 2010
- Tomov, Stanimire; Nath, Rajib; Dongarra, Jack
- Parallel Computing, Vol. 36, Issue 12
Data Partitioning on Heterogeneous Multicore and Multi-GPU Systems Using Functional Performance Models of Data-Parallel Applications
conference, September 2012
- Zhong, Ziming; Rychkov, Vladimir; Lastovetsky, Alexey
- 2012 IEEE International Conference on Cluster Computing (CLUSTER)
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures
journal, November 2010
- Augonnet, Cédric; Thibault, Samuel; Namyst, Raymond
- Concurrency and Computation: Practice and Experience, Vol. 23, Issue 2
An asymmetric distributed shared memory model for heterogeneous parallel systems
journal, March 2010
- Gelado, Isaac; Cabezas, Javier; Navarro, Nacho
- ACM SIGARCH Computer Architecture News, Vol. 38, Issue 1
Coordinating the use of GPU and CPU for improving performance of compute intensive applications
conference, August 2009
- Teodoro, George; Sachetto, Rafael; Sertel, Olcay
- 2009 IEEE International Conference on Cluster Computing and Workshops
An On-Demand Fast Parallel Pseudo Random Number Generator with Applications
conference, May 2012
- Banerjee, Dip Sankar; Bahl, Aman Kumar; Kothapalli, Kishore
- 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
Opportunities for Nonvolatile Memory Systems in Extreme-Scale High-Performance Computing
journal, March 2015
- Vetter, Jeffrey S.; Mittal, Sparsh
- Computing in Science & Engineering, Vol. 17, Issue 2
Multilevel summation of electrostatic potentials using graphics processing units
journal, March 2009
- Hardy, David J.; Stone, John E.; Schulten, Klaus
- Parallel Computing, Vol. 35, Issue 3
A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures
journal, January 2013
- Belviranli, Mehmet E.; Bhuyan, Laxmi N.; Gupta, Rajiv
- ACM Transactions on Architecture and Code Optimization, Vol. 9, Issue 4
Resource-efficient utilization of CPU/GPU-based heterogeneous supercomputers for Bayesian phylogenetic inference
journal, March 2013
- Chai, Jun; Su, Huayou; Wen, Mei
- The Journal of Supercomputing, Vol. 66, Issue 1
Heterogeneous Task Scheduling for Accelerated OpenMP
conference, May 2012
- Scogland, Thomas R. W.; Rountree, Barry; Feng, Wu-chun
- 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2012 IEEE 26th International Parallel and Distributed Processing Symposium
Evaluating application performance and energy consumption on hybrid CPU+GPU architecture
journal, June 2012
- Padoin, Edson Luiz; Pilla, Laércio Lima; Boito, Francieli Zanon
- Cluster Computing, Vol. 16, Issue 3
GPU and APU computations of Finite Time Lyapunov Exponent fields
journal, March 2012
- Conti, Christian; Rossinelli, Diego; Koumoutsakos, Petros
- Journal of Computational Physics, Vol. 231, Issue 5
Portable performance on heterogeneous architectures
journal, April 2013
- Phothilimthana, Phitchaya Mangpo; Ansel, Jason; Ragan-Kelley, Jonathan
- ACM SIGPLAN Notices, Vol. 48, Issue 4
Heterogeneous Computing for Vertebra Detection and Segmentation in X-Ray Images
journal, January 2011
- Lecron, Fabian; Mahmoudi, Sidi Ahmed; Benjelloun, Mohammed
- International Journal of Biomedical Imaging, Vol. 2011
IBM POWER7+ design for higher frequency at fixed power
journal, November 2013
- Zyuban, V.; Taylor, S. A.; Christensen, B.
- IBM Journal of Research and Development, Vol. 57, Issue 6
Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer
conference, January 2011
- Shimokawabe, Takashi; Aoki, Takayuki; Takaki, Tomohiro
- Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11
Dynamic load balancing on heterogeneous multicore/multiGPU systems
conference, June 2010
- Acosta, Alejandro; Corujo, Robert; Blanco, Vicente
- 2010 International Conference on High Performance Computing & Simulation (HPCS)
A Technique for Collision Detection and 3D Interaction Based on Parallel GPU and CPU Processing
conference, November 2011
- Tsuda, Fernando; Nakamura, Ricardo
- 2011 Brazilian Symposium on Games and Digital Entertainment (SBGAMES)
GPU-enabled efficient executions of radiation calculations in climate modeling
conference, December 2013
- Korwar, Sai Kiran; Vadhiyar, Sathish; Nanjundiah, Ravi S.
- 2013 20th International Conference on High Performance Computing (HiPC), 20th Annual International Conference on High Performance Computing
Dynamic Distribution of Workload between CPU and GPU for a Parallel Conjugate Gradient Method in an Adaptive FEM
journal, January 2013
- Lang, Jens; Rünger, Gudula
- Procedia Computer Science, Vol. 18
Combinatorial Bidirectional Path-Tracing for Efficient Hybrid CPU/GPU Rendering
journal, April 2011
- Pajot, Anthony; Barthe, Loïc; Paulin, Mathias
- Computer Graphics Forum, Vol. 30, Issue 2
A hybrid shared memory heterogeneous execution platform for PCIe-based GPGPUs
conference, December 2013
- Shukla, Sambit K.; Bhuyan, Laxmi N.
- 2013 20th International Conference on High Performance Computing (HiPC), 20th Annual International Conference on High Performance Computing
Optimization Techniques for 3D-FWT on Systems with Manycore GPUs and Multicore CPUs
journal, January 2013
- Bernabé, Gregorio; Cuenca, Javier; Giménez, Domingo
- Procedia Computer Science, Vol. 18
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
journal, May 2010
- Stone, John E.; Gohara, David; Shi, Guochun
- Computing in Science & Engineering, Vol. 12, Issue 3, p. 66-73
A 22nm IA multi-CPU and GPU System-on-Chip
conference, February 2012
- Damaraju, Satish; George, Varghese; Jahagirdar, Sanjeev
- 2012 IEEE International Solid-State Circuits Conference (ISSCC)
Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems
conference, April 2012
- Hetherington, Tayler H.; Rogers, Timothy G.; Hsu, Lisa
- 2012 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS)
Scaling Hierarchical N-body Simulations on GPU Clusters
conference, November 2010
- Jetley, Pritish; Wesolowski, Lukasz; Gioachin, Filippo
- 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Where is the data? Why you cannot debate CPU vs. GPU performance without the answer
conference, April 2011
- Gregg, Chris; Hazelwood, Kim
- IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
CAP: co-scheduling based on asymptotic profiling in CPU+GPU hybrid systems
conference, January 2013
- Wang, Zhenning; Zheng, Long; Chen, Quan
- Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '13
Compiler and runtime support for enabling reduction computations on heterogeneous systems
journal, October 2011
- Ravi, Vignesh T.; Ma, Wenjing; Chiu, David
- Concurrency and Computation: Practice and Experience, Vol. 24, Issue 5
Automatic dataflow application tuning for heterogeneous systems
conference, December 2010
- Hartley, Timothy D. R.; Saule, Erik; Catalyurek, Umit V.
- 2010 International Conference on High Performance Computing (HiPC)
A Fast Parallel Implementation of Molecular Dynamics with the Morse Potential on a Heterogeneous Petascale Supercomputer
conference, May 2012
- Wu, Qiang; Yang, Canqun; Wang, Feng
- 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
Synergistic execution of stream programs on multicores with accelerators
journal, June 2009
- Udupa, Abhishek; Govindarajan, R.; Thazhuthaveetil, Matthew J.
- ACM SIGPLAN Notices, Vol. 44, Issue 7
X-device query processing by bitwise distribution
conference, January 2012
- Pirk, Holger; Sellam, Thibault; Manegold, Stefan
- Proceedings of the Eighth International Workshop on Data Management on New Hardware - DaMoN '12
CPU/GPU computing for long-wave radiation physics on large GPU clusters
journal, April 2012
- Lu, Fengshun; Song, Junqiang; Cao, Xiaoqun
- Computers & Geosciences, Vol. 41
Workload Balancing on Heterogeneous Systems: A Case Study of Sparse Grid Interpolation
book, January 2012
- Muraraşu, Alin; Weidendorfer, Josef; Bode, Arndt
- Euro-Par 2011: Parallel Processing Workshops. Lecture Notes in Computer Science
Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems
conference, May 2012
- Teodoro, George; Kurc, Tahsin M.; Pan, Tony
- 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2012 IEEE 26th International Parallel and Distributed Processing Symposium
Rodinia: A benchmark suite for heterogeneous computing
conference, October 2009
- Che, Shuai; Boyer, Michael; Meng, Jiayuan
- 2009 IEEE International Symposium on Workload Characterization (IISWC)
Asymptotic Peak Utilisation in Heterogeneous Parallel CPU/GPU Pipelines: A Decentralised Queue Monitoring Strategy
journal, May 2012
- Garba, Michael T.; González-Vélez, Horacio
- Parallel Processing Letters, Vol. 22, Issue 02
Dynamically managed data for CPU-GPU architectures
conference, January 2012
- Jablin, Thomas B.; Jablin, James A.; Prabhu, Prakash
- Proceedings of the Tenth International Symposium on Code Generation and Optimization - CGO '12
Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms
conference, January 2013
- Shen, Jie; Varbanescu, Ana Lucia; Sips, Henk
- Proceedings of the ACM International Conference on Computing Frontiers - CF '13
Medical Ultrasound Imaging: To GPU or Not to GPU?
journal, September 2011
- So, Hayden; Chen, Junying; Yiu, Billy
- IEEE Micro, Vol. 31, Issue 5
Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters
book, January 2012
- Clarke, David; Ilic, Aleksandar; Lastovetsky, Alexey
- Euro-Par 2012 Parallel Processing
Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems
journal, January 2012
- Vömel, Christof; Tomov, Stanimire; Dongarra, Jack
- SIAM Journal on Scientific Computing, Vol. 34, Issue 2
The Scalable Heterogeneous Computing (SHOC) benchmark suite
conference, January 2010
- Danalis, Anthony; Marin, Gabriel; McCurdy, Collin
- Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units - GPGPU '10
A survey of architectural techniques for DRAM power management
journal, January 2012
- Mittal, Sparsh
- International Journal of High Performance Systems Architecture, Vol. 4, Issue 2
Using Hybrid CPU-GPU Platforms to Accelerate the Computation of the Matrix Sign Function
book, January 2010
- Benner, Peter; Ezzatti, Pablo; Quintana-Ortí, Enrique S.
- Lecture Notes in Computer Science
Noniterative Multireference Coupled Cluster Methods on Heterogeneous CPU–GPU Systems
journal, March 2013
- Bhaskaran-Nair, Kiran; Ma, Wenjing; Krishnamoorthy, Sriram
- Journal of Chemical Theory and Computation, Vol. 9, Issue 4
Practical Time Bundle Adjustment for 3D Reconstruction on the GPU
book, January 2012
- Choudhary, Siddharth; Gupta, Shubham; Narayanan, P. J.
- Trends and Topics in Computer Vision
Solving a kind of BVP for ODEs on heterogeneous CPU + CUDA-enabled GPU systems
conference, October 2010
- Stpiczynski, Przemyslaw; Potiopa, Joanna
- 2010 International Multiconference on Computer Science and Information Technology (IMCSIT 2010), Proceedings of the International Multiconference on Computer Science and Information Technology
Load balancing in a changing world: dealing with heterogeneity and performance variability
conference, January 2013
- Boyer, Michael; Skadron, Kevin; Che, Shuai
- Proceedings of the ACM International Conference on Computing Frontiers - CF '13
Efficient irregular wavefront propagation algorithms on hybrid CPU–GPU machines
journal, April 2013
- Teodoro, George; Pan, Tony; Kurc, Tahsin M.
- Parallel Computing, Vol. 39, Issue 4-5
A Hybrid CPU-GPU Accelerated Framework for Fast Mapping of High-Resolution Human Brain Connectome
journal, May 2013
- Wang, Yu; Du, Haixiao; Xia, Mingrui
- PLoS ONE, Vol. 8, Issue 5
Hybrid algorithms for list ranking and graph connected components
conference, December 2011
- Banerjee, Dip Sankar; Kothapalli, Kishore
- 2011 18th International Conference on High Performance Computing (HiPC)
Maestro: Data Orchestration and Tuning for OpenCL Devices
book, January 2010
- Spafford, Kyle; Meredith, Jeremy; Vetter, Jeffrey
- Euro-Par 2010 - Parallel Processing
Using graphics processors for high performance IR query processing
conference, January 2009
- Ding, Shuai; He, Jinru; Yan, Hao
- Proceedings of the 18th international conference on World wide web - WWW '09
Enhancing Cloud-Based Servers by GPU/CPU Virtualization Management
book, January 2013
- Wu, Tin-Yu; Lee, Wei-Tsong; Duan, Chien-Yu
- Advances in Intelligent Systems and Applications. Smart Innovation, Systems and Technologies
A Survey of Techniques for Managing and Leveraging Caches in GPUs
journal, June 2014
- Mittal, Sparsh
- Journal of Circuits, Systems and Computers, Vol. 23, Issue 08
Accelerating Kirchhoff Migration by CPU and GPU Cooperation
conference, October 2009
- Panetta, J.; Teixeira, T.; de Souza Filho, P. R. P.
- 2009 21st International Symposium on Computer Architecture and High Performance Computing. SBAC-PAD 2009
DESTINY: A Tool for Modeling Emerging 3D NVM and eDRAM caches
conference, January 2015
- Poremba, Matt; Mittal, Sparsh; Li, Dong
- Design, Automation and Test in Europe, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015
Accelerating inclusion-based pointer analysis on heterogeneous CPU-GPU systems
conference, December 2013
- Su, Yu; Ye, Ding; Xue, Jingling
- 2013 20th International Conference on High Performance Computing (HiPC), 20th Annual International Conference on High Performance Computing
Real-Time Non-rigid Registration of Medical Images on a Cooperative Parallel Architecture
conference, November 2009
- Liu, Yixun; Fedorov, Andriy; Kikinis, Ron
- 2009 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
conference, January 2009
- Luk, Chi-Keung; Hong, Sunpyo; Kim, Hyesoon
- Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42
A peta-scalable CPU-GPU algorithm for global atmospheric simulations
conference, January 2013
- Yang, Chao; Zheng, Weimin; Xue, Wei
- Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '13
Accelerating MapReduce on a coupled CPU-GPU architecture
conference, November 2012
- Chen, Linchuan; Huo, Xin; Agrawal, Gagan
- 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis
Portable performance on heterogeneous architectures
conference, January 2013
- Phothilimthana, Phitchaya Mangpo; Ansel, Jason; Ragan-Kelley, Jonathan
- Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems - ASPLOS '13
Quantifying the energy efficiency of FFT on heterogeneous platforms
conference, April 2013
- Ukidave, Yash; Ziabari, Amir Kavyan; Mistry, Perhaad
- 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Radiation modeling using the Uintah heterogeneous CPU/GPU runtime system
conference, January 2012
- Humphrey, Alan; Meng, Qingyu; Berzins, Martin
- Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment on Bridging from the eXtreme to the campus and beyond - XSEDE '12
Biomedical image analysis on a cooperative cluster of GPUs and multicores
conference, January 2014
- Hartley, Timothy D. R.; Catalyurek, Umit; Ruiz, Antonio
- 25th Anniversary International Conference on Supercomputing Anniversary Volume
Acceleration of Hessenberg Reduction for Nonsymmetric Eigenvalue Problems in a Hybrid CPU-GPU Computing Environment
journal, January 2011
- Muramatsu, Jun-ichi; Fukaya, Takeshi; Zhang, Shao-Liang
- International Journal of Networking and Computing, Vol. 1, Issue 2
Hybrid computing: CPU+GPU co-processing and its application to tomographic reconstruction
journal, April 2012
- Agulleiro, J. I.; Vázquez, F.; Garzón, E. M.
- Ultramicroscopy, Vol. 115
A survey of architectural techniques for improving cache power efficiency
journal, March 2014
- Mittal, Sparsh
- Sustainable Computing: Informatics and Systems, Vol. 4, Issue 1
Arab financial supervisory systems and their restructuring according to the Twin Peaks system
journal, January 2017
- Ahmed, Madani
- Journal of North African Economies
Hybrid Core Acceleration of UWB SIRE Radar Signal Processing
journal, January 2011
- Park, Song Jun; Ross, James; Shires, Dale
- IEEE Transactions on Parallel and Distributed Systems, Vol. 22, Issue 1
Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU
journal, October 2010
- Shen, Wenfeng; Wei, Daming; Xu, Weimin
- Computer Methods and Programs in Biomedicine, Vol. 100, Issue 1
Automatic generation of software pipelines for heterogeneous parallel systems
conference, November 2012
- Pienaar, Jacques A.; Chakradhar, Srimat; Raghunathan, Anand
- 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis
Performance characterization of data-intensive kernels on AMD Fusion architectures
journal, May 2012
- Lee, Kenneth; Lin, Heshan; Feng, Wu-chun
- Computer Science - Research and Development, Vol. 28, Issue 2-3
Works referencing / citing this record:
Artificial intelligence: a survey on evolution, models, applications and future trends
journal, January 2019
- Lu, Yang
- Journal of Management Analytics, Vol. 6, Issue 1
Crossing the chasm: how to develop weather and climate models for next generation computers?
journal, January 2018
- Lawrence, Bryan N.; Rezny, Michael; Budich, Reinhard
- Geoscientific Model Development, Vol. 11, Issue 5
Task management on fully heterogeneous micro-server system: Modeling and resolution strategies
journal, September 2018
- Zaourar, Lilia; Ait Aba, Massinissa; Briand, David
- Concurrency and Computation: Practice and Experience, Vol. 30, Issue 23
Optimizing parameter sensitivity analysis of large‐scale microscopy image analysis workflows with multilevel computation reuse
journal, June 2019
- Barreiros, Willian; Moreira, Jeremias; Kurc, Tahsin
- Concurrency and Computation: Practice and Experience, Vol. 32, Issue 2
Energy‐aware task scheduling with time constraint for heterogeneous cloud datacenters
journal, July 2019
- Liu, Xing; Liu, Panwen; Hu, Lun
- Concurrency and Computation: Practice and Experience, Vol. 32, Issue 18
FAST-FUSION: An Improved Accuracy Omnidirectional Visual Odometry System with Sensor Fusion and GPU Optimization for Embedded Low Cost Hardware
journal, December 2019
- Aguiar, André; Santos, Filipe; Sousa, Armando Jorge
- Applied Sciences, Vol. 9, Issue 24
Dynamic Load Balancing Algorithm for Heterogeneous Clusters
book, March 2018
- do Nascimento, Tiago Marques; dos Santos, Rodrigo Weber; Lobosco, Marcelo
- Parallel Processing and Applied Mathematics
Implementation of a non-linear solver on heterogeneous architectures
journal, August 2018
- Carracciuolo, Luisa; Lapegna, Marco
- Concurrency and Computation: Practice and Experience, Vol. 30, Issue 24
Translating Timing into an Architecture: The Synergy of COTSon and HLS (Domain Expertise—Designing a Computer Architecture via HLS)
journal, November 2019
- Giorgi, Roberto; Khalili, Farnam; Procaccini, Marco
- International Journal of Reconfigurable Computing, Vol. 2019
Aspect-Oriented Set@l Language for Architecture-Independent Programming of High-Performance Computer Systems
book, January 2019
- Levin, Ilya I.; Dordopulo, Alexey I.; Pisarenko, Ivan V.
- Supercomputing: 5th Russian Supercomputing Days, RuSCDays 2019, Moscow, Russia, September 23–24, 2019, Revised Selected Papers, p. 517-528
Efficient Execution of Smart City’s Assets Through a Massive Parallel Computational Model
book, July 2018
- Ashraf, Muhammad Usman; Eassa, Fathy Alboraei; Albeshri, Aiiad Ahmad
- Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
A Heterogeneous Parallel LU Factorization Algorithm Based on a Basic Column Block Uniform Allocation Strategy
journal, February 2019
- Wu, Rongteng; Xie, Xiaohong
- Mathematical Problems in Engineering, Vol. 2019
Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles
journal, February 2019
- Dendaluce Jahnke, Martin; Cosco, Francesco; Novickis, Rihards
- Electronics, Vol. 8, Issue 2
Parallel SIMD CPU and GPU Implementations of Berlekamp–Massey Algorithm and Its Error Correction Application
journal, May 2018
- Mohebbi, Hamidreza
- International Journal of Parallel Programming, Vol. 47, Issue 1
A survey of techniques for improving efficiency of mobile web browsing
journal, July 2018
- Mittal, Sparsh; Mattela, Venkat
- Concurrency and Computation: Practice and Experience, Vol. 31, Issue 15
A Deep Pipelined Implementation of Hyperspectral Target Detection Algorithm on FPGA Using HLS
journal, March 2018
- Lei, Jie; Li, Yunsong; Zhao, Dongsheng
- Remote Sensing, Vol. 10, Issue 4
A Survey of Medical Imaging, Storage and Transfer Techniques
book, January 2019
- Meenatchi Aparna, R. R.; Shanmugavadivu, P.
- Proceedings of the International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB)
Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL
journal, February 2019
- Dávila Guzmán, María Angélica; Nozal, Raúl; Gran Tejero, Rubén
- The Journal of Supercomputing, Vol. 75, Issue 3
A survey of techniques for architecting TLBs
journal, December 2016
- Mittal, Sparsh
- Concurrency and Computation: Practice and Experience, Vol. 29, Issue 10
A Survey of ReRAM-Based Architectures for Processing-In-Memory and Neural Networks
journal, April 2018
- Mittal, Sparsh
- Machine Learning and Knowledge Extraction, Vol. 1, Issue 1
Page Locked GPGPU Rotational Visual Secret Sharing
book, January 2020
- Raviraja Holla, M.; Suma, D.; Smys, S.
- Second International Conference on Computer Networks and Communication Technologies: ICCNCT 2019, p. 349-359
High-performance low-power approximate Wallace tree multiplier
journal, July 2018
- Abed, Sa'ed; Khalil, Yasser; Modhaffar, Mahdi
- International Journal of Circuit Theory and Applications, Vol. 46, Issue 12
A survey of FPGA-based accelerators for convolutional neural networks
journal, October 2018
- Mittal, Sparsh
- Neural Computing and Applications, Vol. 32, Issue 4
Crossing the chasm: how to develop weather and climate models for next generation computers?
text, January 2018
- Lawrence, Bryan N.; Rezny, Michael; Budich, Reinhard
- ETH Zurich
Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review
journal, April 2018
- Memeti, Suejb; Pllana, Sabri; Binotto, Alécio
- Computing, Vol. 101, Issue 8
GPU processing of theta-joins
journal, June 2017
- Bellas, Christos; Gounaris, Anastasios
- Concurrency and Computation: Practice and Experience, Vol. 29, Issue 18
A survey of techniques for architecting SLC/MLC/TLC hybrid Flash memory-based SSDs
journal, January 2018
- Alsalibi, Ahmed Izzat; Mittal, Sparsh; Al-Betar, Mohammed Azmi
- Concurrency and Computation: Practice and Experience, Vol. 30, Issue 13
The Set@l Programming Language and Its Application for Coding Gaussian Elimination
book, August 2019
- Levin, Ilya I.; Dordopulo, Aleksey I.; Pisarenko, Ivan V.
- Parallel Computational Technologies: 13th International Conference, PCT 2019, Kaliningrad, Russia, April 2–4, 2019, Revised Selected Papers, p. 45-57