OSTI.GOV
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Processing-in-Memory Enabled Graphics Processors for 3D Rendering

Abstract

The performance of 3D rendering on a Graphics Processing Unit (GPU), which converts a 3D vector stream into a 2D frame with 3D image effects, significantly impacts users' gaming experience on modern computer systems. Due to the high texture throughput in 3D rendering, main memory bandwidth becomes a critical obstacle to improving overall rendering performance. 3D stacked memory systems such as the Hybrid Memory Cube (HMC) provide opportunities to significantly overcome the memory wall by directly connecting logic controllers to DRAM dies. Based on the observation that texel fetches significantly impact off-chip memory traffic, we propose two architectural designs to enable a Processing-In-Memory (PIM) based GPU for efficient 3D rendering.
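The claim that texel fetches drive off-chip traffic can be made concrete with a back-of-the-envelope estimate. The sketch below is not from the paper: the resolution, frame rate, texture lookups per pixel, texel size, and cache hit rate are illustrative assumptions, and the function name `texel_traffic_gb_per_s` is hypothetical. The only fixed fact it relies on is that trilinear filtering reads 8 texels per lookup (a 2x2 footprint from each of two mip levels).

```python
def texel_traffic_gb_per_s(width, height, fps, lookups_per_pixel,
                           texels_per_lookup, bytes_per_texel, cache_hit_rate):
    """Rough off-chip texel traffic in GB/s (1 GB = 1e9 bytes).

    All parameters are illustrative assumptions, not values from the paper.
    Overdraw, texture compression, and mip-level selection are ignored.
    """
    texels_per_frame = width * height * lookups_per_pixel * texels_per_lookup
    demand_bytes_per_s = texels_per_frame * bytes_per_texel * fps
    # Only texture-cache misses turn into off-chip DRAM reads.
    return demand_bytes_per_s * (1.0 - cache_hit_rate) / 1e9


# Assumed workload: 1080p at 60 fps, 4 texture lookups per pixel,
# trilinear filtering (8 texels per lookup), 4-byte RGBA8 texels.
raw = texel_traffic_gb_per_s(1920, 1080, 60, 4, 8, 4, cache_hit_rate=0.0)
print(f"{raw:.2f} GB/s before caching")  # 15.93 GB/s before caching
```

The demand grows linearly with resolution, frame rate, and filtering quality, so whatever fraction the texture caches fail to absorb lands directly on main-memory bandwidth, which is one way to see why moving texel fetching closer to DRAM is attractive.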

Authors:
Xie, Chenhao; Song, Shuaiwen; Wang, Jing; Zhang, Weigong; Fu, Xin
Publication Date:
Mon Feb 06 00:00:00 EST 2017
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1358514
Report Number(s):
PNNL-SA-122891
KJ0402000
DOE Contract Number:
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: IEEE International Symposium on High Performance Computer Architecture (HPCA 2017), February 4-8, 2017, Austin, Texas, 637-648
Country of Publication:
United States
Language:
English
Subject:
Innovative hardware/software co-design

Citation Formats

Xie, Chenhao, Song, Shuaiwen, Wang, Jing, Zhang, Weigong, and Fu, Xin. Processing-in-Memory Enabled Graphics Processors for 3D Rendering. United States: N. p., 2017. Web. doi:10.1109/HPCA.2017.37.
Xie, Chenhao, Song, Shuaiwen, Wang, Jing, Zhang, Weigong, & Fu, Xin. Processing-in-Memory Enabled Graphics Processors for 3D Rendering. United States. doi:10.1109/HPCA.2017.37.
Xie, Chenhao, Song, Shuaiwen, Wang, Jing, Zhang, Weigong, and Fu, Xin. 2017. "Processing-in-Memory Enabled Graphics Processors for 3D Rendering". United States. doi:10.1109/HPCA.2017.37.
@inproceedings{osti_1358514,
title = {Processing-in-Memory Enabled Graphics Processors for 3D Rendering},
author = {Xie, Chenhao and Song, Shuaiwen and Wang, Jing and Zhang, Weigong and Fu, Xin},
abstractNote = {The performance of 3D rendering on a Graphics Processing Unit (GPU), which converts a 3D vector stream into a 2D frame with 3D image effects, significantly impacts users' gaming experience on modern computer systems. Due to the high texture throughput in 3D rendering, main memory bandwidth becomes a critical obstacle to improving overall rendering performance. 3D stacked memory systems such as the Hybrid Memory Cube (HMC) provide opportunities to significantly overcome the memory wall by directly connecting logic controllers to DRAM dies. Based on the observation that texel fetches significantly impact off-chip memory traffic, we propose two architectural designs to enable a Processing-In-Memory (PIM) based GPU for efficient 3D rendering.},
doi = {10.1109/HPCA.2017.37},
booktitle = {IEEE International Symposium on High Performance Computer Architecture (HPCA 2017)},
pages = {637--648},
place = {United States},
year = {2017},
month = {feb}
}

Similar Records:
  • Abstract not provided.
  • A parallel approach to computer graphics rendering is needed in order to generate images of highly complex data quickly. This paper presents such an algorithm which can be used on general purpose multiprocessor computers. It was designed for a physically-distributed, logically-shared memory computer with features amenable to both shared memory and message passing implementation. The work decomposition strategy involves assigning rectangular screen space areas to processors initially. Then, as the computation proceeds, processors which have completed their work adaptively partition another processor's task in order to even the computational load. Results on the BBN TC2000 computer at LLNL for this algorithm indicate an efficiency ranging from 59% to 90% on 96 processors for a variety of test images. In fact, a rendering speed of almost 100,000 polygons per second (anti-aliased, specular highlighted) was achieved at this configuration using the software algorithm described here.
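The work decomposition in the similar record (rectangular screen areas assigned up front, then idle processors adaptively splitting another processor's remaining task) can be illustrated with a small single-machine simulation. This is a sketch under stated assumptions, not that paper's algorithm: the per-pixel cost model, stealing from the processor with the most remaining scanlines, splitting by rows, and the names `Tile`, `render`, and `hotspot` are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Tile:
    """Half-open screen rectangle [x0, x1) x [y0, y1) still to be rendered."""
    x0: int
    y0: int
    x1: int
    y1: int

    def rows_left(self) -> int:
        return max(0, self.y1 - self.y0)


def render(width, height, grid_cols, grid_rows, cost, adaptive=True):
    """Simulate screen-space rendering on grid_cols * grid_rows processors.

    cost(x, y) returns the (hypothetical) work units for pixel (x, y).
    Returns (makespan, coverage), where coverage[y][x] counts how many
    times pixel (x, y) was rendered.
    """
    # Initial decomposition: one rectangular screen area per processor.
    tiles = [
        Tile(c * width // grid_cols, r * height // grid_rows,
             (c + 1) * width // grid_cols, (r + 1) * height // grid_rows)
        for r in range(grid_rows)
        for c in range(grid_cols)
    ]
    coverage = [[0] * width for _ in range(height)]
    finish = [0] * len(tiles)
    # Event queue of (time the processor becomes free, processor id).
    ready = [(0, p) for p in range(len(tiles))]
    heapq.heapify(ready)

    while ready:
        t, p = heapq.heappop(ready)
        if tiles[p].rows_left() == 0:
            if not adaptive:
                continue  # static partitioning: an idle processor just stops
            # Assumed policy: split the busiest processor's remaining rows in half.
            victim = max(range(len(tiles)), key=lambda q: tiles[q].rows_left())
            v = tiles[victim]
            if v.rows_left() < 2:
                continue  # nothing left worth splitting; processor p retires
            mid = (v.y0 + v.y1) // 2
            tiles[p] = Tile(v.x0, mid, v.x1, v.y1)  # thief takes the lower half
            v.y1 = mid                               # victim keeps the upper half
        tile = tiles[p]
        # Render one scanline of the current tile and advance this processor's clock.
        y = tile.y0
        for x in range(tile.x0, tile.x1):
            coverage[y][x] += 1
            t += cost(x, y)
        tile.y0 += 1
        finish[p] = t
        heapq.heappush(ready, (t, p))

    return max(finish), coverage


def hotspot(x, y):
    """Hypothetical cost model: one expensive 16x16 corner of the screen."""
    return 10 if x < 16 and y < 16 else 1


static, _ = render(64, 64, 2, 2, hotspot, adaptive=False)
dynamic, cov = render(64, 64, 2, 2, hotspot)
print(static, dynamic)
```

With all the expensive pixels in one quadrant, the static split leaves one processor with 3328 work units while the other three finish at 1024; the adaptive run finishes earlier, nearer the 1600-unit average, at the cost of extra bookkeeping on every split. Splitting by scanlines keeps each stolen piece a rectangle, which matches the screen-space framing of the record.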