OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing

Journal Article · IEEE Transactions on Parallel and Distributed Systems

Next-generation HPC systems and data centers are likely to be reconfigurable and data-centric, driven by the trend toward hardware specialization and the emergence of data-driven applications. In this work, we propose ARENA, an asynchronous reconfigurable accelerator ring architecture, as one possible picture of what future HPC systems and data centers may look like. Although we use coarse-grained reconfigurable arrays (CGRAs) as the substrate platform, our key contribution is not only the CGRA-cluster design itself but also the combination of a new architecture and programming model that enables asynchronous tasking across a cluster of reconfigurable nodes, so as to bring specialized computation to the data rather than the reverse. We assume distributed data storage without any prior knowledge of how the data are distributed. Hardware specialization occurs at runtime when a task finds that the majority of the data it requires is available at the current node. In other words, we dynamically generate specialized CGRA accelerators where the data reside. Asynchronous tasking that brings computation to the data is achieved by circulating a task token, which describes the dataflow graphs to be executed for a task, among the CGRA cluster connected by a fast ring network. Evaluations on a set of HPC and data-driven applications across different domains show that ARENA provides better parallel scalability with reduced data movement (53.9 percent). Compared with contemporary compute-centric parallel models, ARENA achieves an average speedup of 4.37×. The synthesized CGRAs and their task dispatchers occupy only 2.93 mm² of chip area in a 45 nm process technology and can run at 800 MHz with an average power consumption of 759.8 mW. ARENA also supports the concurrent execution of multiple applications, offering ideal architectural support for future high-performance parallel computing and data-analytics systems.
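The token-circulation idea above can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not ARENA's hardware design or runtime: the names `TaskToken`, `Node`, and `dispatch` are hypothetical, data placement is modeled as sets of block IDs, "majority" is read as more than half of a task's required blocks, and the fallback used when no node holds a majority is an assumption of this sketch rather than a documented ARENA policy.

```python
from dataclasses import dataclass, field


@dataclass
class TaskToken:
    """Hypothetical task token: names the dataflow graph to run and the data it needs."""
    task_id: int
    dataflow_graph: str               # placeholder for a dataflow-graph description
    required_blocks: frozenset[int]   # IDs of the data blocks the task reads


@dataclass
class Node:
    """Hypothetical ring node holding a subset of the distributed data blocks."""
    node_id: int
    local_blocks: set[int] = field(default_factory=set)
    executed: list[int] = field(default_factory=list)

    def local_fraction(self, token: TaskToken) -> float:
        """Fraction of the token's required blocks that reside on this node."""
        if not token.required_blocks:
            return 1.0
        hits = len(token.required_blocks & self.local_blocks)
        return hits / len(token.required_blocks)


def dispatch(ring: list[Node], token: TaskToken, start: int = 0) -> tuple[int, int, int]:
    """Circulate `token` around the ring, beginning at index `start`.

    The first node holding more than half of the required blocks runs the task,
    modeling runtime specialization where the data reside. If no node reaches a
    majority within one full lap, the node with the largest local share runs it
    (ASSUMPTION: illustrative fallback, not ARENA's documented behavior).
    Returns (executing node id, ring hops from start, remote blocks to fetch).
    """
    n = len(ring)
    best, best_frac, best_hops = None, -1.0, 0
    for hops in range(n):
        node = ring[(start + hops) % n]
        frac = node.local_fraction(token)
        if frac > 0.5:                 # majority of the data is local: run here
            best, best_hops = node, hops
            break
        if frac > best_frac:           # remember the best candidate for the fallback
            best, best_frac, best_hops = node, frac, hops
    best.executed.append(token.task_id)
    remote = len(token.required_blocks - best.local_blocks)
    return best.node_id, best_hops, remote


if __name__ == "__main__":
    ring = [Node(0, {0, 1}), Node(1, {2, 3, 4}), Node(2, {5}), Node(3, {6, 7})]
    print(dispatch(ring, TaskToken(7, "dfg", frozenset({2, 3, 5}))))
```

In this example, task 7 starts at node 0, travels one hop, and runs on node 1, which holds two of its three blocks, so only one block is fetched remotely. Running the same task at its starting node, as a compute-centric placement might, would require fetching all three blocks; counting remote fetches this way gives a rough proxy for the data movement the abstract refers to.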

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC05-76RL01830; 66150
OSTI ID:
1811825
Report Number(s):
PNNL-SA-152862
Journal Information:
IEEE Transactions on Parallel and Distributed Systems, Vol. 32, Issue 12; ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English
