DOE PAGES

This content will become publicly available on November 9, 2017

Title: Programming with BIG data in R: Scaling analytics from one to thousands of nodes

Here, we present a tutorial overview showing how one can achieve scalable performance with R. We do so by utilizing several package extensions, including those from the pbdR project. These packages consist of high-performance, high-level interfaces to and extensions of MPI, PBLAS, ScaLAPACK, I/O libraries, profiling libraries, and more. While these libraries shine brightest on large distributed platforms, they also work rather well on small clusters and often, surprisingly, even on a laptop with only two cores. Our tutorial begins with recommendations on how to get more performance out of your R code before considering parallel implementations. Because R is a high-level language, a function can have a deep hierarchy of operations. For big data, this can easily lead to inefficiency. Profiling is an important tool for understanding the performance of R code, for both serial and parallel improvements.
Authors:
 [1] ;  [2] ;  [3] ;  [4]
  1. Univ. of Tennessee, Knoxville, TN (United States)
  2. U.S. Food and Drug Administration, Silver Spring, MD (United States)
  3. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  4. Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
OSTI Identifier:
1333101
Grant/Contract Number:
AC05-00OR22725
Type:
Accepted Manuscript
Journal Name:
Big Data Research
Additional Journal Information:
Journal Name: Big Data Research; Journal ID: ISSN 2214-5796
Publisher:
Elsevier
Research Org:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Joint Institute for Computational Sciences (JICS)
Sponsoring Org:
ME USDOE - Office of Management, Budget, and Evaluation; ORNL work for others; USDOE Office of Science (SC)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING