OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Programming with BIG data in R: Scaling analytics from one to thousands of nodes

Journal Article · Big Data Research
 [1];  [2];  [3];  [4]
  1. Univ. of Tennessee, Knoxville, TN (United States)
  2. U.S. Food and Drug Administration, Silver Spring, MD (United States)
  3. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  4. Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

Here, we present a tutorial overview showing how one can achieve scalable performance with R. We do so by using several package extensions, including those from the pbdR project. These packages provide high-performance, high-level interfaces to, and extensions of, MPI, PBLAS, ScaLAPACK, I/O libraries, profiling libraries, and more. While these libraries shine brightest on large distributed platforms, they also work rather well on small clusters and often, surprisingly, even on a laptop with only two cores. Our tutorial begins with recommendations on how to get more performance out of your R code before considering parallel implementations. Because R is a high-level language, a single function call can hide a deep hierarchy of operations, and with big data this can easily lead to inefficiency. Profiling is therefore an important tool for understanding the performance of R code and guiding both serial and parallel improvements.

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Joint Institute for Computational Sciences (JICS)
Sponsoring Organization:
Work for Others (WFO); USDOE Office of Science (SC)
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1333101
Alternate ID(s):
OSTI ID: 1416808
Journal Information:
Big Data Research, Journal Name: Big Data Research; ISSN 2214-5796
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 11 works (citation information provided by Web of Science)
