OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Scaling Up Data-Parallel Analytics Platforms: Linear Algebraic Operation Cases

Abstract

Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling up as well as scaling out such algorithms is key to supporting large-scale data analysis that requires efficient processing over millions of data samples. To this end, we present ARION, a hardware-acceleration-based approach for scaling up individual tasks of Spark, a popular data-parallel analytics platform. We support linear algebraic operations both between two dense matrices and between sparse and dense matrices in distributed environments. ARION provides flexible control of acceleration according to matrix density, along with efficient scheduling based on runtime resource utilization. We demonstrate the benefit of our approach for general matrix multiplication operations over large matrices with up to four billion elements by using Gramian matrix computation, which is commonly used in machine learning. Experiments show that our approach achieves more than 2× and 1.5× end-to-end performance speedups for dense and sparse matrices, respectively, and up to 57.04× faster computation compared to MLlib, a state-of-the-art Spark-based implementation. This work is sponsored in part by the NSF under the grants CNS-1565314, CNS-1405697, and CNS-1615411. The manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Authors:
Xu, Luna [1]; Lim, Seung-Hwan [2]; Li, Min [3]; Butt, Ali R. [1]; Kannan, Ramakrishnan [2]
  1. Virginia Tech, Blacksburg, VA
  2. ORNL
  3. IBM Almaden Research
Publication Date:
2017-12
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1422792
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 2017 IEEE International Conference on Big Data (Big Data 2017), Boston, Massachusetts, United States of America, December 11-14, 2017
Country of Publication:
United States
Language:
English

Citation Formats

Xu, Luna, Lim, Seung-Hwan, Li, Min, Butt, Ali R., and Kannan, Ramakrishnan. Scaling Up Data-Parallel Analytics Platforms: Linear Algebraic Operation Cases. United States: N. p., 2017. Web. doi:10.1109/BigData.2017.8257935.
Xu, Luna, Lim, Seung-Hwan, Li, Min, Butt, Ali R., & Kannan, Ramakrishnan. Scaling Up Data-Parallel Analytics Platforms: Linear Algebraic Operation Cases. United States. https://doi.org/10.1109/BigData.2017.8257935
Xu, Luna, Lim, Seung-Hwan, Li, Min, Butt, Ali R., and Kannan, Ramakrishnan. 2017. "Scaling Up Data-Parallel Analytics Platforms: Linear Algebraic Operation Cases". United States. https://doi.org/10.1109/BigData.2017.8257935. https://www.osti.gov/servlets/purl/1422792.
@inproceedings{osti_1422792,
title = {Scaling Up Data-Parallel Analytics Platforms: Linear Algebraic Operation Cases},
author = {Xu, Luna and Lim, Seung-Hwan and Li, Min and Butt, Ali R. and Kannan, Ramakrishnan},
abstractNote = {Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling up as well as scaling out such algorithms is key to supporting large-scale data analysis that requires efficient processing over millions of data samples. To this end, we present ARION, a hardware-acceleration-based approach for scaling up individual tasks of Spark, a popular data-parallel analytics platform. We support linear algebraic operations both between two dense matrices and between sparse and dense matrices in distributed environments. ARION provides flexible control of acceleration according to matrix density, along with efficient scheduling based on runtime resource utilization. We demonstrate the benefit of our approach for general matrix multiplication operations over large matrices with up to four billion elements by using Gramian matrix computation, which is commonly used in machine learning. Experiments show that our approach achieves more than 2× and 1.5× end-to-end performance speedups for dense and sparse matrices, respectively, and up to 57.04× faster computation compared to MLlib, a state-of-the-art Spark-based implementation. This work is sponsored in part by the NSF under the grants CNS-1565314, CNS-1405697, and CNS-1615411. The manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.},
booktitle = {2017 IEEE International Conference on Big Data (Big Data 2017)},
doi = {10.1109/BigData.2017.8257935},
url = {https://www.osti.gov/biblio/1422792},
place = {United States},
year = {2017},
month = {12}
}

