
Title: Scalable Regression Tree Learning on Hadoop using OpenPlanet

Abstract

As scientific and engineering domains attempt to analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool for building prediction models. A regression tree is a popular learning model that combines decision trees and linear regression to forecast numerical target variables from a set of input features. MapReduce is well suited to such data-intensive learning applications, and a proprietary regression tree algorithm using MapReduce, PLANET, has been proposed earlier. In this paper, we describe an open source implementation of this algorithm, OpenPlanet, on the Hadoop framework using a hybrid approach. Further, we evaluate the performance of OpenPlanet using real-world datasets from the Smart Power Grid domain to perform energy use forecasting, and propose tuning strategies for Hadoop parameters that improve performance over the default configuration by 75% for a training dataset of 17 million tuples on a 64-core Hadoop cluster on FutureGrid.
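
The abstract describes regression tree learning on MapReduce only at a high level. As a rough illustration of why the problem fits that model, the sketch below picks a split threshold for one numeric feature by merging per-partition sufficient statistics (count, sum, sum of squares of the target). It is not the OpenPlanet or PLANET code: the function names, the single-feature setup, the fixed candidate thresholds, and the use of squared error around a constant mean (rather than a linear model in each leaf) are all assumptions made for this example, and it runs in plain Python rather than on Hadoop.

```python
from collections import defaultdict


def map_partition(rows, thresholds):
    """Mapper (hypothetical): summarize one data partition.

    rows is an iterable of (feature_value, target) pairs. For every
    candidate threshold t, accumulate [count, sum, sum of squares] of
    the targets whose feature value is <= t (the left branch). The key
    None holds the same statistics for the whole partition.
    """
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for x, y in rows:
        total = stats[None]
        total[0] += 1
        total[1] += y
        total[2] += y * y
        for t in thresholds:
            if x <= t:
                left = stats[t]
                left[0] += 1
                left[1] += y
                left[2] += y * y
    return stats


def reduce_stats(partials):
    """Reducer (hypothetical): add up the statistics from all mappers."""
    merged = defaultdict(lambda: [0, 0.0, 0.0])
    for stats in partials:
        for key, (n, s, ss) in stats.items():
            m = merged[key]
            m[0] += n
            m[1] += s
            m[2] += ss
    return merged


def sse(n, s, ss):
    """Sum of squared errors around the mean, from sufficient statistics."""
    return ss - s * s / n if n > 0 else 0.0


def best_split(merged, thresholds):
    """Return (threshold, cost) minimizing left SSE + right SSE, or None."""
    n, s, ss = merged[None]
    best = None
    for t in thresholds:
        ln, ls, lss = merged[t]
        rn, rs, rss = n - ln, s - ls, ss - lss  # right branch by subtraction
        if ln == 0 or rn == 0:
            continue  # a split must leave rows on both sides
        cost = sse(ln, ls, lss) + sse(rn, rs, rss)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best
```

Because only the small per-threshold summaries leave each mapper, the amount of data sent to the reducer depends on the number of candidate thresholds, not on the number of training tuples. A call such as `best_split(reduce_stats(map_partition(p, ts) for p in partitions), ts)` ties the three steps together.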

Authors:
Yin, Wei; Simmhan, Yogesh; Prasanna, Viktor
Publication Date:
Research Org.:
City of Los Angeles Department
Sponsoring Org.:
USDOE Office of Electricity Delivery and Energy Reliability (OE)
OSTI Identifier:
1332538
Report Number(s):
DOE-USC-00192-101
DOE Contract Number:  
OE0000192
Resource Type:
Conference
Resource Relation:
Conference: International Workshop on MapReduce and its Applications, Delft, the Netherlands, June 18-19, 2012
Country of Publication:
United States
Language:
English

Citation Formats

Yin, Wei, Simmhan, Yogesh, and Prasanna, Viktor. Scalable Regression Tree Learning on Hadoop using OpenPlanet. United States: N. p., 2012. Web. doi:10.1145/2287016.2287027.
Yin, Wei, Simmhan, Yogesh, & Prasanna, Viktor. Scalable Regression Tree Learning on Hadoop using OpenPlanet. United States. doi:10.1145/2287016.2287027.
Yin, Wei, Simmhan, Yogesh, and Prasanna, Viktor. 2012. "Scalable Regression Tree Learning on Hadoop using OpenPlanet". United States. doi:10.1145/2287016.2287027. https://www.osti.gov/servlets/purl/1332538.
@article{osti_1332538,
title = {Scalable Regression Tree Learning on Hadoop using OpenPlanet},
author = {Yin, Wei and Simmhan, Yogesh and Prasanna, Viktor},
abstractNote = {As scientific and engineering domains attempt to analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool for building prediction models. A regression tree is a popular learning model that combines decision trees and linear regression to forecast numerical target variables from a set of input features. MapReduce is well suited to such data-intensive learning applications, and a proprietary regression tree algorithm using MapReduce, PLANET, has been proposed earlier. In this paper, we describe an open source implementation of this algorithm, OpenPlanet, on the Hadoop framework using a hybrid approach. Further, we evaluate the performance of OpenPlanet using real-world datasets from the Smart Power Grid domain to perform energy use forecasting, and propose tuning strategies for Hadoop parameters that improve performance over the default configuration by 75% for a training dataset of 17 million tuples on a 64-core Hadoop cluster on FutureGrid.},
doi = {10.1145/2287016.2287027},
place = {United States},
year = {2012},
month = {6}
}

