DOE PAGES

Title: Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data

Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and misinterpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression, also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross-validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results show that there seems to be an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.
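
The abstract states that cluster stability is measured with the Jaccard coefficient, but this record does not give the exact procedure. The following is a minimal sketch of one common way to apply it, assuming that each original cluster is scored by its best-matching cluster after perturbation. The function names and the building IDs are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Iterable, List, Set


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets of members."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(set_a & set_b) / len(union)


def cluster_stability(original: List[Set[int]], perturbed: List[Set[int]]) -> List[float]:
    """For each original cluster, the best Jaccard match among perturbed clusters."""
    return [max(jaccard(c, p) for p in perturbed) for c in original]


# Hypothetical building IDs grouped into clusters before and after perturbation.
original = [{1, 2, 3, 4}, {5, 6, 7}]
perturbed = [{1, 2, 3}, {4, 5, 6, 7}]
scores = cluster_stability(original, perturbed)
print(scores)  # [0.75, 0.75]
```

A score near 1 means a cluster is largely recovered after perturbation, and a score near 0 means it dissolves. In practice the scores would typically be averaged over many perturbed or resampled datasets rather than taken from a single comparison.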
Authors:
Hsu, David [1]
  1. Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Dept. of Urban Studies & Planning
Publication Date:
Grant/Contract Number:
EE0004261
Type:
Published Article
Journal Name:
Applied Energy
Additional Journal Information:
Journal Volume: 160; Journal ID: ISSN 0306-2619
Publisher:
Elsevier
Research Org:
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
Sponsoring Org:
USDOE Office of Energy Efficiency and Renewable Energy (EERE)
Country of Publication:
United States
Language:
English
Subject:
32 ENERGY CONSERVATION, CONSUMPTION, AND UTILIZATION; 97 MATHEMATICS AND COMPUTING; cluster-wise regression; buildings; energy consumption; prediction accuracy; cluster stability; latent class regression
OSTI Identifier:
1250054
Alternate Identifier(s):
OSTI ID: 1437635

Hsu, David. Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data. United States: N. p., Web. doi:10.1016/j.apenergy.2015.08.126.
Hsu, David. Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data. United States. doi:10.1016/j.apenergy.2015.08.126.
Hsu, David. 2015. "Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data". United States. doi:10.1016/j.apenergy.2015.08.126.
@article{osti_1250054,
title = {Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data},
author = {Hsu, David},
abstractNote = {Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and mis-interpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression, also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results show that there seems to be an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.},
doi = {10.1016/j.apenergy.2015.08.126},
journal = {Applied Energy},
number = {},
volume = 160,
place = {United States},
year = {2015},
month = {9}
}