DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data

Abstract

Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and mis-interpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression, also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results show that there seems to be an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.
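
The stability measure named in the abstract can be illustrated with a minimal sketch. This is not the paper's code: the function names and the example labels below are hypothetical, and it assumes the original and perturbed clusterings label the same set of observations. Each original cluster is scored by its best Jaccard match among the clusters found after perturbation, so a score near 1 suggests the cluster was recovered and a score near 0 suggests it dissolved.

```python
from collections import defaultdict


def clusters_from_labels(labels):
    """Group observation indices by cluster label: {label: set(indices)}."""
    groups = defaultdict(set)
    for index, label in enumerate(labels):
        groups[label].add(index)
    return dict(groups)


def jaccard(a, b):
    """Jaccard coefficient: size of the intersection divided by size of the
    union of two sets (defined here as 0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_stability(original_labels, perturbed_labels):
    """For each original cluster, return its best Jaccard match among the
    clusters found after perturbation. Label names need not agree."""
    original = clusters_from_labels(original_labels)
    perturbed = clusters_from_labels(perturbed_labels)
    return {
        label: max(jaccard(members, other) for other in perturbed.values())
        for label, members in original.items()
    }


# Hypothetical labels for eight buildings, before and after perturbing the data.
original = [0, 0, 0, 0, 1, 1, 1, 1]
perturbed = [1, 1, 1, 0, 0, 0, 0, 0]
scores = cluster_stability(original, perturbed)
print(scores)
```

In this toy case one building moves between groups, so neither original cluster is matched perfectly. In practice the perturbed labels would come from re-running the clustering on resampled or noised data, and the scores would be averaged over many such runs.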

Authors:
Hsu, David
Publication Date:
2015-12-01
Research Org.:
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
Sponsoring Org.:
USDOE Office of Energy Efficiency and Renewable Energy (EERE)
OSTI Identifier:
1250054
Alternate Identifier(s):
OSTI ID: 1437635
Grant/Contract Number:  
EE0004261
Resource Type:
Published Article
Journal Name:
Applied Energy
Additional Journal Information:
Journal Name: Applied Energy; Journal Volume: 160; Journal Issue: C; Journal ID: ISSN 0306-2619
Publisher:
Elsevier
Country of Publication:
United Kingdom
Language:
English
Subject:
32 ENERGY CONSERVATION, CONSUMPTION, AND UTILIZATION; 97 MATHEMATICS AND COMPUTING; cluster-wise regression; buildings; energy consumption; prediction accuracy; cluster stability; latent class regression

Citation Formats

Hsu, David. Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data. United Kingdom: N. p., 2015. Web. doi:10.1016/j.apenergy.2015.08.126.
Hsu, David. Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data. United Kingdom. https://doi.org/10.1016/j.apenergy.2015.08.126
Hsu, David. 2015. "Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data". United Kingdom. https://doi.org/10.1016/j.apenergy.2015.08.126.
@article{osti_1250054,
title = {Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data},
author = {Hsu, David},
abstractNote = {Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and mis-interpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression, also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results show that there seems to be an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.},
doi = {10.1016/j.apenergy.2015.08.126},
journal = {Applied Energy},
number = {C},
volume = {160},
place = {United Kingdom},
year = {2015},
month = {12}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1016/j.apenergy.2015.08.126

Citation Metrics:
Cited by: 107 works
Citation information provided by
Web of Science

Works referencing / citing this record:

Selecting Appropriate Clustering Methods for Materials Science Applications of Machine Learning
journal, September 2019

  • Parker, Amanda J.; Barnard, Amanda S.
  • Advanced Theory and Simulations, Vol. 2, Issue 12
  • DOI: 10.1002/adts.201900145

Energy Cost Burdens for Low-Income and Minority Households: Evidence From Energy Benchmarking and Audit Data in Five U.S. Cities
journal, September 2019

  • Kontokosta, Constantine E.; Reina, Vincent J.; Bonczak, Bartosz
  • Journal of the American Planning Association, Vol. 86, Issue 1
  • DOI: 10.1080/01944363.2019.1647446

Generalised clusterwise regression for simultaneous estimation of optimal pavement clusters and performance models
journal, September 2018

  • Khadka, Mukesh; Paz, Alexander; Singh, Ashok
  • International Journal of Pavement Engineering, Vol. 21, Issue 9
  • DOI: 10.1080/10298436.2018.1521970

Community energy by design: A simulation-based design workflow using measured data clustering to calibrate Urban Building Energy Models (UBEMs)
journal, January 2019

  • Rakha, Tarek; El Kontar, Rawad
  • Environment and Planning B: Urban Analytics and City Science, Vol. 46, Issue 8
  • DOI: 10.1177/2399808319841909

Building Energy Consumption Prediction: An Extreme Deep Learning Approach
journal, October 2017

  • Li, Chengdong; Ding, Zixiang; Zhao, Dongbin
  • Energies, Vol. 10, Issue 10
  • DOI: 10.3390/en10101525

Deep Belief Network Based Hybrid Model for Building Energy Consumption Prediction
journal, January 2018

  • Li, Chengdong; Ding, Zixiang; Yi, Jianqiang
  • Energies, Vol. 11, Issue 1
  • DOI: 10.3390/en11010242

Neural-Network-Based Building Energy Consumption Prediction with Training Data Generation
journal, October 2019

  • Lee, Sanghyuk; Cha, Jaehoon; Kim, Moon Keun
  • Processes, Vol. 7, Issue 10
  • DOI: 10.3390/pr7100731