DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Machine learning-based surrogate modeling for data-driven optimization: a comparison of subset selection for regression techniques

Abstract

Optimization of simulation-based or data-driven systems is a challenging task, which has attracted significant attention in the recent literature. A very efficient approach for optimizing systems without analytical expressions is through fitting surrogate models. Due to their increased flexibility, nonlinear interpolating functions, such as radial basis functions and Kriging, have been predominantly used as surrogates for data-driven optimization; however, these methods lead to complex nonconvex formulations. Alternatively, commonly used regression-based surrogates lead to simpler formulations, but they are less flexible and inaccurate if the form is not known a priori. In this work, we investigate the efficiency of subset selection regression techniques for developing surrogate functions that balance both accuracy and complexity. Subset selection creates sparse regression models by selecting only a subset of original features, which are linearly combined to generate a diverse set of surrogate models. Five different subset selection techniques are compared with commonly used nonlinear interpolating surrogate functions with respect to optimization solution accuracy, computation time, sampling requirements, and model sparsity. Furthermore, our results indicate that subset selection-based regression functions exhibit promising performance when the dimensionality is low, while interpolation performs better for higher dimensional problems.
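As a reading aid, subset selection for regression is often written as a cardinality-constrained least-squares problem. The form below is a generic textbook sketch, not necessarily the formulation used in the paper; the symbols N (number of samples), p (number of candidate features), \phi_j (candidate basis functions), \beta (coefficients), and k (maximum number of selected features) are assumptions introduced here for illustration.

```latex
\min_{\beta \in \mathbb{R}^{p}} \;
\sum_{i=1}^{N} \Bigl( y_i - \sum_{j=1}^{p} \beta_j \, \phi_j(x_i) \Bigr)^{2}
\quad \text{subject to} \quad \lVert \beta \rVert_0 \le k
```

Here \lVert \beta \rVert_0 counts the nonzero coefficients, so a small k gives a sparse surrogate that remains linear in the selected features.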

Authors:
Kim, Sun Hye [1]; Boukouvala, Fani [1]
  1. Georgia Inst. of Technology, Atlanta, GA (United States)
Publication Date:
2019-05-09
Research Org.:
RAPID Manufacturing Institute, New York, NY (United States)
Sponsoring Org.:
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Energy Efficiency Office. Advanced Manufacturing Office
OSTI Identifier:
1642435
Grant/Contract Number:  
EE0007888
Resource Type:
Accepted Manuscript
Journal Name:
Optimization Letters
Additional Journal Information:
Journal Volume: 14; Journal Issue: 4; Journal ID: ISSN 1862-4472
Publisher:
Springer Nature
Country of Publication:
United States
Language:
English
Subject:
42 ENGINEERING; Machine Learning; Surrogate modeling; Black-box optimization; Data-driven optimization; Subset selection for regression

Citation Formats

Kim, Sun Hye, and Boukouvala, Fani. Machine learning-based surrogate modeling for data-driven optimization: a comparison of subset selection for regression techniques. United States: N. p., 2019. Web. doi:10.1007/s11590-019-01428-7.
Kim, Sun Hye, & Boukouvala, Fani. Machine learning-based surrogate modeling for data-driven optimization: a comparison of subset selection for regression techniques. United States. https://doi.org/10.1007/s11590-019-01428-7
Kim, Sun Hye, and Boukouvala, Fani. 2019. "Machine learning-based surrogate modeling for data-driven optimization: a comparison of subset selection for regression techniques". United States. https://doi.org/10.1007/s11590-019-01428-7. https://www.osti.gov/servlets/purl/1642435.
@article{osti_1642435,
title = {Machine learning-based surrogate modeling for data-driven optimization: a comparison of subset selection for regression techniques},
author = {Kim, Sun Hye and Boukouvala, Fani},
doi = {10.1007/s11590-019-01428-7},
journal = {Optimization Letters},
number = 4,
volume = 14,
place = {United States},
year = {2019},
month = {5}
}


Citation Metrics:
Cited by: 40 works
Citation information provided by
Web of Science

Figures / Tables:

Figure 1: (a) Graphical representation of test problem f(x) = 2x^4 − 3x^2 + x, and (b) the resulting functional forms of surrogate models fitted by linear, Kriging, and SSR
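The following is a minimal sketch of how a sparse regression surrogate could be fitted to the test problem in the Figure 1 caption and then optimized. It uses exhaustive best-subset selection over monomial features as a stand-in; this record does not name the five techniques the paper compares, so this is not presented as the authors' method. The sample points, the candidate powers 1 to 6, the subset size k = 3, and all function names are assumptions chosen for illustration.

```python
from itertools import combinations


def f(x):
    """Test problem from the Figure 1 caption: 2x^4 - 3x^2 + x."""
    return 2 * x**4 - 3 * x**2 + x


def solve(A, b):
    """Solve a small dense system A z = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= m * M[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def fit_subset(xs, ys, powers):
    """Least-squares fit of y ~ sum_p c_p * x**p via the normal equations."""
    cols = [[x**p for x in xs] for p in powers]
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * y for a, y in zip(ci, ys)) for ci in cols]
    coef = solve(A, rhs)
    sse = sum(
        (y - sum(c * x**p for c, p in zip(coef, powers))) ** 2
        for x, y in zip(xs, ys)
    )
    return coef, sse


def best_subset(xs, ys, candidate_powers, k):
    """Try every k-feature subset and keep the one with the smallest squared error."""
    best = None
    for powers in combinations(candidate_powers, k):
        coef, sse = fit_subset(xs, ys, powers)
        if best is None or sse < best[2]:
            best = (powers, coef, sse)
    return best


# Assumed sampling plan: 13 equally spaced, noise-free samples on [-1.5, 1.5].
xs = [-1.5 + 0.25 * i for i in range(13)]
ys = [f(x) for x in xs]

# Assumed candidate library: monomials x^1 .. x^6; keep only k = 3 of them.
powers, coef, sse = best_subset(xs, ys, range(1, 7), k=3)


def surrogate(x):
    """Sparse surrogate: a linear combination of the selected monomials."""
    return sum(c * x**p for c, p in zip(coef, powers))


# Optimize the cheap surrogate on a fine grid instead of the original function.
grid = [-1.5 + 0.001 * i for i in range(3001)]
x_star = min(grid, key=surrogate)

print("selected powers:", powers)
print("coefficients:", [round(c, 6) for c in coef])
print("surrogate minimizer:", round(x_star, 3))
```

Exhaustive enumeration is only practical for small candidate libraries, since the number of subsets grows combinatorially. Because the samples here are noise-free and the true function lies inside the candidate library, the selected subset can reproduce it; with noisy data or a poorly chosen library the surrogate would only approximate it.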

