Title: Machine learning-based surrogate modeling for data-driven optimization: a comparison of subset selection for regression techniques

Abstract

Optimization of simulation-based or data-driven systems is a challenging task, which has attracted significant attention in the recent literature. A very efficient approach for optimizing systems without analytical expressions is through fitting surrogate models. Due to their increased flexibility, nonlinear interpolating functions, such as radial basis functions and Kriging, have been predominantly used as surrogates for data-driven optimization; however, these methods lead to complex nonconvex formulations. Alternatively, commonly used regression-based surrogates lead to simpler formulations, but they are less flexible and inaccurate if the form is not known a priori. In this work, we investigate the efficiency of subset selection regression techniques for developing surrogate functions that balance both accuracy and complexity. Subset selection creates sparse regression models by selecting only a subset of original features, which are linearly combined to generate a diverse set of surrogate models. Five different subset selection techniques are compared with commonly used nonlinear interpolating surrogate functions with respect to optimization solution accuracy, computation time, sampling requirements, and model sparsity. Furthermore, our results indicate that subset selection-based regression functions exhibit promising performance when the dimensionality is low, while interpolation performs better for higher dimensional problems.
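The idea of subset selection described in the abstract can be illustrated with a minimal sketch. It uses exhaustive best-subset search under a cap on the number of terms, which is only one possible technique and is not necessarily among the five the paper compares. The candidate features (monomials up to degree 5), the sampling interval [-1.5, 1.5], the sample count, and the helper names `candidate_features` and `best_subset` are all assumptions made for illustration; the sampled function is the test problem from the record's Figure 1 caption, read as f(x) = 2x^4 − 3x^2 + x.

```python
import itertools
import numpy as np

def candidate_features(x, max_degree=5):
    """Hypothetical feature library: columns are x**0, x**1, ..., x**max_degree."""
    return np.vander(x, max_degree + 1, increasing=True)

def best_subset(X, y, max_terms):
    """Try every feature subset with at most max_terms columns and return
    (column indices, least-squares coefficients, sum of squared errors)
    for the subset with the lowest error."""
    best = None
    for k in range(1, max_terms + 1):
        for idx in itertools.combinations(range(X.shape[1]), k):
            A = X[:, list(idx)]
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = float(np.sum((A @ coef - y) ** 2))
            if best is None or sse < best[2]:
                best = (idx, coef, sse)
    return best

# Noiseless samples of the Figure 1 test problem (interval is an assumption).
x = np.linspace(-1.5, 1.5, 30)
y = 2 * x**4 - 3 * x**2 + x

terms, coef, sse = best_subset(candidate_features(x), y, max_terms=3)
print("selected powers of x:", terms)
print("coefficients:", np.round(coef, 6))
```

On these noiseless samples the search is expected to keep only the powers 1, 2 and 4, giving a sparse linear combination of features. Exhaustive search grows combinatorially with the number of candidate features, so it is practical only for small libraries like this one.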

Authors:
Kim, Sun Hye [1]; Boukouvala, Fani [1]
  1. Georgia Inst. of Technology, Atlanta, GA (United States)
Publication Date:
Research Org.:
RAPID Manufacturing Institute, New York, NY (United States)
Sponsoring Org.:
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Energy Efficiency Office. Advanced Manufacturing Office
OSTI Identifier:
1642435
Grant/Contract Number:  
EE0007888
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Optimization Letters
Additional Journal Information:
Journal Volume: 14; Journal Issue: 4; Journal ID: ISSN 1862-4472
Publisher:
Springer Nature
Country of Publication:
United States
Language:
English
Subject:
42 ENGINEERING; Machine Learning; Surrogate modeling; Black-box optimization; Data-driven optimization; Subset selection for regression

Citation Formats

Kim, Sun Hye, and Boukouvala, Fani. Machine learning-based surrogate modeling for data-driven optimization: a comparison of subset selection for regression techniques. United States: N. p., 2019. Web. doi:10.1007/s11590-019-01428-7.
Kim, Sun Hye, & Boukouvala, Fani. Machine learning-based surrogate modeling for data-driven optimization: a comparison of subset selection for regression techniques. United States. https://doi.org/10.1007/s11590-019-01428-7
Kim, Sun Hye, and Boukouvala, Fani. 2019. "Machine learning-based surrogate modeling for data-driven optimization: a comparison of subset selection for regression techniques". United States. https://doi.org/10.1007/s11590-019-01428-7. https://www.osti.gov/servlets/purl/1642435.
@article{osti_1642435,
title = {Machine learning-based surrogate modeling for data-driven optimization: a comparison of subset selection for regression techniques},
author = {Kim, Sun Hye and Boukouvala, Fani},
abstractNote = {Optimization of simulation-based or data-driven systems is a challenging task, which has attracted significant attention in the recent literature. A very efficient approach for optimizing systems without analytical expressions is through fitting surrogate models. Due to their increased flexibility, nonlinear interpolating functions, such as radial basis functions and Kriging, have been predominantly used as surrogates for data-driven optimization; however, these methods lead to complex nonconvex formulations. Alternatively, commonly used regression-based surrogates lead to simpler formulations, but they are less flexible and inaccurate if the form is not known a priori. In this work, we investigate the efficiency of subset selection regression techniques for developing surrogate functions that balance both accuracy and complexity. Subset selection creates sparse regression models by selecting only a subset of original features, which are linearly combined to generate a diverse set of surrogate models. Five different subset selection techniques are compared with commonly used nonlinear interpolating surrogate functions with respect to optimization solution accuracy, computation time, sampling requirements, and model sparsity. Furthermore, our results indicate that subset selection-based regression functions exhibit promising performance when the dimensionality is low, while interpolation performs better for higher dimensional problems.},
doi = {10.1007/s11590-019-01428-7},
url = {https://www.osti.gov/biblio/1642435},
journal = {Optimization Letters},
issn = {1862-4472},
number = 4,
volume = 14,
place = {United States},
year = {2019},
month = {5}
}

Figures / Tables:

Figure 1: (a) Graphical representation of test problem f(x) = 2x^4 − 3x^2 + x, and (b) the resulting functional forms of surrogate models fitted by linear, Kriging, and SSR
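As a small worked example, the global minimizer of the caption's test problem can be located directly, which gives a reference point for judging how well a surrogate's optimum matches. The sketch reads the test problem as f(x) = 2x^4 − 3x^2 + x (the exponents are inferred from the flattened caption, so this is an assumption) and uses the fact that f grows without bound as |x| increases, so the global minimum must lie at a root of f'(x) = 8x^3 − 6x + 1.

```python
import numpy as np

def f(x):
    """Test problem from the Figure 1 caption (exponents assumed to be 4 and 2)."""
    return 2 * x**4 - 3 * x**2 + x

# Coefficients of f'(x) = 8x^3 - 6x + 1, in descending powers of x.
stationary = np.roots([8.0, 0.0, -6.0, 1.0])

# Keep the real stationary points and pick the one with the lowest f value.
real_pts = stationary[np.abs(stationary.imag) < 1e-9].real
x_star = real_pts[np.argmin(f(real_pts))]

print(f"global minimizer x* = {x_star:.4f}, f(x*) = {f(x_star):.4f}")
```

The function is nonconvex with two local minima, which is why a purely linear surrogate cannot reproduce its shape while a sparse polynomial or an interpolating surrogate can.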
