DOE PAGES — U.S. Department of Energy
Office of Scientific and Technical Information

Title: Distributionally Robust Partially Observable Markov Decision Process with Moment-Based Ambiguity

Abstract

In this paper, we consider a distributionally robust partially observable Markov decision process (DR-POMDP), where the distribution of the transition-observation probabilities is unknown at the beginning of each decision period, but their realizations can be inferred using side information at the end of each period, after an action is taken. We build an ambiguity set of the joint distribution using bounded moments via conic constraints and seek an optimal policy to maximize the worst-case (minimum) reward over all distributions in the set. We show that the value function of DR-POMDP is piecewise linear convex with respect to the belief state and propose a heuristic search value iteration method for obtaining lower and upper bounds of the value function. We conduct numerical studies and demonstrate the computational performance of our approach on test instances of a dynamic epidemic control problem. Our results show that DR-POMDP produces more robust policies than POMDP under misspecified distributions of transition-observation probabilities, while yielding less costly solutions than robust POMDP. The DR-POMDP policies are also insensitive to varying parameters in the ambiguity set and to noise added to the true transition-observation probability values obtained at the end of each decision period.
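As a minimal sketch of the inner worst-case step described above (not the paper's implementation): for a finite outcome space, minimizing an expected reward over all distributions whose moments lie within given bounds is a linear program. All names and the toy moment constraints below are hypothetical illustrations, assuming one linear moment function per row of `moment_matrix`.

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_expectation(values, moment_matrix, lo, hi):
    """Worst-case (minimum) expectation of `values` over all distributions p
    on n outcomes whose moments satisfy lo <= M @ p <= hi.

    Solves:  min_p  values @ p
             s.t.   sum(p) = 1,  p >= 0,  lo <= M @ p <= hi
    """
    M = np.atleast_2d(moment_matrix)
    n = len(values)
    # Two-sided moment bounds stacked as one-sided inequalities.
    A_ub = np.vstack([M, -M])
    b_ub = np.concatenate([np.asarray(hi), -np.asarray(lo)])
    A_eq = np.ones((1, n))  # probabilities sum to one
    res = linprog(values, A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=[1.0], bounds=(0, 1))
    return res.fun

def dr_greedy_action(reward_vectors, moment_matrix, lo, hi):
    """Pick the action maximizing the worst-case expected reward."""
    scores = [worst_case_expectation(r, moment_matrix, lo, hi)
              for r in reward_vectors]
    return int(np.argmax(scores)), scores
```

For example, with two outcomes, reward vector `[1, 0]`, and the single moment constraint that outcome 2 has probability between 0.3 and 0.6, the adversary places as much mass as allowed on outcome 2, giving a worst-case expectation of 0.4. A full DR-POMDP solver would embed such a worst-case computation inside a value-iteration backup over the belief state.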

Authors:
Nakao, Hideaki [1]; Jiang, Ruiwei [1]; Shen, Siqian [1]
  1. Univ. of Michigan, Ann Arbor, MI (United States)
Publication Date:
February 1, 2021
Research Org.:
Univ. of Michigan, Ann Arbor, MI (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF)
OSTI Identifier:
1785682
Grant/Contract Number:  
SC0018018; CMMI-1727618
Resource Type:
Accepted Manuscript
Journal Name:
SIAM Journal on Optimization
Additional Journal Information:
Journal Volume: 31; Journal Issue: 1; Related Information: Hideaki Nakao, Ruiwei Jiang, Siqian Shen, “Distributionally robust Partially Observable Markov Decision Process with moment-based ambiguity,” SIAM Journal on Optimization (SIOPT), 31(1), 461–488, 2021.; Journal ID: ISSN 1052-6234
Publisher:
SIAM
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; partially observable Markov decision process; POMDP; distributionally robust optimization; moment-based ambiguity set; heuristic search value iteration; HSVI; epidemic control

Citation Formats

Nakao, Hideaki, Jiang, Ruiwei, and Shen, Siqian. Distributionally Robust Partially Observable Markov Decision Process with Moment-Based Ambiguity. United States: N. p., 2021. Web. doi:10.1137/19m1268410.
Nakao, Hideaki, Jiang, Ruiwei, & Shen, Siqian. Distributionally Robust Partially Observable Markov Decision Process with Moment-Based Ambiguity. United States. https://doi.org/10.1137/19m1268410
Nakao, Hideaki, Jiang, Ruiwei, and Shen, Siqian. 2021. "Distributionally Robust Partially Observable Markov Decision Process with Moment-Based Ambiguity". United States. https://doi.org/10.1137/19m1268410. https://www.osti.gov/servlets/purl/1785682.
@article{osti_1785682,
title = {Distributionally Robust Partially Observable Markov Decision Process with Moment-Based Ambiguity},
author = {Nakao, Hideaki and Jiang, Ruiwei and Shen, Siqian},
abstractNote = {In this paper, we consider a distributionally robust partially observable Markov decision process (DR-POMDP), where the distribution of the transition-observation probabilities is unknown at the beginning of each decision period, but their realizations can be inferred using side information at the end of each period, after an action is taken. We build an ambiguity set of the joint distribution using bounded moments via conic constraints and seek an optimal policy to maximize the worst-case (minimum) reward over all distributions in the set. We show that the value function of DR-POMDP is piecewise linear convex with respect to the belief state and propose a heuristic search value iteration method for obtaining lower and upper bounds of the value function. We conduct numerical studies and demonstrate the computational performance of our approach on test instances of a dynamic epidemic control problem. Our results show that DR-POMDP produces more robust policies than POMDP under misspecified distributions of transition-observation probabilities, while yielding less costly solutions than robust POMDP. The DR-POMDP policies are also insensitive to varying parameters in the ambiguity set and to noise added to the true transition-observation probability values obtained at the end of each decision period.},
doi = {10.1137/19m1268410},
journal = {SIAM Journal on Optimization},
number = 1,
volume = 31,
place = {United States},
year = {2021},
month = {feb}
}

Works referenced in this record:

Perturbation and stability theory for Markov control problems
journal, January 1992

  • Abbad, M.; Filar, J. A.
  • IEEE Transactions on Automatic Control, Vol. 37, Issue 9
  • DOI: 10.1109/9.159584

Robust Solutions of Optimization Problems Affected by Uncertain Probabilities
journal, February 2013

  • Ben-Tal, Aharon; den Hertog, Dick; De Waegenaere, Anja
  • Management Science, Vol. 59, Issue 2
  • DOI: 10.1287/mnsc.1120.1641

Percentile Optimization for Markov Decision Processes with Parameter Uncertainty
journal, February 2010


Distributionally Robust Optimization Under Moment Uncertainty with Application to Data-Driven Problems
journal, June 2010


Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations
journal, July 2017


A new distribution-free quantile estimator
journal, January 1982


Planning treatment of ischemic heart disease with partially observable Markov decision processes
journal, March 2000


The Mathematics of Infectious Diseases
journal, January 2000


Robust Dynamic Programming
journal, May 2005


Data-driven chance constrained stochastic program
journal, July 2015


Monitoring epidemiologic surveillance data using hidden Markov models
journal, December 1999


Robust MDPs with k -Rectangular Uncertainty
journal, November 2016

  • Mannor, Shie; Mebel, Ofir; Xu, Huan
  • Mathematics of Operations Research, Vol. 41, Issue 4
  • DOI: 10.1287/moor.2016.0786

Robust Control of Markov Decision Processes with Uncertain Transition Matrices
journal, October 2005


The Optimal Control of Partially Observable Markov Processes over a Finite Horizon
journal, October 1973


Adaptive Inventory Control for Nonstationary Demand and Partial Information
journal, May 2002


Robust Markov Decision Processes
journal, February 2013

  • Wiesemann, Wolfram; Kuhn, Daniel; Rustem, Berç
  • Mathematics of Operations Research, Vol. 38, Issue 1
  • DOI: 10.1287/moor.1120.0566

Distributionally Robust Convex Optimization
journal, December 2014

  • Wiesemann, Wolfram; Kuhn, Daniel; Sim, Melvyn
  • Operations Research, Vol. 62, Issue 6
  • DOI: 10.1287/opre.2014.1314

Distributionally Robust Markov Decision Processes
journal, May 2012


Distributionally Robust Counterpart in Markov Decision Processes
journal, September 2016


Distributionally robust joint chance constraints with second-order moment information
journal, November 2011


Algorithms for singularly perturbed limiting average Markov control problems
conference, January 1990

  • Abbad, M.; Filar, J. A.; Bielecki, T. R.
  • 29th IEEE Conference on Decision and Control
  • DOI: 10.1109/cdc.1990.203841