Distributionally Robust Partially Observable Markov Decision Process with Moment-Based Ambiguity
Abstract
In this paper, we consider a distributionally robust partially observable Markov decision process (DR-POMDP), where the distribution of the transition-observation probabilities is unknown at the beginning of each decision period, but their realizations can be inferred using side information at the end of each period after an action is taken. We build an ambiguity set of the joint distribution using bounded moments via conic constraints and seek an optimal policy that maximizes the worst-case (minimum) reward over all distributions in the set. We show that the value function of DR-POMDP is piecewise linear and convex with respect to the belief state and propose a heuristic search value iteration method for obtaining lower and upper bounds on the value function. We conduct numerical studies and demonstrate the computational performance of our approach on test instances of a dynamic epidemic control problem. Our results show that DR-POMDP can produce more robust policies than POMDP under misspecified distributions of transition-observation probabilities, while yielding less costly solutions than robust POMDP. The DR-POMDP policies are also insensitive to varying parameters of the ambiguity set and to noise added to the true transition-observation probability values obtained at the end of each decision period.
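The piecewise linear convexity stated in the abstract means the value function can be represented by a finite set of linear pieces over the belief simplex. The following is a minimal sketch, assuming the standard alpha-vector representation and Bayes belief update from the general POMDP literature; it is not the paper's DR-POMDP algorithm. The function names `pwlc_value` and `belief_update`, the `joint_prob` input layout, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def pwlc_value(belief: Sequence[float],
               alpha_vectors: Sequence[Sequence[float]]) -> float:
    """Evaluate V(b) = max over alpha of <alpha, b>.

    A pointwise maximum of linear functions is piecewise linear and convex in b.
    """
    return max(
        sum(a_s * b_s for a_s, b_s in zip(alpha, belief))
        for alpha in alpha_vectors
    )


def belief_update(belief: Sequence[float],
                  joint_prob: Sequence[Sequence[float]]) -> List[float]:
    """Bayes update of the belief for one realized (action, observation) pair.

    Assumed input layout: joint_prob[s][s_next] = P(s_next, z | s, a) for the
    action a that was taken and the observation z that was received.
    """
    n_next = len(joint_prob[0])
    unnormalized = [
        sum(belief[s] * joint_prob[s][s_next] for s in range(len(belief)))
        for s_next in range(n_next)
    ]
    total = sum(unnormalized)
    if total <= 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Toy two-state example (numbers are made up for illustration).
alphas = [[1.0, 0.0], [0.0, 2.0], [0.8, 0.8]]
b = [0.75, 0.25]
print(pwlc_value(b, alphas))
print(belief_update(b, [[0.2, 0.1], [0.0, 0.3]]))
```

In a heuristic search value iteration scheme, a set of such linear pieces is commonly used as the lower bound on the value function; the upper bound is typically maintained separately and is not shown here.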
- Authors:
- Nakao, Hideaki; Jiang, Ruiwei; Shen, Siqian
- Univ. of Michigan, Ann Arbor, MI (United States)
- Publication Date:
- 2021-02-01
- Research Org.:
- Univ. of Michigan, Ann Arbor, MI (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF)
- OSTI Identifier:
- 1785682
- Grant/Contract Number:
- SC0018018; CMMI-1727618
- Resource Type:
- Accepted Manuscript
- Journal Name:
- SIAM Journal on Optimization
- Additional Journal Information:
- Journal Volume: 31; Journal Issue: 1; Related Information: Hideaki Nakao, Ruiwei Jiang, Siqian Shen, “Distributionally robust Partially Observable Markov Decision Process with moment-based ambiguity,” SIAM Journal on Optimization (SIOPT), 31(1), 461–488, 2021.; Journal ID: ISSN 1052-6234
- Publisher:
- SIAM
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; partially observable Markov decision process; POMDP; distributionally robust optimization; moment-based ambiguity set; heuristic search value iteration; HSVI; epidemic control
Citation Formats
Nakao, Hideaki, Jiang, Ruiwei, and Shen, Siqian. Distributionally Robust Partially Observable Markov Decision Process with Moment-Based Ambiguity. United States: N. p., 2021. Web. doi:10.1137/19m1268410.
Nakao, Hideaki, Jiang, Ruiwei, & Shen, Siqian. Distributionally Robust Partially Observable Markov Decision Process with Moment-Based Ambiguity. United States. https://doi.org/10.1137/19m1268410
Nakao, Hideaki, Jiang, Ruiwei, and Shen, Siqian. 2021. "Distributionally Robust Partially Observable Markov Decision Process with Moment-Based Ambiguity". United States. https://doi.org/10.1137/19m1268410. https://www.osti.gov/servlets/purl/1785682.
@article{osti_1785682,
title = {Distributionally Robust Partially Observable Markov Decision Process with Moment-Based Ambiguity},
author = {Nakao, Hideaki and Jiang, Ruiwei and Shen, Siqian},
abstractNote = {In this paper, we consider a distributionally robust partially observable Markov decision process (DR-POMDP), where the distribution of the transition-observation probabilities is unknown at the beginning of each decision period, but their realizations can be inferred using side information at the end of each period after an action is taken. We build an ambiguity set of the joint distribution using bounded moments via conic constraints and seek an optimal policy that maximizes the worst-case (minimum) reward over all distributions in the set. We show that the value function of DR-POMDP is piecewise linear and convex with respect to the belief state and propose a heuristic search value iteration method for obtaining lower and upper bounds on the value function. We conduct numerical studies and demonstrate the computational performance of our approach on test instances of a dynamic epidemic control problem. Our results show that DR-POMDP can produce more robust policies than POMDP under misspecified distributions of transition-observation probabilities, while yielding less costly solutions than robust POMDP. The DR-POMDP policies are also insensitive to varying parameters of the ambiguity set and to noise added to the true transition-observation probability values obtained at the end of each decision period.},
doi = {10.1137/19m1268410},
journal = {SIAM Journal on Optimization},
number = 1,
volume = 31,
place = {United States},
year = {2021},
month = {2}
}
Works referenced in this record:
Perturbation and stability theory for Markov control problems
journal, January 1992
- Abbad, M.; Filar, J. A.
- IEEE Transactions on Automatic Control, Vol. 37, Issue 9
Robust Solutions of Optimization Problems Affected by Uncertain Probabilities
journal, February 2013
- Ben-Tal, Aharon; den Hertog, Dick; De Waegenaere, Anja
- Management Science, Vol. 59, Issue 2
Percentile Optimization for Markov Decision Processes with Parameter Uncertainty
journal, February 2010
- Delage, Erick; Mannor, Shie
- Operations Research, Vol. 58, Issue 1
Distributionally Robust Optimization Under Moment Uncertainty with Application to Data-Driven Problems
journal, June 2010
- Delage, Erick; Ye, Yinyu
- Operations Research, Vol. 58, Issue 3
Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations
journal, July 2017
- Mohajerin Esfahani, Peyman; Kuhn, Daniel
- Mathematical Programming, Vol. 171, Issue 1-2
A new distribution-free quantile estimator
journal, January 1982
- Harrell, Frank E.; Davis, C. E.
- Biometrika, Vol. 69, Issue 3
Planning treatment of ischemic heart disease with partially observable Markov decision processes
journal, March 2000
- Hauskrecht, Milos; Fraser, Hamish
- Artificial Intelligence in Medicine, Vol. 18, Issue 3
The Mathematics of Infectious Diseases
journal, January 2000
- Hethcote, Herbert W.
- SIAM Review, Vol. 42, Issue 4
Robust Dynamic Programming
journal, May 2005
- Iyengar, Garud N.
- Mathematics of Operations Research, Vol. 30, Issue 2
Data-driven chance constrained stochastic program
journal, July 2015
- Jiang, Ruiwei; Guan, Yongpei
- Mathematical Programming, Vol. 158, Issue 1-2
Monitoring epidemiologic surveillance data using hidden Markov models
journal, December 1999
- Le Strat, Yann; Carrat, Fabrice
- Statistics in Medicine, Vol. 18, Issue 24
Robust MDPs with k-Rectangular Uncertainty
journal, November 2016
- Mannor, Shie; Mebel, Ofir; Xu, Huan
- Mathematics of Operations Research, Vol. 41, Issue 4
Robust Control of Markov Decision Processes with Uncertain Transition Matrices
journal, October 2005
- Nilim, Arnab; El Ghaoui, Laurent
- Operations Research, Vol. 53, Issue 5
The Optimal Control of Partially Observable Markov Processes over a Finite Horizon
journal, October 1973
- Smallwood, Richard D.; Sondik, Edward J.
- Operations Research, Vol. 21, Issue 5
Adaptive Inventory Control for Nonstationary Demand and Partial Information
journal, May 2002
- Treharne, James T.; Sox, Charles R.
- Management Science, Vol. 48, Issue 5
Robust Markov Decision Processes
journal, February 2013
- Wiesemann, Wolfram; Kuhn, Daniel; Rustem, Berç
- Mathematics of Operations Research, Vol. 38, Issue 1
Distributionally Robust Convex Optimization
journal, December 2014
- Wiesemann, Wolfram; Kuhn, Daniel; Sim, Melvyn
- Operations Research, Vol. 62, Issue 6
Distributionally Robust Markov Decision Processes
journal, May 2012
- Xu, Huan; Mannor, Shie
- Mathematics of Operations Research, Vol. 37, Issue 2
A Convex Optimization Approach to Distributionally Robust Markov Decision Processes With Wasserstein Distance
journal, July 2017
- Yang, Insoon
- IEEE Control Systems Letters, Vol. 1, Issue 1
Distributionally Robust Counterpart in Markov Decision Processes
journal, September 2016
- Yu, Pengqian; Xu, Huan
- IEEE Transactions on Automatic Control, Vol. 61, Issue 9
Distributionally robust joint chance constraints with second-order moment information
journal, November 2011
- Zymler, Steve; Kuhn, Daniel; Rustem, Berç
- Mathematical Programming, Vol. 137, Issue 1-2
Algorithms for singularly perturbed limiting average Markov control problems
conference, January 1990
- Abbad, M.; Filar, J. A.; Bielecki, T. R.
- 29th IEEE Conference on Decision and Control