OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Conditions for the Solvability of the Linear Programming Formulation for Constrained Discounted Markov Decision Processes

Abstract

We consider a discrete-time constrained discounted Markov decision process (MDP) with Borel state and action spaces, compact action sets, and lower semi-continuous cost functions. We introduce a set of hypotheses related to a positive weight function which allow us to consider cost functions that might not be bounded below by a constant, and which imply the solvability of the linear programming formulation of the constrained MDP. In particular, we establish the existence of a constrained optimal stationary policy. Our results are illustrated with an application to a fishery management problem.
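The linear programming formulation referred to in the abstract can be sketched for a finite toy instance. In the finite case, the LP variables are the discounted occupation measures x(s, a), the balance equations tie them to the initial distribution, and the constrained cost appears as an extra linear inequality; a minimal sketch with `scipy.optimize.linprog` follows (all numbers are hypothetical and not from the paper, which treats Borel spaces and unbounded-below costs):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action discounted MDP (not from the paper).
nS, nA = 2, 2
gamma = 0.9
mu0 = np.array([0.5, 0.5])            # initial distribution

# P[s, a, s'] = transition probability.
P = np.zeros((nS, nA, nS))
P[:, 0, :] = np.eye(nS)               # action 0: stay put
P[:, 1, :] = np.eye(nS)[::-1]         # action 1: swap states

c = np.array([[2.0, 1.0],             # running cost c(s, a)
              [0.0, 1.0]])
d = np.tile([0.0, 1.0], (nS, 1))      # constraint cost: d(s, a) = 1 iff a == 1
D = 3.0                               # budget on discounted use of action 1

# Occupation-measure balance equations, one per state s':
#   sum_a x(s', a) - gamma * sum_{s, a} P(s, a, s') x(s, a) = mu0(s')
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = float(sp == s) - gamma * P[s, a, sp]

res = linprog(c.ravel(),
              A_ub=[d.ravel()], b_ub=[D],
              A_eq=A_eq, b_eq=mu0,
              bounds=[(0, None)] * (nS * nA))
x = res.x.reshape(nS, nA)
# A stationary optimal policy is recovered from the optimal occupation
# measure: pi(a | s) proportional to x(s, a).
pi = x / x.sum(axis=1, keepdims=True)
print("optimal discounted cost:", res.fun)
print("discounted constraint cost:", float(d.ravel() @ res.x))
```

Here the total mass of any feasible x equals 1/(1 - gamma), and the constrained-optimal stationary policy whose existence the paper establishes corresponds to the randomization `pi` read off the LP solution.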

Authors:
Dufour, F. [1]; Prieto-Rumeau, T. [2]
  1. Institut de Mathématiques de Bordeaux, INRIA Bordeaux Sud Ouest, Team: CQFD, and IMB (France)
  2. UNED, Department of Statistics and Operations Research (Spain)
Publication Date:
2016
OSTI Identifier:
22617268
Resource Type:
Journal Article
Resource Relation:
Journal Name: Applied Mathematics and Optimization; Journal Volume: 74; Journal Issue: 1; Other Information: Copyright (c) 2016 Springer Science+Business Media New York; http://www.springer-ny.com; Country of input: International Atomic Energy Agency (IAEA)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICAL METHODS AND COMPUTING; COMPACTS; FUNCTIONS; HYPOTHESIS; LINEAR PROGRAMMING; MANAGEMENT; MARKOV PROCESS

Citation Formats

Dufour, F., E-mail: dufour@math.u-bordeaux1.fr, and Prieto-Rumeau, T., E-mail: tprieto@ccia.uned.es. Conditions for the Solvability of the Linear Programming Formulation for Constrained Discounted Markov Decision Processes. United States: N. p., 2016. Web. doi:10.1007/S00245-015-9307-3.
Dufour, F., E-mail: dufour@math.u-bordeaux1.fr, & Prieto-Rumeau, T., E-mail: tprieto@ccia.uned.es. Conditions for the Solvability of the Linear Programming Formulation for Constrained Discounted Markov Decision Processes. United States. doi:10.1007/S00245-015-9307-3.
Dufour, F., E-mail: dufour@math.u-bordeaux1.fr, and Prieto-Rumeau, T., E-mail: tprieto@ccia.uned.es. 2016. "Conditions for the Solvability of the Linear Programming Formulation for Constrained Discounted Markov Decision Processes". United States. doi:10.1007/S00245-015-9307-3.
@article{osti_22617268,
title = {Conditions for the Solvability of the Linear Programming Formulation for Constrained Discounted Markov Decision Processes},
author = {Dufour, F., E-mail: dufour@math.u-bordeaux1.fr and Prieto-Rumeau, T., E-mail: tprieto@ccia.uned.es},
abstractNote = {We consider a discrete-time constrained discounted Markov decision process (MDP) with Borel state and action spaces, compact action sets, and lower semi-continuous cost functions. We introduce a set of hypotheses related to a positive weight function which allow us to consider cost functions that might not be bounded below by a constant, and which imply the solvability of the linear programming formulation of the constrained MDP. In particular, we establish the existence of a constrained optimal stationary policy. Our results are illustrated with an application to a fishery management problem.},
doi = {10.1007/S00245-015-9307-3},
journal = {Applied Mathematics and Optimization},
number = 1,
volume = 74,
place = {United States},
year = 2016,
month = 8
}
  • In this paper, we investigate an optimization problem for continuous-time Markov decision processes with both impulsive and continuous controls. We consider the so-called constrained problem where the objective of the controller is to minimize a total expected discounted optimality criterion associated with a cost rate function while keeping other performance criteria of the same form, but associated with different cost rate functions, below some given bounds. Our model allows multiple impulses at the same time moment. The main objective of this work is to study the associated linear program defined on a space of measures including the occupation measures of the controlled process and to provide sufficient conditions to ensure the existence of an optimal control.
  • This note concerns discrete-time controlled Markov chains with Borel state and action spaces. Given a nonnegative cost function, the performance of a control policy is measured by the superior limit risk-sensitive average criterion associated with a constant and positive risk sensitivity coefficient. Within such a framework, the discounted approach is used (a) to establish the existence of solutions for the corresponding optimality inequality, and (b) to show that, under mild conditions on the cost function, the optimal value functions corresponding to the superior and inferior limit average criteria coincide on a certain subset of the state space. The approach of the paper relies on standard dynamic programming ideas and on a simple analytical derivation of a Tauberian relation.
  • This paper deals with the expected discounted continuous control of piecewise deterministic Markov processes (PDMPs) using a singular perturbation approach for dealing with rapidly oscillating parameters. The state space of the PDMP is written as the product of a finite set and a subset of the Euclidean space ℝ^n. The discrete part of the state, called the regime, characterizes the mode of operation of the physical system under consideration, and is supposed to have a fast (associated to a small parameter ε > 0) and a slow behavior. By using a similar approach as developed in Yin and Zhang (Continuous-Time Markov Chains and Applications: A Singular Perturbation Approach, Applications of Mathematics, vol. 37, Springer, New York, 1998, Chaps. 1 and 3), the idea in this paper is to reduce the number of regimes by considering an averaged model in which the regimes within the same class are aggregated through the quasi-stationary distribution, so that the different states in this class are replaced by a single one. The main goal is to show that the value function of the control problem for the system driven by the perturbed Markov chain converges to the value function of this limit control problem as ε goes to zero. This convergence is obtained by, roughly speaking, showing that the infimum and supremum limits of the value functions satisfy two optimality inequalities as ε goes to zero. This enables us to show the result by invoking a uniqueness argument, without needing any kind of Lipschitz continuity condition.
  • This work develops asymptotically optimal controls for discrete-time singularly perturbed Markov decision processes (MDPs) having weak and strong interactions. The focus is on finite-state-space MDP problems. The state space of the underlying Markov chain can be decomposed into a number of recurrent classes, or a number of recurrent classes and a group of transient states. Using a hierarchical control approach, continuous-time limit problems that are much simpler to handle than the original ones are derived. Based on the optimal solutions for the limit problems, nearly optimal decisions for the original problems are obtained. The asymptotic optimality of such controls is proved and the rate of convergence is provided. Infinite horizon problems are considered; both discounted costs and long-run average costs are examined.
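The aggregation step behind this kind of weak-and-strong-interaction analysis can be sketched numerically. For a chain P = P0 + ε·Q, with P0 block-diagonal over irreducible recurrent classes and Q a slow coupling, each class is collapsed to a single aggregate state by averaging the coupling against the within-class stationary distribution. A minimal sketch on a hypothetical 4-state chain with two 2-state classes (numbers invented for illustration, not from the paper):

```python
import numpy as np

eps = 0.01
# Fast dynamics P0: block-diagonal, two irreducible 2-state classes.
P0 = np.array([[0.3, 0.7, 0.0, 0.0],
               [0.6, 0.4, 0.0, 0.0],
               [0.0, 0.0, 0.5, 0.5],
               [0.0, 0.0, 0.2, 0.8]])
# Slow coupling Q: rows sum to 0, so P = P0 + eps*Q is still stochastic.
Q = np.array([[-1.0, 0.0, 1.0, 0.0],
              [ 0.0,-1.0, 0.0, 1.0],
              [ 1.0, 0.0,-1.0, 0.0],
              [ 0.0, 1.0, 0.0,-1.0]])
P = P0 + eps * Q

def stationary(M):
    """Stationary distribution of an irreducible stochastic matrix M."""
    n = M.shape[0]
    A = np.vstack([M.T - np.eye(n), np.ones(n)])   # pi M = pi, sum(pi) = 1
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

blocks = [(0, 2), (2, 4)]                          # the two recurrent classes
nu = [stationary(P0[i:j, i:j]) for i, j in blocks]  # within-class laws

# Aggregated slow dynamics over the classes: average Q by nu.
Pbar = np.zeros((2, 2))
for I, (i, j) in enumerate(blocks):
    for J, (k, l) in enumerate(blocks):
        Pbar[I, J] = nu[I] @ Q[i:j, k:l] @ np.ones(l - k)
print(Pbar)   # generator of the limit chain over the two classes
```

The resulting 2×2 matrix has rows summing to zero, i.e. it is the generator of the continuous-time limit chain over the aggregated classes, which is the object the limit control problem is posed on.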
  • In this paper we discuss MDPs with a distribution-function criterion of first-passage time. Some properties of several kinds of optimal policies are established, and existence results and algorithms for these optimal policies are provided.