Infinite horizon discounted dynamic programming subject to total variation ambiguity on conditional distribution
Paper i proceeding, 2016
We analyze the infinite horizon minimax discounted cost Markov Control Model (MCM), for a class of controlled process conditional distributions, which belong to a ball, with respect to total variation distance metric, centered at a known nominal controlled conditional distribution with radius R ϵ [0, 2], in which the minimization is over the control strategies and the maximization is over conditional distributions. Through our analysis (i) we derive a new discounted dynamic programming equation, (ii) we show the associated contraction property, and (iii) we develop a new policy iteration algorithm. Finally, the application of the new dynamic programming and the corresponding policy iteration algorithm are shown via an illustrative example.