A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces
Insoon Yang
Received: 15 January 2020 / Accepted: 3 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
In this paper, a convex optimization-based method is proposed for numerically solving dynamic programs in continuous state and action spaces. The key idea is to approximate the output of the Bellman operator at a particular state by the optimal value of a convex program. The approximate Bellman operator has a computational advantage because it involves a convex optimization problem in the case of control-affine systems and convex costs. Using this feature, we propose a simple dynamic programming algorithm that evaluates the approximate value function at pre-specified grid points by solving convex optimization problems in each iteration. We show that the proposed method approximates the optimal value function with a uniform convergence property in the case of convex optimal value functions. We also propose an interpolation-free design method for a control policy whose performance converges uniformly to the optimum as the grid resolution becomes finer. When a nonlinear control-affine system is considered, the convex optimization approach provides an approximate policy with a provable suboptimality bound. For general cases, the proposed convex formulation of dynamic programming operators can be modified into a nonconvex bilevel program, in which the inner problem is a linear program, without losing the uniform convergence properties.

Keywords: Dynamic programming · Convex optimization · Optimal control · Stochastic control

Mathematics Subject Classification: 90C39 · 49L20 · 90C25
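For concreteness, the structure exploited by the approximate Bellman operator can be illustrated with the standard finite-horizon recursion, written here in a deterministic form with generic notation (not necessarily the paper's exact formulation):

\[
  v_t(x) \;=\; \min_{u \in \mathcal{U}} \big\{ c(x,u) + v_{t+1}\big(f(x,u)\big) \big\},
  \qquad f(x,u) = f_0(x) + f_1(x)\,u .
\]

If the stage cost $c(x,\cdot)$ and the next-stage value function $v_{t+1}$ are convex and the dynamics are affine in $u$, then $u \mapsto c(x,u) + v_{t+1}(f(x,u))$ is convex, so evaluating the right-hand side at a fixed state $x$ amounts to solving a convex program.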
Communicated by Lars Grüne.
Insoon Yang
[email protected]
Department of Electrical and Computer Engineering, Automation Systems Research Institute, Seoul National University, Seoul, South Korea
1 Introduction

Dynamic programming (DP) has been one of the most important methods for solving and analyzing sequential decision-making problems in optimal control, dynamic games, and reinforcement learning, among others. Using DP, we can decompose a complicated sequential decision-making problem into multiple tractable subproblems, whose optimal solutions are used to construct an optimal policy for the original problem. Numerical methods for DP are well studied for discrete-time Markov decision processes (MDPs) with discrete state and action spaces (e.g., [1]) and for continuous-time deterministic and stochastic optimal control problems in continuous state spaces (e.g., [2]). This paper focuses on the discrete-time case with continuous state and action spaces in the finite-horizon setting. Unlike infinite-dimensional linear programming (LP) methods (e.g., [3–5]), which require a finite-dimensional approximation of the LP problems, we develop a finite-dimensional convex optimization-based method that uses a discretization of the state space while not discretizing the action space.
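As a rough illustration of how such a grid-based backup could be carried out numerically, the sketch below evaluates a Bellman backup at a single grid point by solving a convex program with CVXPY. It is a hypothetical implementation under stated assumptions, not the construction developed in the paper: the dynamics are control-affine, the stage cost is convex in the input, and the next-stage value function is represented by a convex piecewise-affine approximation (a maximum of affine functions fitted to grid values), which keeps each backup a convex program over the continuous action.

```python
# Hypothetical sketch of one grid-point Bellman backup as a convex program.
# Assumptions (not from the paper): control-affine dynamics
#   x_next = f0(x) + f1(x) @ u,
# a stage cost that is convex in u, and a next-stage value function
# represented as a max of affine pieces, v(x) ~= max_i (a_i @ x + b_i).
import numpy as np
import cvxpy as cp

def bellman_backup(x, f0, f1, stage_cost, affine_pieces, u_dim, u_max):
    """Return an approximate (Tv)(x) and a minimizing input at grid point x."""
    u = cp.Variable(u_dim)
    x_next = f0(x) + f1(x) @ u              # affine in u for a fixed state x
    w = cp.Variable()                        # epigraph variable for v(x_next)
    constraints = [w >= a @ x_next + b for a, b in affine_pieces]
    constraints.append(cp.norm(u, "inf") <= u_max)   # box input constraint
    problem = cp.Problem(cp.Minimize(stage_cost(x, u) + w), constraints)
    problem.solve()
    return problem.value, u.value

# Example: double-integrator-like dynamics with a quadratic stage cost
# and a trivial terminal value function v(x) = 0 (a single affine piece).
dt = 0.1
f0 = lambda x: np.array([x[0] + dt * x[1], x[1]])
f1 = lambda x: np.array([[0.0], [dt]])
stage_cost = lambda x, u: cp.sum_squares(u) + float(x @ x)
affine_pieces = [(np.zeros(2), 0.0)]
value, u_opt = bellman_backup(np.array([1.0, 0.0]), f0, f1,
                              stage_cost, affine_pieces, u_dim=1, u_max=1.0)
```

Repeating this backup over all grid points and time steps (and refitting the affine pieces to the computed grid values after each backward step) gives the flavor of a grid-based DP pass in which each evaluation is a convex program rather than a search over a discretized action set.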