Learning




This chapter begins by extending Chapter 7, in which controlled Markov chains were discussed. To choose admissible controls stabilizing these, we first give a Lyapunov criterion relating to the returns to a set for any admissible strategy. We then discuss various stochastic algorithms. The ordinary differential equation method is well adapted to the study of the convergence of such algorithms when the underlying model has good ergodic properties and when there exists a strongly attractive point. It then provides a useful complement to methods which are essentially based on the use of martingales, such as those applied earlier. However, as soon as the algorithm includes 'traps', this method is primarily useful as a guide to the search for possible limits; technical recourse to martingale tools remains valuable. Without pretending to present a general theory, we give a number of typical examples. (A minimal numerical sketch of the ODE method follows the contents list below.)

9.1 Controlled Markov Chains
  9.1.1 Markov Property
  9.1.2 Return Times and Recurrence
  9.1.3 Average Costs
9.2 Stochastic Algorithms
  9.2.1 Regions of Attraction of a Differential Equation
  9.2.2 The Ordinary Differential Equation Method
  9.2.3 Markovian Perturbations
9.3 Search for a Strongly Attractive Point
  9.3.1 Gradient Estimator for a Linear Model
  9.3.2 Strongly Attractive Point
  9.3.3 Visual Neurons
  9.3.4 Search for the Minimum of a Potential
9.4 How Algorithms Avoid Traps
  9.4.1 Negligible Sets for Regressive Series
  9.4.2 Recursive Principal Components Analysis
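To make the ODE heuristic concrete before the formal development, here is a minimal sketch, not taken from the book: the mean field $h$, the step sizes $\gamma_n = 1/n$, and the Gaussian perturbations are illustrative assumptions. The recursion $X_{n+1} = X_n + \gamma_{n+1}(h(X_n) + \text{noise})$ uses decreasing steps, and its trajectory tracks the solutions of the ODE $\dot{x} = h(x)$, here driven to a strongly attractive point at $x = 2$.

```python
import numpy as np

# Illustrative mean field with a strongly attractive point at x = 2:
# the ODE dx/dt = h(x) drives every trajectory to 2.
def h(x):
    return -(x - 2.0)

rng = np.random.default_rng(0)
x = 10.0                      # arbitrary starting point
n_steps = 20000
for n in range(1, n_steps + 1):
    gamma = 1.0 / n           # decreasing steps: sum gamma_n = inf, sum gamma_n^2 < inf
    noise = rng.normal()      # illustrative centred perturbation
    x = x + gamma * (h(x) + noise)

print(x)  # close to the ODE's attractive equilibrium x = 2
```

With steps satisfying $\sum \gamma_n = \infty$ and $\sum \gamma_n^2 < \infty$, the noise averages out while the drift accumulates, which is the heart of the comparison between the algorithm and the differential equation.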

9.1 Controlled Markov Chains

9.1.1 Markov Property

Controlled Markov chains were introduced in Section 7.3; readers are referred to that section for the definitions and notation. We consider a controlled Markov chain with transition probability $\pi$, with state space $(E, \mathcal{E})$ and space of controls $(K, \mathcal{K})$. The strategies may be limited to 'admissible' strategies, in the sense of Definition 7.3.16: every $x \in E$ is associated with a zone $A(x)$ in which the control can be chosen if one is in the state $x$. We saw (Definition 7.3.18) that the 'canonical version' $(\Omega, \mathcal{A}, (P_x^\delta), (X_n))$ can be used; if $\nu$ is an arbitrary initial distribution and $P_\nu^\delta = \int \nu(dx)\, P_x^\delta$, then for $P_\nu^\delta$, $(X_n)$ is a Markov chain with initial distribution $\nu$, with transition probability $\pi$, using the strategy $\delta$. If we are given such a canonical version when the first $k$ observations have already been carried out, we associate the strategy $\delta = (d_n)$ with the translated strategy $\delta^{(k)} = (d_n^{(k)})$, where $d_n^{(k)}$ applies $d_{n+k}$ to the trajectory prefixed by the $k$ observations already made. For a stopping time $T$, conditionally on $\{T < \infty\}$ and $\mathcal{F}_T$, the sequence $(X_{T+n})_{n \ge 0}$ is the controlled Markov chain associated with $\pi$ with initial state $X_T$ using the strategy $\delta^{(T)}$. This property is transcribed as follows: if $\Psi$ is a positive, bounded, random variable on $(\Omega, \mathcal{A})$, then

$$E_\nu^\delta\big[\Psi \circ \theta_T \,\big|\, \mathcal{F}_T\big]\, 1_{\{T < \infty\}} = E_{X_T}^{\delta^{(T)}}[\Psi]\, 1_{\{T < \infty\}},$$

where $\theta_T$ denotes the shift $(X_n) \mapsto (X_{T+n})$.
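To fix ideas, here is a minimal simulation sketch of a controlled Markov chain under an admissible stationary strategy. Everything in it is an illustrative assumption rather than the book's model: the random-walk kernel standing in for $\pi$, the admissible zone $A(x)$, and the strategy $d$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative admissible zone: in state x, the control must be drawn from A(x);
# near the edge of the region, only controls pushing back toward 0 are admissible.
def A(x):
    return [-1, 0, 1] if abs(x) < 10 else [-int(np.sign(x))]

# Illustrative transition kernel pi(x, u; .): move to x + u plus random-walk noise.
def sample_transition(x, u):
    return x + u + int(rng.integers(-1, 2))

# An admissible stationary strategy: d(x) always picks a control in A(x).
def d(x):
    return min(A(x), key=abs)   # prefer the smallest admissible move

# Simulate (X_n) under this strategy from X_0 = 0.
x, path = 0, [0]
for n in range(50):
    u = d(x)                    # control chosen in A(x), so the strategy is admissible
    x = sample_transition(x, u)
    path.append(x)
print(path)
```

The point of the sketch is the constraint structure: whatever rule $d$ one substitutes, admissibility only requires $d(x) \in A(x)$ for every state $x$ visited.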

9.1.2 Return Times and Recurrence

In Section 7.3.3, we saw how to take advantage of lower bounds on the asymptotic frequency of transitions within a ball of radius $R < \infty$, whatever the strategy. Here, we are concerned with obtaining general criteria which guarantee such properties. One particular case of the following
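Criteria of this kind are typically drift (Lyapunov) conditions. As an orientation sketch, and not necessarily the book's exact formulation, a standard Foster-Lyapunov condition uniform over admissible controls reads: suppose there exist a function $V : E \to [0, \infty)$, a set $C \in \mathcal{E}$ and a constant $\varepsilon > 0$ such that

$$\int V(y)\, \pi(x, u; dy) \;\le\; V(x) - \varepsilon \qquad \text{for all } x \notin C \text{ and all } u \in A(x).$$

Then, whatever the admissible strategy $\delta$, the expected return time to $C$ starting from $x \notin C$ is at most $V(x)/\varepsilon$, which yields lower bounds on the asymptotic frequency of visits to $C$ of the kind exploited in Section 7.3.3.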