Learning




This chapter begins by extending Chapter 7, in which controlled Markov chains were discussed. To choose admissible controls stabilizing these, we first give a Lyapunov criterion relating to the returns to a set for any admissible strategy. We then discuss various stochastic algorithms. The ordinary differential equation method is well adapted to the study of the convergence of such algorithms when the underlying model has good ergodic properties and when there exists a strongly attractive point. It then provides a useful complement to methods which are essentially based on the use of martingales, such as those applied earlier. However, as soon as the algorithm includes 'traps', this method is primarily useful as a guide to the search for possible limits; technical recourse to martingale tools remains valuable. Without pretending to present a general theory, we give a number of typical examples. (A minimal numerical sketch of the ODE method follows the contents list below.)

9.1 Controlled Markov Chains
  9.1.1 Markov Property
  9.1.2 Return Times and Recurrence
  9.1.3 Average Costs
9.2 Stochastic Algorithms
  9.2.1 Regions of Attraction of a Differential Equation
  9.2.2 The Ordinary Differential Equation Method
  9.2.3 Markovian Perturbations
9.3 Search for a Strongly Attractive Point
  9.3.1 Gradient Estimator for a Linear Model
  9.3.2 Strongly Attractive Point
  9.3.3 Visual Neurons
  9.3.4 Search for the Minimum of a Potential
9.4 How Algorithms Avoid Traps
  9.4.1 Negligible Sets for Regressive Series
  9.4.2 Recursive Principal Components Analysis
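To make the ODE heuristic concrete before the formal development, here is a minimal sketch, not taken from the book: the mean field $h$, the step sizes $\gamma_n = 1/n$, and the Gaussian perturbations are illustrative assumptions. The recursion $X_{n+1} = X_n + \gamma_{n+1}(h(X_n) + \text{noise})$ uses decreasing steps, and its trajectory tracks the solutions of the ODE $\dot{x} = h(x)$, here driven to a strongly attractive point at $x = 2$.

```python
import numpy as np

# Illustrative mean field with a strongly attractive point at x = 2:
# the ODE dx/dt = h(x) drives every trajectory to 2.
def h(x):
    return -(x - 2.0)

rng = np.random.default_rng(0)
x = 10.0                      # arbitrary starting point
n_steps = 20000
for n in range(1, n_steps + 1):
    gamma = 1.0 / n           # decreasing steps: sum gamma_n = inf, sum gamma_n^2 < inf
    noise = rng.normal()      # illustrative centred perturbation
    x = x + gamma * (h(x) + noise)

print(x)  # close to the ODE's attractive equilibrium x = 2
```

With steps satisfying $\sum \gamma_n = \infty$ and $\sum \gamma_n^2 < \infty$, the noise averages out while the drift accumulates, which is the heart of the comparison between the algorithm and the differential equation.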

9.1 Controlled Markov Chains

9.1.1 Markov Property

Controlled Markov chains were introduced in Section 7.3; readers are referred to that section for the definitions and notation. We consider a controlled Markov chain with transition probability $\pi$, with state space $(E, \mathcal{E})$ and space of controls $(K, \mathcal{K})$. The strategies may be limited to 'admissible' strategies, in the sense of Definition 7.3.16: every $x \in E$ is associated with a zone $A(x)$ in which the control can be chosen if one is in the state $x$. We saw (Definition 7.3.18) that the 'canonical version' $(\Omega, \mathcal{A}, (P_x^\delta), (X_n))$ can be used; if $\nu$ is an arbitrary initial distribution and $P_\nu^\delta = \int \nu(dx)\, P_x^\delta$, then for $P_\nu^\delta$, $(X_n)$ is a Markov chain with initial distribution $\nu$, with transition probability $\pi$, using the strategy $\delta$. If we are given such a canonical version when the first $k$ observations have already been carried out, we associate the strategy $\delta = (d_n)$ with the translated strategy $\delta^{(k)} = (d_n^{(k)})$, where $d_n^{(k)}$ applies $d_{n+k}$ to the trajectory prefixed by the $k$ observations already made. For a stopping time $T$, conditionally on $\{T < \infty\}$ and $\mathcal{F}_T$, the sequence $(X_{T+n})_{n \ge 0}$ is the controlled Markov chain associated with $\pi$ with initial state $X_T$ using the strategy $\delta^{(T)}$. This property is transcribed as follows: if $\Psi$ is a positive, bounded, random variable on $(\Omega, \mathcal{A})$, then

$$E_\nu^\delta\big[\Psi \circ \theta_T \,\big|\, \mathcal{F}_T\big]\, 1_{\{T < \infty\}} = E_{X_T}^{\delta^{(T)}}[\Psi]\, 1_{\{T < \infty\}},$$

where $\theta_T$ denotes the shift $(X_n) \mapsto (X_{T+n})$.
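To fix ideas, here is a minimal simulation sketch of a controlled Markov chain under an admissible stationary strategy. Everything in it is an illustrative assumption rather than the book's model: the random-walk kernel standing in for $\pi$, the admissible zone $A(x)$, and the strategy $d$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative admissible zone: in state x, the control must be drawn from A(x);
# near the edge of the region, only controls pushing back toward 0 are admissible.
def A(x):
    return [-1, 0, 1] if abs(x) < 10 else [-int(np.sign(x))]

# Illustrative transition kernel pi(x, u; .): move to x + u plus random-walk noise.
def sample_transition(x, u):
    return x + u + int(rng.integers(-1, 2))

# An admissible stationary strategy: d(x) always picks a control in A(x).
def d(x):
    return min(A(x), key=abs)   # prefer the smallest admissible move

# Simulate (X_n) under this strategy from X_0 = 0.
x, path = 0, [0]
for n in range(50):
    u = d(x)                    # control chosen in A(x), so the strategy is admissible
    x = sample_transition(x, u)
    path.append(x)
print(path)
```

The point of the sketch is the constraint structure: whatever rule $d$ one substitutes, admissibility only requires $d(x) \in A(x)$ for every state $x$ visited.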

9.1.2 Return Times and Recurrence

In Section 7.3.3, we saw how to take advantage of lower bounds on the asymptotic frequency of transitions within a ball of radius $R < \infty$, whatever the strategy. Here, we are concerned with obtaining general criteria which guarantee such properties. One particular case of the following
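Criteria of this kind are typically drift (Lyapunov) conditions. As an orientation sketch, and not necessarily the book's exact formulation, a standard Foster-Lyapunov condition uniform over admissible controls reads: suppose there exist a function $V : E \to [0, \infty)$, a set $C \in \mathcal{E}$ and a constant $\varepsilon > 0$ such that

$$\int V(y)\, \pi(x, u; dy) \;\le\; V(x) - \varepsilon \qquad \text{for all } x \notin C \text{ and all } u \in A(x).$$

Then, whatever the admissible strategy $\delta$, the expected return time to $C$ starting from $x \notin C$ is at most $V(x)/\varepsilon$, which yields lower bounds on the asymptotic frequency of visits to $C$ of the kind exploited in Section 7.3.3.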