Natural Gradient Learning and Its Dynamics in Singular Regions

Learning takes place in a parameter space, which in general is not Euclidean but Riemannian. Therefore, we need to take the Riemannian structure into account when designing a learning method. The natural gradient method, a version of stochastic descent learning that uses the Riemannian gradient, is proposed for this purpose. It is a Fisher-efficient on-line method of estimation. Its performance is excellent in general, and it has been used in various types of learning problems such as neural learning, policy gradients in reinforcement learning, optimization by means of stochastic relaxation, independent component analysis, Markov chain Monte Carlo (MCMC) in a Riemannian manifold, and others. Some statistical models are singular, meaning that their parameter spaces include singular regions. The multilayer perceptron (MLP) is a typical singular model. Since supervised learning of the MLP is involved in deep learning, it is important to study the dynamical behavior of learning in singular regions, in which learning is very slow. This is known as the plateau phenomenon. The natural gradient method overcomes this difficulty.
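To make the update rule concrete, here is a minimal sketch of natural gradient learning for a simple case, assuming a linear regression model with unit-variance Gaussian noise so that the Fisher information matrix is G = E[x x^T]. The model, the helper names (fisher_information, natural_gradient_step) and the NumPy implementation are illustrative choices, not the chapter's notation.

```python
import numpy as np

# Minimal sketch (illustrative assumptions): a linear model y = xi . x + noise
# with unit noise variance, for which the Fisher information matrix is
# G = E[x x^T].  The natural gradient update premultiplies the ordinary
# gradient of the instantaneous loss by G^{-1}.

def fisher_information(X):
    """Empirical Fisher information G = E[x x^T] (unit noise variance assumed)."""
    return X.T @ X / len(X)

def natural_gradient_step(xi, x, y, G, eta=0.1):
    """One on-line update: xi <- xi - eta * G^{-1} grad_xi l(x, y; xi)."""
    grad = -(y - x @ xi) * x              # gradient of l = (1/2)(y - xi.x)^2
    return xi - eta * np.linalg.solve(G, grad)

# Toy usage: one pass over synthetic data drives xi toward the true parameter.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_xi = np.array([1.0, -2.0, 0.5])
Y = X @ true_xi + 0.1 * rng.normal(size=500)

G = fisher_information(X)
xi = np.zeros(3)
for x, y in zip(X, Y):
    xi = natural_gradient_step(xi, x, y, G)
print(xi)   # close to true_xi
```

In this linear-Gaussian case G is constant and can be estimated once; in general the Riemannian metric depends on the parameter ξ and must be re-estimated or approximated as learning proceeds.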

12.1 Natural Gradient Stochastic Descent Learning

12.1.1 On-Line Learning and Batch Learning

Huge amounts of data exist in the real world. Consider a set of data which are generated randomly subject to a fixed but unknown probability distribution. A typical example is the regression problem, where an input signal x is generated randomly, accompanied by a desired response f(x). A teacher signal y, which is a noisy version of the desired output f(x),

$$
y = f(x) + \varepsilon, \qquad (12.1)
$$

is given together with x, where ε is random noise. The task of a learning machine is, in this case, to estimate the desired output mapping f(x) by using the available examples of input–output pairs D = {(x_i, y_i), i = 1, 2, ..., T}, called training examples. They are subject to an unknown joint probability distribution,

$$
p(x, y) = q(x)\,\mathrm{Prob}\{y \mid x\} = q(x)\, p_\varepsilon\{y - f(x)\}, \qquad (12.2)
$$

where q(x) is the probability distribution of x and p_ε(ε) is the probability distribution of the noise ε, typically Gaussian. This is the usual scheme of supervised learning. We use a parameterized family f(x, ξ) of functions as candidates for the desired output, where ξ is a vector parameter. The set of ξ is a parameter space, and we search for the optimal ξ̂ that approximates the true f(x) by using the training examples D. When y takes an analog value, this is a regression problem. When y is discrete, say binary, this is pattern recognition. In order to evaluate the performance of machine f(x, ξ), we define a loss function or cost function. The instantaneous loss of processing x by machine f(x, ξ) is typically given by

$$
l(x, y; \xi) = \frac{1}{2}\,\{y - f(x, \xi)\}^2 \qquad (12.3)
$$

in the case of regression.
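To illustrate this scheme end to end, the following sketch draws training examples D from a noisy target, fits a parameterized model f(x, ξ) by batch gradient descent on the average of the squared loss (12.3), and prints the fitted parameter. The particular target, the linear form of f(x, ξ), and the plain gradient loop are illustrative assumptions rather than the chapter's example.

```python
import numpy as np

# Illustrative sketch of the supervised learning scheme: training examples
# D = {(x_i, y_i)} are drawn from y = f(x) + noise, and a parameterized model
# f(x, xi) is fitted by minimizing the average of the squared loss (12.3).
# The target, the linear model, and batch gradient descent are assumptions
# chosen for brevity.

rng = np.random.default_rng(1)

def target_f(X):
    """Unknown true mapping f(x) (here a fixed linear map, for illustration)."""
    return X @ np.array([2.0, -1.0])

# Training examples D = {(x_i, y_i), i = 1, ..., T}
T = 200
X = rng.normal(size=(T, 2))
Y = target_f(X) + 0.1 * rng.normal(size=T)

def model_f(X, xi):
    """Parameterized candidate family f(x, xi)."""
    return X @ xi

def loss(X, Y, xi):
    """Average of the instantaneous loss (1/2)(y - f(x, xi))^2 over D."""
    return 0.5 * np.mean((Y - model_f(X, xi)) ** 2)

# Batch gradient descent on the average loss.
xi = np.zeros(2)
eta = 0.1
for _ in range(200):
    grad = -(Y - model_f(X, xi)) @ X / T   # gradient of the average loss
    xi -= eta * grad

print(xi, loss(X, Y, xi))   # xi close to [2, -1], loss near the noise floor
```

The batch loop above uses the whole of D at every step; the on-line alternative discussed in this section instead updates ξ from one example (x_t, y_t) at a time, which is the setting in which the natural gradient method is formulated.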