Unconstrained Optimization in Neural Network Training
Notation. Lowercase boldface letters denote vectors, e.g., x, and uppercase boldface letters denote matrices, e.g., M. A matrix that is necessarily positive definite and symmetric carries a plus superscript, e.g., D^+. Calligraphic letters denote certain distinguished matrix variables.

Model-Based Perspective. Model-based methods approximate f at the current iterate x_k by a local approximating model, or direction-finding problem (DfP), which is used to obtain an improving point.
Newton's Method. In Newton's method the DfP is as follows:

    min   g_k^T (x - x_k) + (1/2) (x - x_k)^T H_k (x - x_k)
    s.t.  ||x - x_k||_{D+} <= delta_k,                          (1)

where g_k denotes the gradient vector of f at x_k, and H_k denotes its (possibly indefinite) Hessian matrix, i.e., the n x n matrix of second partial derivatives d^2 f / dx_i dx_j at x_k. The points x that satisfy the quadratic constraint form the trust region. The quantity ||.||_{D+} denotes the vector norm defined by a positive definite, symmetric matrix D^+ that determines the scaling of the variables, i.e., ||z||_{D+} = (z^T D^+ z)^{1/2} for any vector z. Common choices include the Euclidean norm D^+ = I (where I denotes the n x n identity matrix) and the norm defined by a fixed diagonal scaling matrix (independent of k). The quantity delta_k is an adaptively updated parameter that defines the size of the trust region.

It can be shown that a point x_* is the global solution of (1) if and only if there is a scalar lambda_k >= 0 (Lagrange multiplier) such that

    (H_k + lambda_k D^+)(x_* - x_k) = -g_k,
    lambda_k (||x_* - x_k||_{D+} - delta_k) = 0,                (2)
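As a concrete illustration of the optimality conditions (2), the following sketch (not from the source; the matrices and the multiplier value are illustrative assumptions) builds a small indefinite Hessian, picks a multiplier lambda_k large enough that H_k + lambda_k D^+ is positive definite, and checks that the resulting step is a descent direction whose D^+-norm gives the implied trust-region radius:

```python
import numpy as np

# Illustrative data (assumptions, not from the article):
H = np.array([[2.0, 0.0], [0.0, -1.0]])   # indefinite Hessian H_k
D = np.eye(2)                              # scaling matrix D^+ = I
g = np.array([1.0, 1.0])                   # gradient g_k
lam = 2.0                                  # multiplier making H + lam*D positive definite

# Step x_* - x_k from the first equation of (2):
step = np.linalg.solve(H + lam * D, -g)

# Its D^+-norm; by the complementarity condition in (2), with lam > 0
# this equals the trust-region radius delta_k.
delta = np.sqrt(step @ D @ step)

# H + lam*D is positive definite: all eigenvalues are positive.
assert np.all(np.linalg.eigvalsh(H + lam * D) > 0)
# The step is a direction of descent: g_k^T (x_* - x_k) < 0.
assert g @ step < 0
```

Note that without the shift lambda_k D^+ the Newton system H_k d = -g_k would use an indefinite matrix and need not produce a descent direction; the multiplier simultaneously convexifies the model and enforces the trust-region constraint.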
with (H_k + lambda_k D^+) positive semidefinite. For convenience of discussion, assume that the constraint holds as an equality at x_* and that the matrix (H_k + lambda_k D^+) is positive definite. (An 'easy' case occurs when x_* is in the interior of the trust region, so the DfP is essentially unconstrained with lambda_k = 0; the other, infrequent, so-called 'hard case' arises when the matrix is only positive semidefinite, and it requires a deeper analysis and refinement of the algorithmic techniques. For details, see [12].) Then the optimal multiplier is the solution of the following one-dimensional nonlinear equation in the variable lambda > 0, which is derived directly from (2), namely,

    ||w(lambda)||_{D+} = delta_k,   w(lambda) = -(H_k + lambda D^+)^{-1} g_k.
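In the 'easy' case, ||w(lambda)||_{D+} decreases monotonically once H_k + lambda D^+ is positive definite, so the one-dimensional equation can be solved by simple bracketing. The sketch below (data, bracket endpoints, and tolerances are illustrative assumptions; production codes use more refined root-finders, see [12]) applies bisection to this secular equation:

```python
import numpy as np

def w_norm(H, D, g, lam):
    """||w(lambda)||_{D+} with w(lambda) = -(H + lam*D)^{-1} g."""
    w = -np.linalg.solve(H + lam * D, g)
    return np.sqrt(w @ D @ w)

def solve_secular(H, D, g, delta, lo, hi, tol=1e-10):
    """Bisect on [lo, hi] for lambda with ||w(lambda)||_{D+} = delta.
    Assumes H + lo*D is already positive definite (the 'easy' case)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if w_norm(H, D, g, mid) > delta:
            lo = mid          # step too long: increase lambda
        else:
            hi = mid          # step too short: decrease lambda
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Illustrative data: H has eigenvalues {2, -1}, so positive
# definiteness of H + lam*I requires lam > 1; lo is set just above that.
H = np.array([[2.0, 0.0], [0.0, -1.0]])
D = np.eye(2)
g = np.array([1.0, 1.0])
delta = 0.5
lam = solve_secular(H, D, g, delta, lo=1.0001, hi=100.0)
```

The recovered multiplier then yields the trust-region step w(lambda) on the boundary ||w(lambda)||_{D+} = delta_k.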
Unconstrained nonlinear optimization: Newton-Cauchy framework

Also, the vector x_* - x_k is a direction of descent at the point x_k. A variety of strategies can be devised for defining the new current iterate x_{k+1}. A pure trust region strategy (TR strategy) evaluates the function at x_*. If it is not suitably improving, then the current iterate is not updated, delta_k is reduced, and the procedure is repeated. If x_* is improving, then x_{k+1} = x_*, and delta_k is updated (usually by comparing the function reduction predicted by the model against the actual reduction). Alternatively, the foregoing strategy can be augmented by a line search along the direction of descent d_k = x_* - x_k to find an improving point, and again delta_k is revised (TR/LS strategy). See also [16] for further strategies.
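The pure TR strategy can be sketched as follows. Everything here is a minimal illustration, not the article's algorithm: the test objective, the acceptance thresholds 0.25/0.75, and the shrink/expand factors 0.5/2.0 are common textbook choices introduced as assumptions, and the subproblem is solved by the crude bisection described earlier:

```python
import numpy as np

def f(x):
    return x[0] ** 2 + 5.0 * x[1] ** 2            # illustrative objective

def grad(x):
    return np.array([2.0 * x[0], 10.0 * x[1]])

def hess(x):
    return np.array([[2.0, 0.0], [0.0, 10.0]])    # positive definite here

def tr_step(x, delta):
    """Solve the DfP (1): take the Newton step if it lies inside the
    trust region (lambda = 0), else bisect for the boundary multiplier."""
    H, g = hess(x), grad(x)
    step = np.linalg.solve(H, -g)
    if np.linalg.norm(step) <= delta:
        return step
    lo, hi = 0.0, 1e6
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        step = np.linalg.solve(H + lam * np.eye(2), -g)
        if np.linalg.norm(step) > delta:
            lo = lam
        else:
            hi = lam
    return step

x, delta = np.array([3.0, 2.0]), 0.5
for _ in range(30):
    if np.linalg.norm(grad(x)) < 1e-12:
        break                                      # converged
    step = tr_step(x, delta)
    predicted = -(grad(x) @ step + 0.5 * step @ hess(x) @ step)
    actual = f(x) - f(x + step)
    rho = actual / predicted                       # model vs. actual reduction
    if rho < 0.25:
        delta *= 0.5                               # poor agreement: shrink, reject
    else:
        if rho > 0.75:
            delta *= 2.0                           # good agreement: expand
        x = x + step                               # accept the improving point
```

Because the test objective is exactly quadratic, the model agrees perfectly with the function (rho = 1), so every step is accepted and the radius grows until the unconstrained Newton step fits inside the region; on a general nonlinear f the ratio test is what guards against over-trusting the local model.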