Unconstrained Optimization in Neural Network Training
Notation. Lowercase boldface letters denote vectors, e.g., x, and uppercase boldface letters denote matrices, e.g., M. A matrix that is necessarily positive definite and symmetric carries a plus superscript, e.g., D^+. Calligraphic letters denote certain distinguished matrix variables.

Model-Based Perspective. Model-based methods approximate f at the current iterate x_k by a local approximating model, or direction-finding problem (DfP), which is used to obtain an improving point.
Newton's Method. In Newton's method the DfP is as follows:

    min   g_k^T (x - x_k) + (1/2) (x - x_k)^T H_k (x - x_k)
    s.t.  ||x - x_k||_{D+} <= delta_k,                          (1)

where g_k denotes the gradient vector of f at x_k, and H_k denotes its (possibly indefinite) Hessian matrix, i.e., the n x n matrix of second partial derivatives d^2 f / dx_i dx_j at x_k. The points x that satisfy the quadratic constraint form the trust region. The quantity ||.||_{D+} denotes the vector norm defined by a positive definite, symmetric matrix D^+ that determines the scaling of the variables, i.e., ||z||_{D+} = (z^T D^+ z)^{1/2} for any vector z. Common choices include the Euclidean norm D^+ = I (where I denotes the n x n identity matrix) and the norm defined by a fixed diagonal scaling matrix (independent of k). The quantity delta_k is an adaptively updated parameter that defines the size of the trust region.

It can be shown that a point x_* is the global solution of (1) if and only if there is a scalar lambda_k >= 0 (Lagrange multiplier) such that

    (H_k + lambda_k D^+)(x_* - x_k) = -g_k,
    lambda_k (||x_* - x_k||_{D+} - delta_k) = 0,                (2)
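As a concrete illustration of the optimality conditions (2), the following sketch (not from the source; the matrices and the multiplier value are illustrative assumptions) builds a small indefinite Hessian, picks a multiplier lambda_k large enough that H_k + lambda_k D^+ is positive definite, and checks that the resulting step is a descent direction whose D^+-norm gives the implied trust-region radius:

```python
import numpy as np

# Illustrative data (assumptions, not from the article):
H = np.array([[2.0, 0.0], [0.0, -1.0]])   # indefinite Hessian H_k
D = np.eye(2)                              # scaling matrix D^+ = I
g = np.array([1.0, 1.0])                   # gradient g_k
lam = 2.0                                  # multiplier making H + lam*D positive definite

# Step x_* - x_k from the first equation of (2):
step = np.linalg.solve(H + lam * D, -g)

# Its D^+-norm; by the complementarity condition in (2), with lam > 0
# this equals the trust-region radius delta_k.
delta = np.sqrt(step @ D @ step)

# H + lam*D is positive definite: all eigenvalues are positive.
assert np.all(np.linalg.eigvalsh(H + lam * D) > 0)
# The step is a direction of descent: g_k^T (x_* - x_k) < 0.
assert g @ step < 0
```

Note that without the shift lambda_k D^+ the Newton system H_k d = -g_k would use an indefinite matrix and need not produce a descent direction; the multiplier simultaneously convexifies the model and enforces the trust-region constraint.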
with (H_k + lambda_k D^+) positive semidefinite. For convenience of discussion, assume that the constraint holds as an equality at x_* and that the matrix (H_k + lambda_k D^+) is positive definite. (An 'easy' case occurs when x_* is in the interior of the trust region, so the DfP is essentially unconstrained with lambda_k = 0; the other, infrequent, so-called 'hard case' arises when the matrix is only positive semidefinite, and it requires a deeper analysis and refinement of the algorithmic techniques. For details, see [12].) Then the optimal multiplier is the solution of the following one-dimensional nonlinear equation in the variable lambda > 0, which is derived directly from (2), namely,

    ||w(lambda)||_{D+} = delta_k,   w(lambda) = -(H_k + lambda D^+)^{-1} g_k.
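In the 'easy' case, ||w(lambda)||_{D+} decreases monotonically once H_k + lambda D^+ is positive definite, so the one-dimensional equation can be solved by simple bracketing. The sketch below (data, bracket endpoints, and tolerances are illustrative assumptions; production codes use more refined root-finders, see [12]) applies bisection to this secular equation:

```python
import numpy as np

def w_norm(H, D, g, lam):
    """||w(lambda)||_{D+} with w(lambda) = -(H + lam*D)^{-1} g."""
    w = -np.linalg.solve(H + lam * D, g)
    return np.sqrt(w @ D @ w)

def solve_secular(H, D, g, delta, lo, hi, tol=1e-10):
    """Bisect on [lo, hi] for lambda with ||w(lambda)||_{D+} = delta.
    Assumes H + lo*D is already positive definite (the 'easy' case)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if w_norm(H, D, g, mid) > delta:
            lo = mid          # step too long: increase lambda
        else:
            hi = mid          # step too short: decrease lambda
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Illustrative data: H has eigenvalues {2, -1}, so positive
# definiteness of H + lam*I requires lam > 1; lo is set just above that.
H = np.array([[2.0, 0.0], [0.0, -1.0]])
D = np.eye(2)
g = np.array([1.0, 1.0])
delta = 0.5
lam = solve_secular(H, D, g, delta, lo=1.0001, hi=100.0)
```

The recovered multiplier then yields the trust-region step w(lambda) on the boundary ||w(lambda)||_{D+} = delta_k.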
Unconstrained nonlinear optimization: Newton-Cauchy framework

Also, the vector x_* - x_k is a direction of descent at the point x_k. A variety of strategies can be devised for defining the new current iterate x_{k+1}. A pure trust region strategy (TR strategy) evaluates the function at x_*. If it is not suitably improving, then the current iterate is not updated, delta_k is reduced, and the procedure is repeated. If x_* is improving, then x_{k+1} = x_*, and delta_k is updated (usually by comparing the function reduction predicted by the model against the actual reduction). Alternatively, the foregoing strategy can be augmented by a line search along the direction of descent d_k = x_* - x_k to find an improving point, and again delta_k is revised (TR/LS strategy). See also [16] for further strategies.
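The pure TR strategy can be sketched as follows. Everything here is a minimal illustration, not the article's algorithm: the test objective, the acceptance thresholds 0.25/0.75, and the shrink/expand factors 0.5/2.0 are common textbook choices introduced as assumptions, and the subproblem is solved by the crude bisection described earlier:

```python
import numpy as np

def f(x):
    return x[0] ** 2 + 5.0 * x[1] ** 2            # illustrative objective

def grad(x):
    return np.array([2.0 * x[0], 10.0 * x[1]])

def hess(x):
    return np.array([[2.0, 0.0], [0.0, 10.0]])    # positive definite here

def tr_step(x, delta):
    """Solve the DfP (1): take the Newton step if it lies inside the
    trust region (lambda = 0), else bisect for the boundary multiplier."""
    H, g = hess(x), grad(x)
    step = np.linalg.solve(H, -g)
    if np.linalg.norm(step) <= delta:
        return step
    lo, hi = 0.0, 1e6
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        step = np.linalg.solve(H + lam * np.eye(2), -g)
        if np.linalg.norm(step) > delta:
            lo = lam
        else:
            hi = lam
    return step

x, delta = np.array([3.0, 2.0]), 0.5
for _ in range(30):
    if np.linalg.norm(grad(x)) < 1e-12:
        break                                      # converged
    step = tr_step(x, delta)
    predicted = -(grad(x) @ step + 0.5 * step @ hess(x) @ step)
    actual = f(x) - f(x + step)
    rho = actual / predicted                       # model vs. actual reduction
    if rho < 0.25:
        delta *= 0.5                               # poor agreement: shrink, reject
    else:
        if rho > 0.75:
            delta *= 2.0                           # good agreement: expand
        x = x + step                               # accept the improving point
```

Because the test objective is exactly quadratic, the model agrees perfectly with the function (rho = 1), so every step is accepted and the radius grows until the unconstrained Newton step fits inside the region; on a general nonlinear f the ratio test is what guards against over-trusting the local model.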