A Flow Perspective on Nonlinear Least-Squares Problems



Hans Georg Bock1 · Jürgen Gutekunst1 · Andreas Potschka1 · María Elena Suárez Garcés1

Received: 25 November 2019 / Accepted: 8 July 2020
© The Author(s) 2020

Abstract  Just as the damped Newton method for the numerical solution of nonlinear algebraic problems can be interpreted as forward Euler timestepping on the Newton flow equations, the damped Gauß–Newton method for nonlinear least-squares problems is equivalent to forward Euler timestepping on the corresponding Gauß–Newton flow equations. We highlight the advantages of the Gauß–Newton flow and the Gauß–Newton method from a statistical and a numerical perspective in comparison with the Newton method, steepest descent, and the Levenberg–Marquardt method, which are equivalent to Newton flow forward Euler, gradient flow forward Euler, and gradient flow backward Euler, respectively. We finally show an unconditional descent property for a generalized Gauß–Newton flow, which is linked to Krylov–Gauß–Newton methods for large-scale nonlinear least-squares problems. We provide numerical results for large-scale problems: an academic generalized Rosenbrock function and a real-world bundle adjustment problem from 3D reconstruction based on 2D images.

Keywords  Nonlinear least squares · Gauß–Newton · Globalization · Continuous flows

Mathematics Subject Classification (2010)  58C15 · 65H20 · 65K05 · 65L07
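To make the correspondence stated in the abstract explicit (an illustrative remark of ours, not text from the paper), the Gauß–Newton flow and its forward Euler discretization can be written, assuming Jacobians of full column rank so that the Moore–Penrose pseudoinverse $f'(x)^{+}$ is well defined, as
$$\dot{x}(t) = -f'(x(t))^{+} f(x(t)), \qquad x_{k+1} = x_k - \Delta t_k \, f'(x_k)^{+} f(x_k),$$
where a forward Euler step with stepsize $\Delta t_k \in (0, 1]$ coincides with a Gauß–Newton step damped by the factor $\Delta t_k$.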

Dedicated to Volker Mehrmann on the occasion of his 65th birthday.

Jürgen Gutekunst
[email protected]

Hans Georg Bock
[email protected]

Andreas Potschka
[email protected]

María Elena Suárez Garcés
[email protected]

1 Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Im Neuenheimer Feld 205, 69120 Heidelberg, Germany


1 Introduction

We consider nonlinear least-squares problems of the form
$$\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2} \| f(x) \|_2^2, \qquad (1)$$
where $f : \mathbb{R}^n \to \mathbb{R}^m$ is twice continuously differentiable. The necessary optimality conditions of (1) require that at a minimum $x^* \in \mathbb{R}^n$ the gradient of the objective function vanishes,
$$F(x) := f'(x)^{\mathsf{T}} f(x) = 0, \qquad (2)$$
where $f'$ denotes the Jacobian of $f$. Equation (2) is called the normal equation. It is important to realize that the normal equation is only a necessary condition for a minimum; it is also satisfied at spurious stationary points (maxima, saddle points) of the objective.
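To make (1) and (2) concrete, the following small NumPy sketch (our illustration, not code from the paper; the residual scaling and all function names are assumptions) sets up a generalized Rosenbrock residual of the kind used later as an academic test problem, evaluates $F(x) = f'(x)^{\mathsf{T}} f(x)$, and checks it against a finite-difference gradient of $\tfrac{1}{2} \| f(x) \|_2^2$:

import numpy as np

def rosenbrock_residual(x):
    # Assumed residual f: R^n -> R^(2(n-1)) of a generalized Rosenbrock function.
    return np.concatenate([10.0 * (x[1:] - x[:-1] ** 2), 1.0 - x[:-1]])

def rosenbrock_jacobian(x):
    # Jacobian f'(x), assembled analytically row by row.
    n = x.size
    J = np.zeros((2 * (n - 1), n))
    for i in range(n - 1):
        J[i, i] = -20.0 * x[i]      # d/dx_i of 10*(x_{i+1} - x_i^2)
        J[i, i + 1] = 10.0          # d/dx_{i+1} of 10*(x_{i+1} - x_i^2)
        J[n - 1 + i, i] = -1.0      # d/dx_i of (1 - x_i)
    return J

def normal_equation_residual(x):
    # F(x) = f'(x)^T f(x), the gradient of 0.5*||f(x)||_2^2, cf. Eq. (2).
    return rosenbrock_jacobian(x).T @ rosenbrock_residual(x)

x = np.array([-1.2, 1.0, 0.8])
obj = lambda y: 0.5 * np.dot(rosenbrock_residual(y), rosenbrock_residual(y))
h = 1e-6
g_fd = np.array([(obj(x + h * e) - obj(x - h * e)) / (2 * h) for e in np.eye(x.size)])
print(np.allclose(normal_equation_residual(x), g_fd, atol=1e-5))  # expect True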

Stationary points can often be characterized by their Hessian
$$F'(x) = f'(x)^{\mathsf{T}} f'(x) + \sum_{i=1}^{m} f_i(x) \, \nabla^2 f_i(x) =: H(x) + Q(x),$$
which splits up into a first-order term $H$ based solely on $f'$ and a second-order term $Q$, in which the current residual $f$ and its Hessian tensor enter (but no first-order derivatives).
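Continuing the sketch above (again our illustration, reusing the assumed Rosenbrock residual and the functions defined there), the split $F'(x) = H(x) + Q(x)$ can be verified numerically: $H$ uses only the Jacobian, $Q$ collects the residual-weighted second derivatives, and their sum is compared with a central-difference Jacobian of $F$:

def gauss_newton_term(x):
    # H(x) = f'(x)^T f'(x): the part of the Hessian built from first derivatives only.
    J = rosenbrock_jacobian(x)
    return J.T @ J

def second_order_term(x):
    # Q(x) = sum_i f_i(x) * nabla^2 f_i(x). For the residual assumed above, only the
    # components 10*(x_{i+1} - x_i^2) are nonlinear; each contributes the single
    # second derivative d^2/dx_i^2 = -20.
    f = rosenbrock_residual(x)
    Q = np.zeros((x.size, x.size))
    for i in range(x.size - 1):
        Q[i, i] += -20.0 * f[i]
    return Q

x = np.array([-1.2, 1.0, 0.8])
h = 1e-6
F_fd = np.column_stack([
    (normal_equation_residual(x + h * e) - normal_equation_residual(x - h * e)) / (2 * h)
    for e in np.eye(x.size)])
print(np.allclose(gauss_newton_term(x) + second_order_term(x), F_fd, atol=1e-4))  # expect True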

Example 1 (Parameter estimation) The main area of application for this type of problem is maximum likelihood estimation for parameter estimation problems with normally distributed measurement errors (see, e.g., [4]): in this case, we would like to determine an unknown but de