Least Squares Problems


Abstract In this chapter we provide an overview of the original minimum least squares problem and its variations. We present their robust formulations as they have been proposed in the literature so far. We show the analytical solutions for each variation and we conclude the chapter with some numerical techniques for computing them efficiently.

2.1 Original Problem

In the original linear least squares (LLS) problem one needs to determine a linear model that approximates "best" a group of samples (data points). Each sample might correspond to a group of experimental parameters or measurements, and each individual parameter to a feature or, in statistical terminology, a predictor. In addition, each sample is characterized by an outcome, defined by a real-valued variable, which might correspond to an experimental outcome. Ultimately we wish to determine a linear model able to issue outcome predictions for new samples. The quality of such a model can be determined by a minimum distance criterion between the samples and the linear model. Therefore, if the n data points, of dimension m each, are represented by a matrix $A \in \mathbb{R}^{n \times m}$ and the outcome variable by a vector $b \in \mathbb{R}^n$ (each entry corresponding to a row of matrix $A$), we need to determine a vector $x \in \mathbb{R}^m$ such that the residual error, expressed by some norm, is minimized. This can be stated as:

$$\min_x \; \|Ax - b\|_2^2 \qquad (2.1)$$

where $\|\cdot\|_2$ is the Euclidean norm of a vector. The objective function value is also called the residual and is denoted $r(A, b, x)$, or simply $r$. The geometric interpretation of this problem is to find a vector $x$ such that the sum of the distances between the points represented by the rows of matrix $A$ and the hyperplane defined by $x^T w - b = 0$ (where $w$ is the independent variable) is minimized. In this sense, this problem is a first-order polynomial fitting problem.
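As a concrete illustration, the following short Python sketch solves an instance of problem (2.1) with NumPy's generic least squares routine and evaluates the residual $r(A, b, x)$ at the minimizer. The data values are invented purely for demonstration.

```python
import numpy as np

# Hypothetical data set: n = 3 samples, m = 2 features (values chosen arbitrarily)
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 7.0]])
b = np.array([1.0, 2.0, 4.0])

# Solve min_x ||Ax - b||_2^2, i.e., problem (2.1)
x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)

# Residual r(A, b, x) = ||Ax - b||_2^2 at the computed minimizer
r = np.linalg.norm(A @ x - b) ** 2
print(x, r)
```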


Fig. 2.1 The single input, single outcome case. This is a 2D example, with the predictor represented by the variable a on the horizontal axis and the outcome by the variable b on the vertical axis

Then, by determining the optimal vector $x$, we will be able to issue predictions for new samples by simply computing their inner product with $x$. An example in two dimensions (2D) can be seen in Fig. 2.1. In this case the data matrix will be $A = [a\;e] \in \mathbb{R}^{n \times 2}$, where $a$ is the predictor variable and $e$ is a column vector of ones that accounts for the constant term. The problem can be solved analytically in its general form: since the problem is convex and unconstrained, the global minimum will be at a Karush–Kuhn–Tucker (KKT) point, the Lagrangian $L_{LLS}(x)$ is given by the objective function itself, and the KKT points can be obtained by solving the following equation:

$$\frac{dL_{LLS}(x)}{dx} = 0 \;\Leftrightarrow\; 2A^T A x = 2A^T b \qquad (2.2)$$
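To make the two-dimensional setting of Fig. 2.1 and the normal equations in (2.2) concrete, the following Python sketch builds the data matrix $A = [a\;e]$ from a synthetic single-predictor data set and solves (2.2) directly. The slope, intercept, and noise level are invented for illustration, and the routine names are standard NumPy calls.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic single-predictor data: outcomes scattered around the line b = 0.8*a + 5
a = rng.uniform(-120, 120, size=50)
b = 0.8 * a + 5 + rng.normal(scale=10, size=a.size)

# Data matrix A = [a e], where e is a column of ones accounting for the constant term
A = np.column_stack([a, np.ones_like(a)])

# Normal equations from (2.2): A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
slope, intercept = x

# A prediction for a new sample is its inner product with x
a_new = 30.0
b_pred = np.array([a_new, 1.0]) @ x
print(slope, intercept, b_pred)
```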

In case that $A$ is of full rank, the matrix $A^T A$ is invertible and (2.2) yields the unique solution $x = (A^T A)^{-1} A^T b$.
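As a quick numerical check, the closed-form expression can be transcribed literally and compared against a library solver; the data below are invented, and in practice the explicit inversion of $A^T A$ is typically avoided in favor of solving the linear system directly or using a factorization-based routine.

```python
import numpy as np

# Invented full-rank example: n = 4 samples, m = 2 features
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.9, 2.1, 2.9, 4.2])

# Closed form x = (A^T A)^{-1} A^T b, written out literally for illustration
x_closed = np.linalg.inv(A.T @ A) @ (A.T @ b)

# The same solution obtained from NumPy's least squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_closed, x_lstsq))  # expected: True
```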