Fitting Multidimensional Data Using Gradient Penalties and Combination Techniques

Abstract

Sparse grids, combined with gradient penalties, provide an attractive tool for regularised least squares fitting. It has previously been found that the combination technique, which approximates the sparse grid fit by a linear combination of fits on partial grids, is not as effective here as it is in the case of elliptic partial differential equations. We argue that this is due to the irregular and random distribution of the data, as well as the ratio of the number of data points to the grid resolution. These effects are investigated both in theory and in experiments. The application of modified “optimal” combination coefficients provides an advantage over the coefficients originally used for the numerical solution of PDEs, which in this case simply amplify the sampling noise. As part of this investigation we also show how overfitting arises when the mesh size goes to zero.

1 Introduction

In this paper we consider the regression problem arising in machine learning. A set of data points x_i in a d-dimensional feature space is given, together with associated values y_i. We assume that a function f∗ describes the relation between the predictor variables x and the response variable y, and we want to (approximately) reconstruct the function f∗ from the given data. This allows us to predict the function value of any newly given data point for future decision-making.

In [4] a discretisation approach to the regularised least squares ansatz [10] was introduced. An independent grid with associated local basis functions is used to discretise the minimisation problem. In this way the information in the data is transferred into the discrete function space defined by the grid and its corresponding basis functions. Such a discretisation approach is similar to the numerical treatment of partial differential equations by finite element methods. To cope with the complexity of grid-based discretisation methods in higher dimensions, Garcke et al. [4] apply the sparse grid combination technique [5] to the regression problem; a small illustrative sketch of the underlying grid-based fit is given below.
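
To fix ideas, the following is a minimal one-dimensional sketch of the grid-based regularised least squares fit that underlies each partial solution. It is our illustration under simplifying assumptions, not the implementation of [4]; the function name fit_1d and all parameter choices are hypothetical.

```python
import numpy as np

def fit_1d(x, y, m, lam):
    """Grid-based regularised least squares fit in 1D (illustrative sketch).

    Represent f = sum_j alpha_j phi_j with hat functions phi_j on a uniform
    grid of m + 1 nodes over [0, 1] and minimise
        (1/M) * sum_i (f(x_i) - y_i)**2 + lam * ||f'||_{L2}**2.
    Setting the gradient with respect to alpha to zero yields
        (B @ B.T / M + lam * C) @ alpha = B @ y / M,
    where B[j, i] = phi_j(x_i) and C is the stiffness matrix of the grid.
    Assumes all x lie in [0, 1].
    """
    M = len(x)
    h = 1.0 / m
    nodes = np.linspace(0.0, 1.0, m + 1)

    # B[j, i] = phi_j(x_i): each data point touches exactly the two hat
    # functions of the grid cell that contains it.
    k = np.minimum((x / h).astype(int), m - 1)  # cell index of each point
    t = (x - nodes[k]) / h                      # local coordinate in [0, 1]
    B = np.zeros((m + 1, M))
    B[k, np.arange(M)] = 1.0 - t
    B[k + 1, np.arange(M)] = t

    # Stiffness matrix of the gradient penalty: tridiag(-1, 2, -1) / h,
    # with 1 / h in the two boundary diagonal entries.
    C = (2.0 * np.eye(m + 1) - np.eye(m + 1, k=1) - np.eye(m + 1, k=-1)) / h
    C[0, 0] = C[m, m] = 1.0 / h

    alpha = np.linalg.solve(B @ B.T / M + lam * C, B @ y / M)
    return nodes, alpha

# Example: noisy samples of a smooth function.
rng = np.random.default_rng(0)
x = rng.random(200)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(200)
nodes, alpha = fit_1d(x, y, m=16, lam=1e-4)
```

In d dimensions, B collects the evaluations of the multilinear basis functions and C becomes the corresponding gradient-penalty matrix on each anisotropic grid.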


Here, the regularised least squares ansatz is discretised and solved on a certain sequence of anisotropic grids, i.e. grids with different mesh sizes in each coordinate direction. The sparse grid solution is then obtained from the (partial) solutions on these different grids by their linear combination, using combination coefficients which depend on the grids employed. The curse of dimensionality affects the sparse grid combination technique much less than conventional “full” grid methods; currently up to around 20 dimensions can be handled.

Following empirical results in [3], which show instabilities of the combination technique in certain situations, we investigate in this article the convergence behaviour of full and sparse grid discretisations of the regularised regression problem. The convergence behaviour of the combination technique can be analysed using extrapolation arguments, where a certain error expansion for the partial solutions is assumed. Alternatively, one can view the combination technique as an approximation of a projection into the underlying sparse grid space.
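
For reference, the classical combination technique of [5] combines the partial solutions f_l on anisotropic grids of level l = (l_1, ..., l_d) with the coefficients c_l = (-1)^q binom(d-1, q) for |l|_1 = n - q, q = 0, ..., d-1. The short sketch below enumerates these coefficients; the function name combination_coefficients is our own, and the modified “optimal” coefficients discussed above are not reproduced here (they are instead obtained by solving a small auxiliary minimisation problem involving the partial solutions).

```python
from itertools import product
from math import comb

def combination_coefficients(d, n):
    """Classical combination coefficients for a level-n sparse grid in d
    dimensions: the combined solution is
        f_n^c = sum_{q=0}^{d-1} (-1)**q * C(d-1, q) * sum_{|l|_1 = n-q} f_l,
    where f_l is the partial solution on the anisotropic grid whose mesh
    size in direction i is 2**(-l_i).  Returns {level vector: coefficient}.
    """
    coeffs = {}
    for q in range(d):
        target = n - q
        if target < 0:
            continue
        c = (-1) ** q * comb(d - 1, q)
        # Enumerate all level vectors l >= 0 with |l|_1 = target.
        for l in product(range(target + 1), repeat=d):
            if sum(l) == target:
                coeffs[l] = c
    return coeffs

# Sanity check: the coefficients always sum to one, so constant functions
# are reproduced exactly by the combined solution.
coeffs = combination_coefficients(d=2, n=3)
assert sum(coeffs.values()) == 1
```

The instabilities investigated in this paper arise precisely because these fixed coefficients, derived for smooth PDE solutions, can amplify the sampling noise in the partial regression fits.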