Regression with Linear Factored Functions
Abstract. Many applications that use empirically estimated functions face a curse of dimensionality, because integrals over most function classes must be approximated by sampling. This paper introduces a novel regression algorithm that learns linear factored functions (LFF). This class of functions has structural properties that allow certain integrals to be solved analytically and point-wise products to be computed in closed form. Applications like belief propagation and reinforcement learning can exploit these properties to break the curse and speed up computation. We derive a regularized greedy optimization scheme that learns factored basis functions during training. The novel regression algorithm performs competitively with Gaussian processes on benchmark tasks, and the learned LFF are very compact, requiring only 4-9 factored basis functions on average.

Keywords: Regression · Factored functions · Curse of dimensionality

1 Introduction
This paper introduces a novel regression algorithm that performs competitively with Gaussian processes but yields linear factored functions (LFF). These have outstanding properties such as analytical point-wise products and marginalization. Regression is a well-known problem that can be solved by many non-linear architectures such as kernel methods (Shawe-Taylor and Cristianini 2004) or neural networks (Haykin 1998). While these perform well, the estimated functions often suffer from a curse of dimensionality in later applications. For example, computing an integral over a neural network or a kernel function requires sampling the entire input space. Applications like belief propagation (Pearl 1988) and reinforcement learning (Kaelbling et al. 1996), on the other hand, face large input spaces and therefore require efficient computations. We propose LFF for this purpose and showcase their properties in comparison to kernel functions.
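To illustrate why factored structure avoids sampling the full input space, the sketch below uses a toy factored form f(x) = Σ_k Π_d g_{k,d}(x_d), where each one-dimensional factor g_{k,d} is linear in a small cosine basis. This is only a minimal illustration under assumed names (fourier_basis, factored_integral) and a generic basis choice, not the paper's LFF parameterization; the one-dimensional integrals are computed here by quadrature for brevity, whereas the paper solves its integrals analytically for the chosen bases. The key point is that the D-dimensional integral splits into D one-dimensional integrals, so the cost grows linearly rather than exponentially in the input dimension.

```python
import numpy as np

def fourier_basis(x, n=5):
    """One-dimensional cosine basis evaluated on points x in [0, 1]."""
    return np.cos(np.pi * np.outer(x, np.arange(n)))          # shape (len(x), n)

def factored_integral(A, lo=0.0, hi=1.0, n_quad=50):
    """Integrate f(x) = sum_k prod_d phi(x_d) @ A[k, d] over [lo, hi]^D.

    A has shape (K, D, n): coefficients of each 1-D factor in the basis.
    The D-dimensional integral factorizes into D one-dimensional ones,
    so no sampling of the D-dimensional input space is required.
    """
    xs = np.linspace(lo, hi, n_quad)
    Phi = fourier_basis(xs)                                    # (n_quad, n)
    # 1-D integral of every factor g_{k,d}, via the trapezoidal rule
    per_dim = np.trapz(Phi @ A.transpose(0, 2, 1), xs, axis=1) # (K, D)
    # product over dimensions, sum over the K factored basis functions
    return per_dim.prod(axis=1).sum()

K, D, n = 3, 10, 5                    # 3 factored basis functions, 10 input dims
A = np.random.randn(K, D, n) * 0.1    # toy coefficients (hypothetical example)
print(factored_integral(A))           # cost scales linearly in D
```

In contrast, the same integral over a generic kernel expansion or neural network offers no such factorization and must be approximated by sampling the entire D-dimensional input space.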
1.1 Kernel Regression
In the last 20 years, kernel methods like support vector machines (SVM, Boser et al. 1992; Vapnik 1995) have become a de facto standard in various practical applications. This is mainly due to the sparse representation of the learned classifiers with so-called support vectors (SV). The most popular kernel method for regression, Gaussian processes (GP, see Bishop 2006; Rasmussen and Williams 2006), on the other hand, requires as many SV as training samples. Sparse versions of GP therefore aim for a small subset of SV. Some select this set based on constraints similar to SVM (Tipping 2001; Vapnik 1995), while others try to conserve the spanned linear function space (sparse GP, Csató and Opper 2002; Rasmussen and Williams 2006). There are also attempts to construct new SV by averaging similar training samples (e.g. Wang et al. 2012). Well-chosen SV for regression are usually not sparsely concentrated on a decision boundary as they are for SVM. In fact, many practical appli