Learning with Additional Distributions

S. Marukatat

Abstract. This paper studies the problem of learning with distributions. In this work, we do not focus on the distribution that represents each data point. Instead, we consider a distribution that serves as additional information around each data point. The proposed method yields a new kernel that is similar to an existing one. The main difference is that our kernel requires an integration in the kernel space. Theoretically, the proposed method yields better generalization than the standard SVM.

Keywords: SVM · SMM · Kernel for distributions

1 Introduction

Several data analysis methods such as classification or regression rely on data in vectorial form. These techniques can be extended using the kernel trick to cover other representations such as strings (Lodhi et al. 2002), trees (Moschitti 2006), graphs (Harchaoui and Bach 2007), and distributions (Dalal and Triggs 2005; Vedaldi and Zisserman 2012). The latter is often used for describing the content of an image, such as color (Chapelle et al. 1999), gradient information (Dalal and Triggs 2005; Maji et al. 2008), or visual words obtained from local descriptors (Sivic and Zisserman 2006). These data representations can be plugged into an SVM using an appropriate kernel function. Recently, Jebara et al. (2004), Hein and Bousquet (2005), and Muandet et al. (2012) proposed new kernel families that combine Mercer kernels with distributions. These kernels exploit the advantages of probabilistic models within the discriminative framework of the SVM. Two types of distributions can be distinguished in previous works: the distribution that represents each data point and the distribution around each data point. Term histograms in text classification, histograms of visual words in object recognition, and histograms of colors in image processing belong to the first type. The second type of distribution can be assigned to data represented in any format. For example, given a set of vectors, we can compute a set of neighbor points and derive a Gaussian distribution around each vector. If the data were already represented as distributions, such as histograms of colors, we can still assign an additional distribution, such as a Dirichlet, on top of them. This Dirichlet distribution describes, indeed, how the histograms are distributed around a specific histogram.

© Springer International Publishing Switzerland 2016. R. Booth and M.-L. Zhang (Eds.): PRICAI 2016, LNAI 9810, pp. 319–326, 2016. DOI: 10.1007/978-3-319-42911-3_27
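The neighborhood example above can be sketched in code. The following is a minimal illustration, not the paper's actual method: `type2_gaussians` fits a Gaussian over each point's k nearest neighbors to obtain a distribution around each data point, and `expected_rbf` evaluates the standard closed-form expectation of the Gaussian RBF kernel between two such Gaussians, the kind of identity used in kernels on distributions such as Muandet et al.'s support measure machines. Function names and the choice of k are illustrative assumptions.

```python
import numpy as np

def type2_gaussians(X, k=5):
    """Fit a Gaussian over each point's k nearest neighbors.

    The resulting (mean, covariance) pairs describe the distribution
    *around* each data point, i.e. a type-2 distribution.
    """
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, : k + 1]  # each point plus its k neighbors
    means, covs = [], []
    for nb in idx:
        pts = X[nb]
        means.append(pts.mean(axis=0))
        # small ridge keeps the covariance positive definite
        covs.append(np.cov(pts, rowvar=False) + 1e-6 * np.eye(X.shape[1]))
    return np.array(means), np.array(covs)

def expected_rbf(m1, S1, m2, S2, sigma=1.0):
    """Closed-form expectation of the RBF kernel exp(-||x - y||^2 / (2 sigma^2))
    under x ~ N(m1, S1) and y ~ N(m2, S2)."""
    d = m1.shape[0]
    S = S1 + S2 + sigma ** 2 * np.eye(d)
    diff = m1 - m2
    return sigma ** d / np.sqrt(np.linalg.det(S)) * np.exp(
        -0.5 * diff @ np.linalg.solve(S, diff))
```

When both covariances are zero, `expected_rbf` reduces to the ordinary RBF kernel between the two means, which is a quick sanity check on the closed form.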


The additional distribution, called the type-2 distribution hereafter, allows introducing prior knowledge that could be useful, especially when dealing with small datasets or uncertainty. In previous works, the distinction between the two types of distributions is not clear. For example, the structural kernel proposed by Hein and Bousquet and the level-2 kernel proposed by Muandet et al. involve type-2 distributions. Nonetheless, some experiments reported therein were based on data represented as type-1 distributions without additional type-2 distributions.