Joint Semi-supervised Similarity Learning for Linear Classification
Université Jean Monnet, Laboratoire Hubert Curien, Saint-Étienne, France
{Amaury.Habrard,Marc.Sebban}@univ-st-etienne.fr
Université Grenoble Alpes, CNRS-LIG/AMA, Saint-Martin-d'Hères, France
{Irina.Nicolae,Eric.Gaussier}@imag.fr
Abstract. The importance of metrics in machine learning has attracted growing interest in distance and similarity learning. We study this problem in the situation where few labeled data (and potentially few unlabeled data as well) are available, a situation that arises in several practical contexts. We also provide a complete theoretical analysis of the proposed approach. It is indeed worth noting that the metric learning research field lacks theoretical guarantees on the generalization capacity of the classifier associated with a learned metric. The theoretical framework of (ε, γ, τ)-good similarity functions [1] has been one of the first attempts to draw a link between the properties of a similarity function and those of a linear classifier making use of it. In this paper, we extend this theory to a method where the metric and the separator are jointly learned in a semi-supervised way, a setting that has not been explored before, and provide a theoretical analysis of this joint learning via Rademacher complexity. Experiments performed on standard datasets show the benefits of our approach over state-of-the-art methods.

Keywords: Similarity learning · (ε, γ, τ)-good similarity · Rademacher complexity
1 Introduction
Many researchers have used the underlying geometry of the data to improve classification algorithms, e.g. by learning Mahalanobis distances instead of the standard Euclidean distance, thus paving the way for a new research area termed metric learning [5,6]. While most of these studies have based their approaches on distance learning [3,9,10,22,24], similarity learning has also attracted a growing interest [2,12,16,20], the rationale being that the cosine similarity should in some cases be preferred over the Euclidean distance. More recently, [1] have proposed a complete framework relating similarities to a classification algorithm making use of them. This general framework, which can be applied to any bounded similarity function (potentially derived from a distance), provides generalization guarantees on a linear classifier learned from the similarity. Their algorithm does not enforce positive definiteness of the similarity, unlike most state-of-the-art methods. However, to enjoy such generalization guarantees, the similarity function is assumed to be known beforehand and to satisfy (ε, γ, τ)-goodness properties. Unfortunately, [1] do not provide any algorithm for learning such similarities. In order to overcome these limitations, [4] have explored the possibility of independently learning an (ε, γ, τ)-good similarity function.
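For reference, the notion at the heart of this framework can be recalled as follows (a paraphrase of the definition in [1]; the notation is ours): a similarity function K : X × X → [−1, 1] is (ε, γ, τ)-good for a distribution P over labeled examples if there exists a (possibly random) indicator R of "reasonable" points such that

\[
\Pr_{(x,y)\sim P}\Big[\, y \,\mathbb{E}_{(x',y')\sim P}\big[\, y' K(x,x') \,\big|\, R(x') \,\big] < \gamma \Big] \le \epsilon,
\qquad
\Pr_{x'\sim P}\big[ R(x') \big] \ge \tau .
\]

In words: all but an ε fraction of examples are, on average, more similar to reasonable examples of their own class than to reasonable examples of the other class by a margin γ, and reasonable points have probability mass at least τ.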
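To make the link with linear classification concrete, the following is a minimal sketch (ours, not the algorithm of [1] or of this paper) of the landmark construction that underlies this framework: each point is mapped to the vector of its similarities to a set of landmark points, and a sparse linear separator is learned on this representation. The helper names (cosine_sim, fit_landmark_classifier) are hypothetical, and scikit-learn's L1-penalized hinge loss stands in for the L1-constrained linear program of [1].

import numpy as np
from sklearn.linear_model import SGDClassifier

def cosine_sim(A, B):
    # Bounded similarity K(x, x') in [-1, 1] (cosine); no PSD constraint needed.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def fit_landmark_classifier(X, y, landmarks, sim=cosine_sim):
    # Map x -> phi(x) = (K(x, l_1), ..., K(x, l_d)), then learn a sparse
    # linear separator sign(sum_i alpha_i K(x, l_i)) on that space.
    Phi = sim(X, landmarks)  # n x d similarity representation
    clf = SGDClassifier(loss="hinge", penalty="l1", alpha=1e-3,
                        max_iter=2000, random_state=0)
    clf.fit(Phi, y)
    return clf

# Toy usage: labels given by the sign of the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0])
landmarks = X[rng.choice(len(X), size=20, replace=False)]  # random landmarks
clf = fit_landmark_classifier(X, y, landmarks)
print(clf.predict(cosine_sim(X[:5], landmarks)))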