Zero-Shot Recognition via Structured Prediction
Abstract. We develop a novel method for zero-shot learning (ZSL) based on test-time adaptation of similarity functions learned using training data. Existing methods exclusively employ source-domain side information for recognizing unseen classes during test time. We show that for batch-mode applications, accuracy can be significantly improved by adapting these predictors to the observed test-time target-domain ensemble. We develop a novel structured prediction method for maximum a posteriori (MAP) estimation, where parameters account for test-time domain shift from what is predicted primarily using source-domain information. We propose a Gaussian parameterization for the MAP problem and derive an efficient structured prediction algorithm. Empirically, we test our method on four popular benchmark image datasets for ZSL and show significant improvement over the state-of-the-art, on average, by 11.50% and 30.12% in terms of accuracy for recognition and mean average precision (mAP) for retrieval, respectively.
Keywords: Zero-shot learning/recognition/retrieval · Structured prediction · Maximum likelihood estimation

1 Introduction
Zero-shot recognition (ZSR) is the problem of recognizing data instances from unseen classes (i.e. classes with no training data) during test time. The motivation for ZSR stems from the need for solutions to diverse research problems ranging from poorly annotated big data collections [1] to the problem of extreme classification [2]. In this paper we consider the classical ZSL setting. Namely, we are given two sources of data: the so-called source domain and target domain, respectively. In the source domain, each class is represented by a single vector of side information such as attributes [3–7], language words/phrases [8–10], or even learned classifiers [11]. In the target domain, each class is represented by a collection of data instances (e.g. images or videos). During training, some known classes with data are given as seen classes, while during testing some other unknown classes are revealed as unseen classes. The goal of ZSL is to learn suitable models using seen-class training data so that, during ZSR, the class labels of arbitrary target-domain data instances from unseen classes can be predicted at test time.
c Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VII, LNCS 9911, pp. 533–548, 2016. DOI: 10.1007/978-3-319-46478-7 33
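To make the setting concrete, a standard zero-shot prediction rule can be sketched as follows. This is an illustrative toy example, not the method of this paper: the bilinear similarity matrix `W`, the one-hot attribute vectors, and the class names are all hypothetical stand-ins.

```python
import numpy as np

# Toy zero-shot prediction rule (illustrative; not the paper's algorithm).
# A similarity f(x, a) = x^T W a is assumed to have been learned on seen
# classes; W, the attribute vectors, and the class names are hypothetical.
W = np.array([[1.0, 0.2, 0.0],
              [0.1, 1.0, 0.3],
              [0.0, 0.2, 1.0]])  # stand-in for learned similarity parameters

# Source-domain side information for two *unseen* classes.
attributes = {"zebra": np.array([1.0, 0.0, 0.0]),
              "tapir": np.array([0.0, 1.0, 0.0])}

def predict(x):
    """Label x with the unseen class whose side information scores
    highest under the learned similarity."""
    scores = {c: float(x @ W @ a) for c, a in attributes.items()}
    return max(scores, key=scores.get)

# A target-domain instance synthesized to align with the zebra attributes.
x_test = W @ attributes["zebra"]
print(predict(x_test))  # -> zebra
```

Note that this rule uses only source-domain side information at test time; the contribution of the paper is to adapt such predictors to the observed test ensemble.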
Z. Zhang and V. Saligrama
Key Insight: In batch mode we are given the ensemble of target-domain data. Our main idea is that even though labels for target-domain data are unknown, subtle shifts in the data distributions can be inferred, and these shifts can in turn be utilized to better adapt the learned classifiers for test-time use. Intuitively, our insight is justified by noting that target-domain data instances can form compact and disjoint clusters in their latent-space embeddings. These clusters can be reliably separated into different seen or unseen classes. Nevertheless, the predicted locations of clusters based on so
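The batch-mode intuition above can be illustrated with a minimal sketch, assuming a simple clustering stand-in for the paper's MAP formulation: the test ensemble is clustered, each recovered cluster is matched to the nearest class location predicted from source-domain side information, and all instances in a cluster are labeled at once. The centers, shift magnitude, and cluster counts below are hypothetical.

```python
import numpy as np

# Illustrative sketch (not the paper's MAP algorithm): exploit the fact
# that target-domain instances form clusters, and label whole clusters.
rng = np.random.default_rng(1)

# Class locations *predicted* from source-domain side information, and the
# slightly shifted locations where the test-time data actually lies.
predicted_centers = np.array([[0.0, 0.0], [3.0, 3.0]])  # hypothetical
true_centers = predicted_centers + 0.8                  # test-time domain shift

X = np.vstack([true_centers[0] + 0.3 * rng.standard_normal((50, 2)),
               true_centers[1] + 0.3 * rng.standard_normal((50, 2))])
y = np.repeat([0, 1], 50)  # ground truth, unknown to the predictor

# Tiny k-means (k = number of unseen classes), seeded at predicted centers.
centers = predicted_centers.copy()
for _ in range(20):
    assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([X[assign == k].mean(0) for k in range(2)])

# Match each recovered cluster to the nearest predicted class location,
# then label every instance in that cluster accordingly.
cluster_to_class = np.argmin(
    ((centers[:, None] - predicted_centers[None]) ** 2).sum(-1), axis=1)
y_hat = cluster_to_class[assign]
print((y_hat == y).mean())  # fraction of test instances labeled correctly
```

The point of the sketch is that the cluster structure of the batch reveals the domain shift, so the whole ensemble can be labeled more reliably than instance-by-instance prediction against the unadapted predicted centers.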