Zero-Shot Recognition via Structured Prediction
Abstract. We develop a novel method for zero-shot learning (ZSL) based on test-time adaptation of similarity functions learned using training data. Existing methods exclusively employ source-domain side information for recognizing unseen classes during test time. We show that for batch-mode applications, accuracy can be significantly improved by adapting these predictors to the observed test-time target-domain ensemble. We develop a novel structured prediction method for maximum a posteriori (MAP) estimation, where parameters account for test-time domain shift from what is predicted primarily using source-domain information. We propose a Gaussian parameterization for the MAP problem and derive an efficient structured prediction algorithm. Empirically, we test our method on four popular benchmark image datasets for ZSL and show significant improvement over the state-of-the-art, on average, by 11.50% and 30.12% in terms of accuracy for recognition and mean average precision (mAP) for retrieval, respectively.
Keywords: Zero-shot learning/recognition/retrieval · Structured prediction · Maximum likelihood estimation

1 Introduction
Zero-shot recognition (ZSR) is the problem of recognizing data instances from unseen classes (i.e. classes with no training data) during test time. The motivation for ZSR stems from the need for solutions to diverse research problems ranging from poorly annotated big data collections [1] to the problem of extreme classification [2]. In this paper we consider the classical ZSL setting. Namely, we are given two sources of data: the so-called source domain and target domain, respectively. In the source domain, each class is represented by a single vector of side information such as attributes [3–7], language words/phrases [8–10], or even learned classifiers [11]. In the target domain, each class is represented by a collection of data instances (e.g. images or videos). During training, some known classes with data are given as seen classes, while during testing some other unknown classes are revealed as unseen classes. The goal of ZSL is to learn suitable models using seen-class training data so that, during ZSR, the class labels of arbitrary target-domain data instances from unseen classes can be predicted at test time.
c Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VII, LNCS 9911, pp. 533–548, 2016. DOI: 10.1007/978-3-319-46478-7 33
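To make the setting concrete, a standard zero-shot prediction rule can be sketched as follows. This is an illustrative toy example, not the method of this paper: the bilinear similarity matrix `W`, the one-hot attribute vectors, and the class names are all hypothetical stand-ins.

```python
import numpy as np

# Toy zero-shot prediction rule (illustrative; not the paper's algorithm).
# A similarity f(x, a) = x^T W a is assumed to have been learned on seen
# classes; W, the attribute vectors, and the class names are hypothetical.
W = np.array([[1.0, 0.2, 0.0],
              [0.1, 1.0, 0.3],
              [0.0, 0.2, 1.0]])  # stand-in for learned similarity parameters

# Source-domain side information for two *unseen* classes.
attributes = {"zebra": np.array([1.0, 0.0, 0.0]),
              "tapir": np.array([0.0, 1.0, 0.0])}

def predict(x):
    """Label x with the unseen class whose side information scores
    highest under the learned similarity."""
    scores = {c: float(x @ W @ a) for c, a in attributes.items()}
    return max(scores, key=scores.get)

# A target-domain instance synthesized to align with the zebra attributes.
x_test = W @ attributes["zebra"]
print(predict(x_test))  # -> zebra
```

Note that this rule uses only source-domain side information at test time; the contribution of the paper is to adapt such predictors to the observed test ensemble.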
Z. Zhang and V. Saligrama
Key Insight: In batch mode we are given the ensemble of target-domain data. Our main idea is that even though labels for target-domain data are unknown, subtle shifts in the data distributions can be inferred, and these shifts can in turn be utilized to better adapt the learned classifiers for test-time use. Intuitively, our insight is justified by noting that target-domain data instances can form compact and disjoint clusters in their latent-space embeddings. These clusters can be reliably separated into different seen or unseen classes. Nevertheless, the predicted locations of clusters based on so
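The batch-mode intuition above can be illustrated with a minimal sketch, assuming a simple clustering stand-in for the paper's MAP formulation: the test ensemble is clustered, each recovered cluster is matched to the nearest class location predicted from source-domain side information, and all instances in a cluster are labeled at once. The centers, shift magnitude, and cluster counts below are hypothetical.

```python
import numpy as np

# Illustrative sketch (not the paper's MAP algorithm): exploit the fact
# that target-domain instances form clusters, and label whole clusters.
rng = np.random.default_rng(1)

# Class locations *predicted* from source-domain side information, and the
# slightly shifted locations where the test-time data actually lies.
predicted_centers = np.array([[0.0, 0.0], [3.0, 3.0]])  # hypothetical
true_centers = predicted_centers + 0.8                  # test-time domain shift

X = np.vstack([true_centers[0] + 0.3 * rng.standard_normal((50, 2)),
               true_centers[1] + 0.3 * rng.standard_normal((50, 2))])
y = np.repeat([0, 1], 50)  # ground truth, unknown to the predictor

# Tiny k-means (k = number of unseen classes), seeded at predicted centers.
centers = predicted_centers.copy()
for _ in range(20):
    assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([X[assign == k].mean(0) for k in range(2)])

# Match each recovered cluster to the nearest predicted class location,
# then label every instance in that cluster accordingly.
cluster_to_class = np.argmin(
    ((centers[:, None] - predicted_centers[None]) ** 2).sum(-1), axis=1)
y_hat = cluster_to_class[assign]
print((y_hat == y).mean())  # fraction of test instances labeled correctly
```

The point of the sketch is that the cluster structure of the batch reveals the domain shift, so the whole ensemble can be labeled more reliably than instance-by-instance prediction against the unadapted predicted centers.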