Zero-shot recognition with latent visual attributes learning
Yurui Xie 1,2 · Xiaohai He 1 · Jing Zhang 1 · Xiaodong Luo 1
Received: 19 July 2019 / Revised: 23 April 2020 / Accepted: 9 July 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract Zero-shot learning (ZSL) aims to recognize novel object categories by transferring knowledge extracted from the seen categories (source domain) to the unseen categories (target domain). Most recent ZSL methods concentrate on learning a visual-semantic alignment that bridges image features and their semantic representations, relying solely on human-designed attributes. However, few works study whether the human-designed attributes are discriminative enough for the recognition task. To address this problem, we propose a couple semantic dictionaries (CSD) learning approach that exploits latent visual attributes and aligns the visual and semantic spaces at the same time. Specifically, the learned visual attributes are incorporated into the semantic representation of the image features, consolidating the discriminative visual cues for object recognition. In addition, existing ZSL methods suffer from the domain shift issue because the source and target domains have completely separate label spaces. We therefore use the visual-semantic alignment and the latent visual attributes learned from the source domain jointly to regularise learning in the target domain, which ensures that information transfer extends across domains. We formulate this as an optimization problem with a unified objective and propose an iterative solver. Extensive experiments on two challenging benchmark datasets demonstrate that the proposed approach outperforms several state-of-the-art ZSL methods.
Keywords Zero-shot learning · Human-designed attributes · Dictionary learning · Visual attributes · Semantic representation
Xiaohai He [email protected]
Yurui Xie [email protected]
Jing Zhang [email protected]
Xiaodong Luo [email protected]
1 College of Electronics and Information Engineering, Sichuan University, Chengdu, China
2 Chengdu University of Information Technology, Chengdu, China
Multimedia Tools and Applications
1 Introduction Visual recognition has made tremendous strides in the past few years with the emergence of large-scale image datasets [14, 28] and the rapid progress of deep neural networks [7, 13, 15, 18, 31, 33, 41, 46]. To obtain high-quality recognition models, traditional supervised learning approaches require a large amount of labeled image data for each category. However, collecting a large quantity of well-labeled instances is often expensive and laborious, especially for rare categories or when a recognition system must be extended to newly appearing categories. Zero-shot learning (ZSL) [3, 5, 6, 11, 16, 19, 36, 47, 48] is motivated by these challenges and provides an effective mechanism to transfer semantic information from the labeled/seen categories to the novel/unseen categor