Bidirectional generative transductive zero-shot learning


REVIEW ARTICLE

Xinpeng Li1 • Dan Zhang1 • Mao Ye1 • Xue Li2 • Qiang Dou1 • Qiao Lv1

Received: 11 March 2020 / Accepted: 2 September 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

Most zero-shot learning (ZSL) methods learn a mapping from the visual feature space to the semantic feature space, or map both the visual and semantic feature spaces into a common joint space and align them there. However, these methods do not fully exploit the visual and semantic information, nor do they exclude useless information. Moreover, most ZSL methods suffer from a strong bias problem: instances from unseen classes tend to be predicted as seen classes. In this paper, we propose a method based on bidirectional projections between the visual and semantic feature spaces that builds on the advantages of generative adversarial networks (GANs). GANs are used to perform bidirectional generation and alignment between the visual and semantic features. In addition, a cycle mapping structure ensures that the important information is preserved in the alignment. Furthermore, to better address the bias problem, pseudo-labels are generated for unseen instances and the model is iteratively adjusted with them. We conduct extensive experiments in both the traditional ZSL and generalized ZSL settings. The experimental results confirm that our method achieves state-of-the-art performance on the popular datasets AWA2, aPY and SUN.

Keywords Zero-shot learning • Transductive • Bidirectional generation • CycleGAN

Mao Ye [email protected]

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, People's Republic of China

2 School of Information Technology and Electronic Engineering, The University of Queensland, Brisbane, QLD 4072, Australia

1 Introduction

Image recognition models trained on a large number of labeled instances currently achieve good results, but collecting these labeled images takes considerable manpower and resources. In particular, fine-grained classification requires experts to provide the labels. Completing image recognition with only a few labeled instances, or even with some categories that have no labels at all, has therefore become a challenging and practical task. Zero-shot learning (ZSL) [22, 33, 41] is an effective approach to this problem. Zero-shot learning is a special kind of unsupervised domain adaptation. Its purpose is to learn a model from a set of labeled source data and then transfer the learned knowledge to the target domain to identify another set of unlabeled data. In the zero-shot learning setting, the categories in these two domains are assumed to be completely non-overlapping. Because the source data are labeled during training, the classes in the source domain are usually called seen classes, and the classes in the target domain are called unseen classes. Zero-shot learning can be divided into traditional