Transferrable Feature and Projection Learning with Class Hierarchy for Zero-Shot Learning


Aoxue Li¹ · Zhiwu Lu² · Jiechao Guan² · Tao Xiang³ · Liwei Wang¹ · Ji-Rong Wen²

Received: 12 October 2018 / Accepted: 12 May 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Zero-shot learning (ZSL) aims to transfer knowledge from seen classes to unseen ones so that the latter can be recognised without any training samples. This is made possible by learning a projection function between a feature space and a semantic space (e.g. attribute space). Considering the seen and unseen classes as two domains, a big domain gap often exists which challenges ZSL. In this work, we propose a novel inductive ZSL model that leverages superclasses as the bridge between seen and unseen classes to narrow the domain gap. Specifically, we first build a class hierarchy of multiple superclass layers and a single class layer, where the superclasses are automatically generated by data-driven clustering over the semantic representations of all seen and unseen class names. We then exploit the superclasses from the class hierarchy to tackle the domain gap challenge in two aspects: deep feature learning and projection function learning. First, to narrow the domain gap in the feature space, we define a recurrent neural network over superclasses and then plug it into a convolutional neural network for enforcing the superclass hierarchy. Second, to further learn a transferrable projection function for ZSL, a novel projection function learning method is proposed by exploiting the superclasses to align the two domains. Importantly, our transferrable feature and projection learning methods can be easily extended to a closely related task, few-shot learning (FSL). Extensive experiments show that the proposed model outperforms the state-of-the-art alternatives in both ZSL and FSL tasks.

Keywords Zero-shot learning · Class hierarchy · Recurrent neural network · Deep feature learning · Projection function learning · Few-shot learning
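The class hierarchy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `kmeans` and `build_hierarchy`, the 2-D toy vectors, and the choice of plain k-means with farthest-point initialisation are all assumptions made for illustration. It is also an assumption that each higher superclass layer is obtained by clustering the centres of the layer below. The only point taken from the text is that superclasses come from clustering the semantic representations of both seen and unseen class names, so an unseen class shares a superclass with related seen classes.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(far)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centers[j] = [sum(xs) / len(members) for xs in zip(*members)]
    # Final assignment so that labels agree with the returned centres.
    labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
              for p in points]
    return labels, centers


def build_hierarchy(class_vecs, layer_sizes):
    """Return one dict per superclass layer: class name -> superclass index.

    Assumption: layer l+1 clusters the cluster centres of layer l.
    """
    names = list(class_vecs)
    nodes = [class_vecs[n] for n in names]
    assign = list(range(len(names)))  # each class starts as its own node
    hierarchy = []
    for k in layer_sizes:
        labels, centers = kmeans(nodes, k)
        assign = [labels[a] for a in assign]
        hierarchy.append(dict(zip(names, assign)))
        nodes = centers
    return hierarchy


# Hypothetical 2-D "semantic vectors"; real ones would be word embeddings
# of the class names. Seen and unseen classes are clustered together.
class_vecs = {
    "horse": [0.0, 0.0],   # seen
    "zebra": [0.1, 0.0],   # unseen
    "car":   [1.0, 1.0],   # seen
    "truck": [1.1, 1.0],   # unseen
    "boat":  [1.0, 2.0],   # seen
    "ship":  [1.1, 2.0],   # unseen
}

hierarchy = build_hierarchy(class_vecs, layer_sizes=[3, 2])
for depth, layer in enumerate(hierarchy, start=1):
    print(f"superclass layer {depth}: {layer}")
```

In this toy data the unseen class "zebra" lands in the same first-layer superclass as the seen class "horse", and the two vehicle groups merge at the second layer. Sharing a superclass in this way is the sense in which superclasses can act as a bridge between the two domains.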

Communicated by Cristian Sminchisescu.

✉ Zhiwu Lu [email protected]
Aoxue Li [email protected]
Tao Xiang [email protected]

¹ The Key Laboratory of Machine Perception (MOE), School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
² The Beijing Key Laboratory of Big Data Management and Analysis Methods, Gaoling School of Artificial Intelligence, Beijing 100872, China
³ The Department of Electrical and Electronic Engineering, University of Surrey, Guildford, Surrey GU2 7XH, UK

1 Introduction

In the past 5 years, deep neural network (DNN) based models (Huang et al. 2017; Donahue et al. 2014) have achieved super-human performance on the ILSVRC 1K recognition task. However, most existing object recognition models, particularly DNN-based ones, require hundreds of image samples to be collected for each object class; many of the object classes are rare and it is thus extremely hard, sometimes impossible, to collect sufficient training samples, even wit