Zero-Shot Object Detection: Joint Recognition and Localization of Novel Concepts



Shafin Rahman 1,2,3 · Salman H. Khan 4 · Fatih Porikli 3,5

Received: 31 January 2019 / Accepted: 8 July 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Zero-shot learning (ZSL) identifies unseen objects for which no training images are available. Conventional ZSL approaches are restricted to a recognition setting where each test image is categorized into one of several unseen object classes. We posit that this setting is ill-suited for real-world applications where unseen objects appear only as a part of a complete scene, warranting both ‘recognition’ and ‘localization’ of the unseen category. To address this limitation, we introduce a new ‘Zero-Shot Detection’ (ZSD) problem setting, which aims at simultaneously recognizing and locating object instances belonging to novel categories, without any training samples. We introduce an integrated solution to the ZSD problem that jointly models the complex interplay between visual and semantic domain information. Ours is an end-to-end trainable deep network for ZSD that effectively overcomes the noise in the unsupervised semantic descriptions. To this end, we utilize the concept of meta-classes to design an original loss function that achieves synergy between max-margin class separation and semantic domain clustering. In order to set a benchmark for ZSD, we propose an experimental protocol for the large-scale ILSVRC dataset that adheres to practical challenges, e.g., rare classes are more likely to be the unseen ones. Furthermore, we present a baseline approach extended from conventional recognition to the ZSD setting. Our extensive experiments show a significant boost in performance (in terms of mAP and Recall) on the imperative yet difficult ZSD problem on the ImageNet detection, MSCOCO and FashionZSD datasets.

Keywords Zero-shot learning · Zero-shot object detection · Deep learning · Loss function
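To make the core idea concrete, the sketch below illustrates the generic zero-shot detection scoring step the abstract alludes to: region-proposal features are projected into a semantic embedding space and matched against word vectors of unseen classes. This is a minimal, self-contained illustration; the function name, the single linear projection `W`, and all dimensions are assumptions for exposition, not the authors' actual network or loss.

```python
import numpy as np

def zsd_scores(region_feats, W, class_embeddings):
    """Score R region proposals against C unseen classes.

    region_feats:     (R, d_v) visual features of region proposals.
    W:                (d_v, d_s) learned visual-to-semantic projection.
    class_embeddings: (C, d_s) semantic (e.g., word) vectors of unseen classes.
    Returns:          (R, C) cosine similarities; argmax per row gives the label.
    """
    proj = region_feats @ W                                    # map into semantic space
    proj = proj / np.linalg.norm(proj, axis=1, keepdims=True)  # unit-normalize rows
    emb = class_embeddings / np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    return proj @ emb.T                                        # cosine similarity scores

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))   # 4 proposals, 8-dim visual features (toy sizes)
W = rng.standard_normal((8, 5))       # projection to a 5-dim semantic space
emb = rng.standard_normal((3, 5))     # 3 unseen-class embeddings
scores = zsd_scores(feats, W, emb)
labels = scores.argmax(axis=1)        # predicted unseen class per region proposal
```

In a full detector, `W` would be learned on seen classes only (in the paper's case, with a loss combining max-margin separation and meta-class clustering), and the same scoring is applied unchanged to unseen-class embeddings at test time.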

1 Introduction

Communicated by Tinne Tuytelaars.

The code and dataset splits are available at: https://github.com/salman-h-khan/ZSD_Release.

B Shafin Rahman [email protected]
Salman H. Khan [email protected]
Fatih Porikli [email protected]

1 North South University, Dhaka, Bangladesh
2 Data61, CSIRO, Canberra, ACT 2601, Australia
3 Australian National University, Canberra, ACT 0200, Australia
4 Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
5 Huawei, San Diego, CA, USA

Humans have the amazing ability to develop a generalizable knowledge-base that compiles our sensorimotor experiences over time and relates them to abstract concepts. For instance, if we have seen visual examples of ‘horse’ and ‘donkey’, we can easily recognize their distinctive individual characteristics, such as horses have short ears, long tails and thin coats, while donkeys are shorter in height, have thick coats, long ears and shorter tails. These associations between visual and semantic content enable us to make inferences about unobs