Few-shot learning with saliency maps as additional visual information


Mounir Abdelaziz1 · Zuping Zhang1

Received: 17 March 2020 / Revised: 26 August 2020 / Accepted: 15 September 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Few-shot learning aims to recognize new object categories from only a few training examples. Few-shot learning methods have recently made significant progress. However, most of these methods recognize objects by learning relations between image features alone, which may not be sufficient given the scarcity of training data. This study therefore provides saliency maps as additional visual information that describes the shape of the objects and supports few-shot visual learning. We propose a simple few-shot learning method called Few-shot Learning with Saliency Maps as Additional Visual Information (SMAVI). Our method encodes the images and their saliency maps, which are extracted from the images by a saliency network, and then learns deep relations between the combined image features and saliency map features of the objects. Experimental results show that the proposed method outperforms related state-of-the-art methods on standard few-shot learning datasets.

Keywords Few-shot learning · Saliency detection · Second-order statistics · Object recognition
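The pipeline summarized in the abstract (encode the image and its saliency map, combine the two feature vectors, then relate a query to the few labeled examples) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the linear `encode` function stands in for the deep encoders, the saliency maps are supplied directly rather than produced by a saliency network, the combination is a plain concatenation, and cosine similarity replaces the learned relation module.

```python
import math


def encode(values, weights):
    """Hypothetical encoder: one ReLU unit per weight row (stand-in for a CNN)."""
    return [max(0.0, sum(w * v for w, v in zip(row, values))) for row in weights]


def combine(image_feat, saliency_feat):
    """Combine image and saliency-map features (assumed here: concatenation)."""
    return image_feat + saliency_feat


def cosine(a, b):
    """Cosine similarity, used here in place of a learned relation module."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def classify(query, support, img_w, sal_w):
    """Return the best-matching support label and all class scores.

    query: (image, saliency_map)
    support: {label: [(image, saliency_map), ...]} with a few examples per label
    """
    def embed(pair):
        image, saliency = pair
        return combine(encode(image, img_w), encode(saliency, sal_w))

    q = embed(query)
    scores = {}
    for label, examples in support.items():
        embedded = [embed(e) for e in examples]
        # Average the few support embeddings into one class representation.
        proto = [sum(col) / len(embedded) for col in zip(*embedded)]
        scores[label] = cosine(q, proto)
    return max(scores, key=scores.get), scores


# Toy 2-way 1-shot episode with identity "encoders" and made-up inputs.
identity = [[1.0, 0.0], [0.0, 1.0]]
support = {
    "panda": [([1.0, 0.0], [1.0, 0.0])],
    "cat": [([0.0, 1.0], [0.0, 1.0])],
}
label, scores = classify(([0.9, 0.1], [1.0, 0.0]), support, identity, identity)
print(label)
```

In this toy episode the saliency features add a second source of evidence to the image features, which is the intuition the abstract describes; a real system would learn both encoders and the relation function from training episodes.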

Zuping Zhang
[email protected]

Mounir Abdelaziz
[email protected]

1 School of Computer Science, Engineering, Central South University, 932 South Lushan Rd, Changsha Hunan, 410083, People’s Republic of China

Multimedia Tools and Applications

1 Introduction Deep learning has achieved remarkable success in many machine learning applications [13, 14, 24, 35, 44–46, 54] such as object recognition [13, 14, 24, 46]. However, deep learning recognition models require a huge number of training examples. In contrast, the human recognition system can identify between 5,000 and 30,000 object categories after seeing only a few examples or even from a given description alone [1]. For instance, children can identify pandas and differentiate them from other animals after seeing one picture of them or hearing a description such as “pandas are bears with black patches around their eyes, ears, legs, and shoulders, while the rest of their body is white”. The research areas of zero-shot, one-shot, and few-shot learning have received considerable attention recently, inspired by the human visual recognition ability and by a desire to overcome the failure of deep learning models on recognition tasks with few training examples. Few-shot learning aims to build recognition models that can recognize novel object categories from few associated training examples. Recently, few-shot learning methods have adopted several metric learning approaches with the aim of acquiring the concept of relations through deep learning. These methods generally contain two modules: one module to learn the image features and an