Learnable Histogram: Statistical Context Features for Deep Neural Networks

Statistical features, such as histograms, Bag-of-Words (BoW) and Fisher vectors, were commonly used with hand-crafted features in conventional classification methods, but have attracted less attention since deep learning methods became popular. In this paper, w

Deep learning · Semantic segmentation

Introduction

Context features play a crucial role in many vision classification problems, such as semantic segmentation [1–6], object detection [7,8] and pose estimation [9,10]. As illustrated by the toy example in Fig. 1, when classifying blurry white objects with similar appearance, if the semantic histogram of the whole image has a higher bin for the class “sea”, then the object is more likely to be a “boat”; if the histogram has a higher bin for the class “sky”, then it is more likely to be a “bird”. The semantic context thus acts as an important indicator for this classification task.

Context features can be broadly categorized into statistical and non-statistical ones, depending on whether they discard the spatial order of the context information. On the one hand, non-statistical context features dominate in most deep learning methods, which have gained increasing attention in recent years. Some examples include [11] for object detection and [12] for semantic segmentation. On the other hand, statistical context features were mostly used in conventional classification methods with hand-crafted features. Commonly used statistical features include histograms, Bag-of-Words (BoW) [13], Fisher vector [14],

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part I, LNCS 9905, pp. 246–262, 2016. DOI: 10.1007/978-3-319-46448-0_15

[Figure 1: two panels, each pairing an appearance feature with a global context histogram over the bins boat, bird, sky and sea.]

Fig. 1. A toy example showing that the global context (histogram) of a whole image is helpful for classifying image patches. The image patch is more likely to be a “bird” if the histogram has higher bin counts on the class “sky”, or a “boat” if the histogram has higher bin counts on the class “sea”.
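The toy example of Fig. 1 can be sketched in a few lines of NumPy. This is only an illustration of how a global semantic histogram can tip an ambiguous decision, not the learnable histogram layer proposed in the paper. The compatibility weights, the 4x4 label maps and the function names `semantic_histogram` and `classify_patch` are all hypothetical choices made for the sketch.

```python
import numpy as np

CONTEXT_CLASSES = ["boat", "bird", "sky", "sea"]  # histogram bins as in Fig. 1
OBJECT_CLASSES = ["boat", "bird"]                 # candidate labels for the patch

# Hypothetical compatibility weights (an assumption, not taken from the paper):
# row i says how strongly each context bin supports object class i.
COMPATIBILITY = np.array([
    [0.0, 0.0, 0.1, 0.9],  # "boat" is mostly supported by "sea"
    [0.0, 0.0, 0.9, 0.1],  # "bird" is mostly supported by "sky"
])


def semantic_histogram(label_map, num_classes=len(CONTEXT_CLASSES)):
    """Normalized class histogram of an integer label map.

    Flattening the map discards the spatial order, which is what makes
    the feature a statistical one.
    """
    counts = np.bincount(label_map.ravel(), minlength=num_classes)
    return counts / counts.sum()


def classify_patch(appearance_scores, label_map):
    """Reweight per-class appearance scores by the global context histogram."""
    context_support = COMPATIBILITY @ semantic_histogram(label_map)
    scores = np.asarray(appearance_scores, dtype=float) * context_support
    return scores / scores.sum()


# A blurry white patch: appearance alone cannot separate the two classes.
ambiguous = [0.5, 0.5]

# Two toy label maps (class indices follow CONTEXT_CLASSES).
sea_scene = np.full((4, 4), 3)   # mostly "sea"
sea_scene[0, :] = 2              # a strip of "sky" at the top
sky_scene = np.full((4, 4), 2)   # mostly "sky"
sky_scene[3, :] = 3              # a strip of "sea" at the bottom

for name, scene in [("sea scene", sea_scene), ("sky scene", sky_scene)]:
    probs = classify_patch(ambiguous, scene)
    print(name, "->", OBJECT_CLASSES[int(np.argmax(probs))])
# prints:
# sea scene -> boat
# sky scene -> bird
```

With identical appearance scores, the prediction flips purely because of the histogram, which is the behaviour the figure describes. In this sketch the histogram and the compatibility weights are fixed by hand; nothing here is learned.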

second-order pooling [15], etc. Such global context features worked well with hand-crafted low-level features in their time. However, they have been much less studied since deep learning became popular. Only a limited number of deep learning methods have tried to incorporate statistical features into deep neural networks. Examples include the deep Fisher network [16], which incorporates the Fisher vector, and orderless pooling [17], which combines deep features with the Vector of Locally Aggregated Descriptors (VLAD). Both methods aim to improve image classification performance. However, when calculating the statistical features, both methods fix the network parameters and simply treat the features produced by deep networks as off-the-shelf features. As a result, the deep networks and the statistical operations are not jointly optimized, even though joint optimization is one of the key factors behind the success of deep networks.

In this work, we introduce a learnable histogram layer for deep neural networks. Unlike existing deep learning methods that treat statistical o