Salient Deconvolutional Networks



Abstract. Deconvolution is a popular method for visualizing deep convolutional neural networks; however, due to their heuristic nature, the meaning of deconvolutional visualizations is not entirely clear. In this paper, we introduce a family of reversed networks that generalizes and relates deconvolution, backpropagation and network saliency. We use this construction to thoroughly investigate and compare these methods in terms of quality and meaning of the produced images, and of what architectural choices are important in determining these properties. We also show an application of these generalized deconvolutional networks to weakly-supervised foreground object segmentation.

Keywords: DeConvNets · Deep convolutional neural networks · Saliency · Segmentation

1 Introduction

Despite the success of modern Convolutional Neural Networks (CNNs), there is a limited understanding of how these complex black-box models achieve their performance. Methods such as deconvolutional networks (DeConvNets) have been proposed to visualize image patterns that strongly activate any given neuron in a CNN [25], and therefore shed some light on the CNN structure. However, the DeConvNet construction is partially heuristic, and so are the corresponding visualizations. Simonyan et al. [16] noted similarities with their network saliency method, which partially explains DeConvNets, but this interpretation remains incomplete.

This paper carries out a novel and systematic analysis of DeConvNets and closely related visualization methods such as network saliency. Our first contribution is to extend DeConvNet to a general method for architecture reversal and visualization. In this construction, the reversed layers use selected information extracted by the forward network, which we call bottleneck information (Sect. 2). We show that backpropagation is a special case of this construction, yielding a reversed architecture, SaliNet, equivalent to the network saliency method of Simonyan et al. (Sect. 2.1). We also show that the only difference between

Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46466-4_8) contains supplementary material, which is available to authorized users.

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VI, LNCS 9910, pp. 120–135, 2016. DOI: 10.1007/978-3-319-46466-4_8
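To make the connection between network saliency and plain backpropagation concrete, here is a minimal NumPy sketch on a toy two-layer network. The network, weights, and dimensions are illustrative assumptions and not from the paper; the idea shown is only that the saliency map is the magnitude of the gradient of the top class score with respect to the input.

```python
import numpy as np

# Toy two-layer network (illustrative weights, not from the paper)
rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 4))   # input (4-d) -> hidden (8-d)
W2 = rng.standard_normal((3, 8))   # hidden -> 3 class scores

def forward(x):
    h = W1 @ x                          # pre-activations
    return W2 @ np.maximum(h, 0), h    # class scores, pre-activations

def saliency(x):
    """Network saliency in the sense of Simonyan et al.: magnitude of the
    gradient of the maximally active class score w.r.t. the input."""
    scores, h = forward(x)
    c = int(np.argmax(scores))
    g = W2[c] * (h > 0)        # backpropagate through the ReLU
    return np.abs(W1.T @ g)    # |d score_c / d x|

x = rng.standard_normal(4)
print(saliency(x))             # one saliency value per input dimension
```

For a real CNN the same quantity is obtained with one backward pass from the chosen score to the image; here the chain rule is written out by hand to make the "reversed network" structure visible.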


Fig. 1. From top to bottom: original image, DeConvNet, SaliNet, and our DeSaliNet visualizations from the fc8 layer of AlexNet (just before the softmax operation). The maximally active neuron is visualized in each case. DeSaliNet produces crisper visualizations, suppressing the background while preserving edge information. Best viewed on screen.
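The three reversed architectures compared in Fig. 1 differ chiefly in how the ReLU is reversed (detailed in Sect. 2.2). As a hedged sketch of the usual reading of these rules, the following NumPy snippet shows the three backward ReLU variants; the function name and the formulation of the DeSaliNet rule as the conjunction of the other two masks are our assumptions for illustration, not the paper's notation.

```python
import numpy as np

def backward_relu(g, x, method):
    """Reverse a ReLU for a backward signal g, given the forward input x.

    'salinet'   (= backpropagation): pass g where the forward input was positive.
    'deconvnet': pass g where g itself is positive (rectify the backward signal).
    'desalinet': apply both masks (our reading; akin to guided backpropagation).
    """
    if method == "salinet":
        return g * (x > 0)
    if method == "deconvnet":
        return g * (g > 0)
    if method == "desalinet":
        return g * (x > 0) * (g > 0)
    raise ValueError(f"unknown method: {method}")
```

Note that only the 'salinet' rule uses bottleneck information from the forward pass (the mask `x > 0`); the 'deconvnet' rule depends solely on the backward signal, which is exactly the seemingly innocuous difference discussed next.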

DeConvNet and SaliNet is a seemingly innocuous change in the reversal of Rectified Linear Units (ReLU; Sect. 2.2). However, this change has a very significant effect on the results: the SaliNet response is well localized but lacks structure, whereas the DeConvNet response accurately reproduces