Top-Down Neural Attention by Excitation Backprop
1 Boston University, Boston, USA {jmzhang,sclaroff}@bu.edu
2 Adobe Research, San Jose, USA {zlin,jbrandt,xshen}@adobe.com
Abstract. We aim to model the top-down attention of a Convolutional Neural Network (CNN) classifier for generating task-specific attention maps. Inspired by a top-down human visual attention model, we propose a new backpropagation scheme, called Excitation Backprop, that passes top-down signals down the network hierarchy via a probabilistic Winner-Take-All process. Furthermore, we introduce the concept of contrastive attention to make the top-down attention maps more discriminative. In experiments, we demonstrate the accuracy and generalizability of our method on weakly supervised localization tasks using the MS COCO, PASCAL VOC07, and ImageNet datasets. The usefulness of our method is further validated on the text-to-region association task: on the Flickr30k Entities dataset, we achieve promising phrase localization performance by leveraging the top-down attention of a CNN model trained on weakly labeled web images.
1 Introduction
Top-down task-driven attention is an important mechanism for efficient visual search. Various top-down attention models have been proposed, e.g. [1-4]. Among them, the Selective Tuning attention model [3] provides a biologically plausible formulation. Assuming a pyramidal neural network for visual processing, the Selective Tuning model is composed of a bottom-up sweep of the network to process input stimuli, and a top-down Winner-Take-All (WTA) process to localize the most relevant neurons in the network for a given top-down signal.

Inspired by the Selective Tuning model, we propose a top-down attention formulation for modern CNN classifiers. Instead of the deterministic WTA process used in [3], which can only generate binary attention maps, we formulate the top-down attention of a CNN classifier as a probabilistic WTA process. The probabilistic WTA formulation is realized by a novel backpropagation scheme, called Excitation Backprop, which integrates both top-down and bottom-up information to compute the winning probability of each neuron.
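To make the probabilistic WTA formulation concrete: the winning probability of a child neuron $a_j$ is obtained by marginalizing over its parent neurons,

P(a_j) = \sum_i P(a_j \mid a_i) P(a_i),

where the conditional winning probability is proportional to the child's bottom-up activation $\hat{a}_j$ times the connection weight $w_{ji}$ when that weight is excitatory:

P(a_j \mid a_i) = Z_i \hat{a}_j w_{ji} if w_{ji} \geq 0, and 0 otherwise, with Z_i = 1 / \sum_{j': w_{j'i} \geq 0} \hat{a}_{j'} w_{j'i}.

As an illustration only, the following is a minimal NumPy sketch of one such propagation step for a fully connected layer; the function name excitation_backprop_fc and the eps stabilizer are our own additions, and convolutional layers would apply the same rule over local receptive fields.

    import numpy as np

    def excitation_backprop_fc(W, a_hat, p_parent, eps=1e-12):
        """One Excitation Backprop step through a fully connected layer.

        W        : (n_child, n_parent) weight matrix; the bottom-up pass computes a_hat @ W.
        a_hat    : (n_child,) bottom-up activations of the child neurons
                   (non-negative, e.g. post-ReLU).
        p_parent : (n_parent,) winning probabilities of the parent neurons
                   (the top-down signal).
        Returns    (n_child,) marginal winning probabilities of the child neurons.
        """
        W_pos = np.maximum(W, 0.0)         # keep only excitatory connections
        contrib = a_hat[:, None] * W_pos   # \hat{a}_j * w_{ji}^+ for each (child j, parent i)
        Z = contrib.sum(axis=0) + eps      # per-parent normalizer \sum_j \hat{a}_j * w_{ji}^+
        P_cond = contrib / Z               # P(a_j | a_i); each column sums to 1
        return P_cond @ p_parent           # P(a_j) = \sum_i P(a_j | a_i) P(a_i)

Starting from a one-hot probability vector over the output units for the class of interest and applying this step layer by layer yields per-neuron winning probabilities at any intermediate layer, which can be reshaped into a task-specific attention map.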
[Fig. 1: attention maps for the panels Input, chair, glass, boy, woman, man, couple, and father]
Fig. 1. A CNN classifier's top-down attention maps generated by our Excitation Backprop can localize common object categories, e.g. chair and glass, as well as fine-grained categories like boy, man and woman in this example image, which is resized to 224×224 for our method. The classifier used in this example is trained to predict ∼18K tags using only weakly labeled web images. Visualizing the classifier's top-down attention can also help interpret what has been learned by the classifier. For couple, we can tell that our classifier uses the two adults in the image as the evidence, while for father, it focuses mostly on the child.