Generating Visual Explanations


Lisa Anne Hendricks¹, Zeynep Akata², Marcus Rohrbach¹,³, Jeff Donahue¹, Bernt Schiele², Trevor Darrell¹

¹ UC Berkeley EECS, Berkeley, CA, USA
  {lisa_anne,rohrbach,jdonahue,trevor}@eecs.berkeley.edu
² Max Planck Institute for Informatics, Saarbrücken, Germany
  {akata,schiele}@mpi-inf.mpg.de
³ ICSI, Berkeley, CA, USA

Abstract. Clearly explaining a rationale for a classification decision to an end user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text; contemporary vision-language models can describe image content but fail to take into account class-discriminative image aspects which justify visual predictions. We propose a new model that focuses on the discriminating properties of the visible object, jointly predicts a class label, and explains why the predicted label is appropriate for the image. Through a novel loss function based on sampling and reinforcement learning, our model learns to generate sentences that realize a global sentence property, such as class specificity. Our results on the CUB dataset show that our model is able to generate explanations which are not only consistent with an image but also more discriminative than descriptions produced by existing captioning methods.

Keywords: Visual explanation · Image description · Language and vision

1 Introduction

Explaining why the output of a visual system is compatible with visual evidence is a key component for understanding and interacting with AI systems [4]. Deep classification methods have had tremendous success in visual recognition [8,10,20], but their outputs can be unsatisfactory if the model cannot provide a consistent justification of why it made a certain prediction. In contrast, systems which can justify to a user why a prediction is consistent with visual elements are more likely to be trusted [34]. Explanations of visual systems could also aid in understanding network mistakes and provide feedback to improve classifiers. We consider explanations as determining why a decision is consistent with visual evidence, and differentiate between introspection explanation systems, which explain how a model determines its final output, and justification explanation systems, which produce sentences detailing how the visual evidence is compatible with the system's output.
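The sampling-based loss mentioned in the abstract can be pictured as a REINFORCE-style objective: sample a sentence from the caption decoder, score how class-specific it is with a separately trained sentence classifier, and increase the log-probability of samples that score above a baseline. The sketch below is illustrative only, not the authors' implementation: the decoder.sample interface, the sentence_classifier reward model, and the mean-reward baseline are all assumptions made for the example.

    import torch

    def discriminative_loss(decoder, sentence_classifier, image_feats, labels):
        # Sample one sentence per image; log_probs has shape (batch, seq_len).
        # decoder.sample is a hypothetical interface returning sampled token
        # ids together with the log-probability of each sampled token.
        tokens, log_probs = decoder.sample(image_feats)
        with torch.no_grad():
            # Reward: probability the sentence classifier assigns to the true
            # class, i.e. how class-specific the sampled sentence is.
            class_probs = sentence_classifier(tokens).softmax(dim=-1)
            reward = class_probs.gather(1, labels.unsqueeze(1)).squeeze(1)
            # Mean-reward baseline for variance reduction (an assumption here).
            advantage = reward - reward.mean()
        # REINFORCE: push up the log-probability of above-baseline samples.
        return -(advantage * log_probs.sum(dim=1)).mean()

In the paper this discriminative term is combined with a standard cross-entropy relevance loss on ground-truth descriptions, so that generated sentences remain image relevant while becoming class specific.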


[Fig. 1. A visual explanation must be both image relevant and class relevant; a description is image relevant but not necessarily class relevant, and a definition is class relevant but not necessarily image relevant. Two examples from the figure:]

Western Grebe. Description: "This is a large bird with a white neck and a black back in the water." Definition: "The Western Grebe has a yellow pointy beak, white neck and belly, and black back." Visual explanation: "This is a Western Grebe because this bird has a long white neck, pointy yellow beak and red eye."

Laysan Albatross. Description: "This is a large flying bird with black wings and a white belly." Definition: "The Laysan Albatross is a seabird with a hooked yellow beak, black back and white belly." Visual explanation: "This is a Laysan Albatross because this bird has a large wing…"
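The sentence classifier used as the reward in the sketch above must capture exactly the class-relevance axis of Fig. 1: a definition or explanation should score high for its class, while a generic description should not. As a rough illustration, a stand-in reward model could be as simple as an LSTM over the generated tokens, trained with cross-entropy on (description, class) pairs; the class name, layer sizes, and module names below are illustrative assumptions, not the paper's architecture.

    import torch.nn as nn

    class SentenceClassifier(nn.Module):
        # Toy reward model: predicts the bird class from a sentence alone.
        def __init__(self, vocab_size, num_classes, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, num_classes)

        def forward(self, tokens):
            # tokens: (batch, seq_len) integer ids of a generated sentence.
            _, (hidden, _) = self.lstm(self.embed(tokens))
            # Classify from the final hidden state; the softmax of these
            # logits supplies p(class | sentence) for the reward above.
            return self.out(hidden[-1])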
