Modeling Context in Referring Expressions


Abstract. Humans refer to objects in their environments all the time, especially in dialogue with other people. We explore generating and comprehending natural language referring expressions for objects in images. In particular, we focus on incorporating better measures of visual context into referring expression models and find that visual comparison to other objects within an image helps improve performance significantly. We also develop methods to tie the language generation process together, so that we generate expressions for all objects of a particular category jointly. Evaluation on three recent datasets, RefCOCO, RefCOCO+, and RefCOCOg (datasets and toolbox can be downloaded from https://github.com/lichengunc/refer), shows the advantages of our methods for both referring expression generation and comprehension.

Keywords: Language expression generation · Language and vision · Generation · Referring

1 Introduction

In this paper, we look at the dual tasks of generating and comprehending natural language expressions referring to particular objects within an image. Referring to objects is a natural and common experience. For example, one often uses referring expressions in everyday speech to indicate a particular person or object to a co-observer, e.g., “the man in the red hat” or “the book on the table”. Computational models to generate and comprehend such expressions would have applicability to human-computer interactions, especially for agents such as robots, interacting with humans in the physical world. Successful models will have to connect both recognition of visual attributes of objects and effective natural language generation to compose useful expressions for dialogue.

A broader version of this latter goal was considered in 1975 by Paul Grice, who introduced maxims describing cooperative conversation between people [9]. These maxims, called the Gricean Maxims, describe a set of rational principles for natural language dialogue interactions. The four maxims are: quality (try to be truthful), quantity (make your contribution as informative as you can, giving as much information as is needed but no more), relevance (be relevant and pertinent to the discussion), and manner (be as clear, brief, and orderly as possible, avoiding obscurity and ambiguity). For the purpose of referring to objects in complex real-world scenes, these maxims suggest that a well-formed expression should be informative, succinct, and unambiguous. The last point is especially necessary for referring to objects in the real world, since we often find multiple objects of a particular category situated together in a scene. For example, consider the image in Fig. 1, which contains three giraffes. We sho

Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46475-6_5) contains supplementary material, which is available to authorized users.

L. Yu et al. © Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part II, LNCS 9906, pp. 69–85, 2016. DOI: 10.1007/978-3-319-46475-6_5
