Building referring expression corpora with and without feedback


Danillo da Silva Rocha¹ · Ivandré Paraboni¹

© Springer Nature B.V. 2020

Abstract

The design of data collection experiments involving human participants is a common task in Referring Expression Generation (REG) and related fields. Many (or most) REG data collection tasks are implemented in a human–computer (e.g., web-based) communicative setting, in which participants do not have any particular addressee in mind and receive no feedback on the appropriateness (e.g., uniqueness) of the descriptions they produce. Others, at a possibly higher cost, use participant pairs engaged in some form of dialogue, in which hearers may provide feedback that allows speakers to rephrase ambiguous or otherwise ill-formed descriptions. Leaving the issue of cost aside, however, it remains unclear whether the two methods elicit similar referring expressions for the purpose of REG research. To shed light on this issue, this paper presents a REG corpus built under three experimental conditions: a standard human–computer (or web-based) setting in which no feedback is available to the speaker, and two settings in which feedback on the appropriateness of the description may be provided either by an automated parsing tool or by a second participant at the receiving end of the communication. The corpus contains fully annotated descriptions in two domains (simple geometric objects and realistic human face images) and is provided as a resource for training and testing REG algorithms in these communicative settings.

Keywords Natural Language Generation · Referring Expression Generation · Reference production
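The automated feedback condition can be illustrated with a minimal sketch of a uniqueness check. It assumes that a parsing step has already mapped the written description onto attribute-value pairs, and it uses a small hypothetical scene (`EXAMPLE_SCENE`, whose objects and attributes are invented rather than taken from the actual stimuli). The helper names `matching_objects` and `feedback`, and the three feedback labels, are likewise illustrative assumptions, not the tool used in the study.

```python
# Hypothetical scene: object ids mapped to attribute-value pairs
# (invented for illustration; not the paper's stimuli).
Scene = dict[str, dict[str, str]]

EXAMPLE_SCENE: Scene = {
    "o1": {"type": "box", "colour": "red"},
    "o2": {"type": "box", "colour": "blue"},
    "o3": {"type": "ball", "colour": "red"},
}


def matching_objects(scene: Scene, description: dict[str, str]) -> list[str]:
    """Return the ids of all objects whose attributes agree with every pair in the description."""
    return [
        obj_id
        for obj_id, attrs in scene.items()
        if all(attrs.get(attr) == value for attr, value in description.items())
    ]


def feedback(scene: Scene, target: str, description: dict[str, str]) -> str:
    """Simulated feedback on an already-parsed description (hypothetical labels)."""
    matches = matching_objects(scene, description)
    if matches == [target]:
        return "ok"  # the description picks out the target and nothing else
    if target not in matches:
        return "wrong referent"  # the description excludes the target itself
    return "ambiguous"  # the target matches, but so does at least one distractor


if __name__ == "__main__":
    print(feedback(EXAMPLE_SCENE, "o1", {"type": "box"}))                   # ambiguous
    print(feedback(EXAMPLE_SCENE, "o1", {"type": "box", "colour": "red"}))  # ok
```

In this sketch, a description that matches no object at all is reported as a wrong referent; an actual feedback tool might treat that case, or a failure to parse the description, separately.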

Corresponding author: Ivandré Paraboni

¹ School of Arts, Sciences and Humanities (EACH), University of São Paulo (USP), São Paulo, Brazil


1 Introduction

In computational Natural Language Generation (NLG) studies, the collection of referring expressions (usually in the form of definite descriptions such as ‘the girl with short hair’ or ‘the red box’) produced by human participants is a common task in Referring Expression Generation (REG) and related fields (Krahmer and van Deemter 2012). Descriptions of this kind are usually elicited from visual stimuli representing a context that contains one particular target and a number of distractor objects. An example containing five objects (labelled o1…o5 for ease of discussion) is illustrated in Fig. 1. Based on stimuli of this kind, human participants (who act as speakers or writers¹) are asked to produce a uniquely identifying description of a given target under controlled circumstances. The elicited data, usually in the form of an annotated referring expression corpus, is then used as training and/or test data for computational REG models, or for the study of reference phenomena in general. The latter includes the study of issues of referential overspecification (Paraboni and Deemter 1999), human variation (Viethen and Dale 2010), the us
