Ontological Approach to Image Captioning Evaluation


D. Shunkevich* and N. Iskra**

Belarusian State University of Informatics and Radioelectronics, Minsk, 220013 Belarus
*e-mail: [email protected]
**e-mail: [email protected]

Abstract—The paper considers an ontology of the existing metrics widely used to evaluate the image captioning task. It is shown how the ontological approach provides a more natural and resilient way to assure image captioning quality than variations of machine translation metrics. Another important problem discussed in the paper is information support for researchers in the field of image captioning.

Keywords: image captioning, ontology, image captioning evaluation metrics

DOI: 10.1134/S1054661820030256

INTRODUCTION

Image captioning is currently a rapidly developing research area [1]. The task of generating natural-language descriptions from images encompasses both image processing and natural language processing. The results can be used in information systems for medicine, security, education, industry, entertainment, etc. To produce an image annotation, one should determine the scene type and location and detect and recognize objects, their attributes, and their relations. Moreover, the resulting sentences should be syntactically and semantically correct and appear natural to a human reader.

Unlike detection and recognition tasks, where the results can be evaluated directly, establishing whether a caption is correct is far from trivial. The evaluation concerns factual (correct objects and attributes), syntactic (correct phrase structure), and semantic (correct relations and meaning) aspects.

In this paper, using an ontological approach, we explore existing methods for evaluating image captioning quality, with the goal of arriving at a new, formal, and practical way of assuring image captioning correctness. We propose a universal framework that systematizes widely used evaluation metrics, simplifies their application, and provides information support for research. We also demonstrate how the ontological approach can be applied to the evaluation of several state-of-the-art image captioning methods and to the comparison of algorithms.
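Most widely used caption metrics score a candidate by its surface overlap with human-written references. As a minimal illustration (not part of the paper; the function name and whitespace tokenization are our own simplifications), the following pure-Python sketch computes clipped unigram precision, the core quantity behind BLEU-1:

```python
from collections import Counter

def unigram_precision(candidate, references):
    """Clipped unigram precision of a candidate caption against
    one or more reference captions (the core of BLEU-1)."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    cand_counts = Counter(cand_tokens)
    # For each word, take the maximum count observed in any reference;
    # candidate counts are "clipped" by this ceiling.
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref.lower().split()).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(min(count, max_ref_counts[word])
                  for word, count in cand_counts.items())
    return clipped / len(cand_tokens)
```

For the candidate "a dog runs on the grass" against the reference "a dog is running on the grass", five of six candidate words match, giving a score of about 0.83, even though "runs" and "running" convey the same meaning. This is precisely the kind of surface-level limitation that motivates looking at caption evaluation through factual, syntactic, and semantic aspects rather than n-gram overlap alone.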

Received April 14, 2020; revised April 14, 2020; accepted April 14, 2020

2. DATASETS AND METRICS FOR IMAGE CAPTIONING EVALUATION

There are a number of datasets that are widely used to train, test, and evaluate image captioning algorithms. As a rule, the data are represented as a set of images, each with a number of corresponding textual descriptions that are considered references. These references are obtained from human annotators and may vary in word choice, degree of detail, and even stylistic tone. One example is the MS COCO Captions dataset [2], which contains over 500 K captions for over 330 K images. Evaluation comes down to comparing a candidate description (generated by an image captioning algorithm) with one or several references. A completely different approach is presented by the Visual Genome dataset [3], where over 100 K images are provided not only with annota