Boost image captioning with knowledge reasoning


Feicheng Huang1 · Zhixin Li1 · Haiyang Wei1 · Canlong Zhang1 · Huifang Ma2

Received: 15 April 2020 / Revised: 21 July 2020 / Accepted: 19 September 2020
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract

Automatically generating a human-like description for a given image is a promising research problem in artificial intelligence that has attracted a great deal of attention recently. Most existing attention methods explore the mapping relationships between words in the sentence and regions in the image; such an unpredictable matching manner sometimes causes inharmonious alignments that can reduce the quality of the generated captions. In this paper, we make an effort to reason about more accurate and meaningful captions. We first propose word attention to improve the correctness of visual attention when generating descriptions word by word. This word attention emphasizes word importance when focusing on different regions of the input image, and makes full use of internal annotation knowledge to assist the computation of visual attention. Then, in order to reveal those intentions that cannot be expressed straightforwardly by machines, we introduce a new strategy that injects external knowledge extracted from a knowledge graph into the encoder-decoder framework to facilitate meaningful captioning. Finally, we validate our model on two freely available captioning benchmarks: the Microsoft COCO dataset and the Flickr30k dataset. The results demonstrate that our approach achieves state-of-the-art performance and outperforms many existing approaches.

Keywords  Image captioning · Word attention · Visual attention · Knowledge graph · Reinforcement learning
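The core idea in the abstract, using an attention over previously generated words to guide the visual attention over image regions, can be illustrated with a minimal sketch. All names, shapes, and the particular way the word context is combined with the visual scores below are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attended_context(regions, hidden, word_embs, word_scores, Wv, Wh, Ww):
    """Toy sketch of visual attention modulated by word attention.

    regions:     (R, Dv) image region features
    hidden:      (Dh,)   decoder hidden state
    word_embs:   (T, Dw) embeddings of the words generated so far
    word_scores: (T,)    unnormalized importance score of each word
    Wv, Wh, Ww:  hypothetical projection matrices of shapes
                 (Dv, K), (K, Dh), (K, Dw)
    """
    # Word attention: a weighted summary of the sentence generated so far.
    alpha_w = softmax(word_scores)            # (T,)
    word_ctx = alpha_w @ word_embs            # (Dw,)

    # Visual attention: scores combine region features, the decoder
    # state, and the word-attention context.
    scores = (regions @ Wv) @ (Wh @ hidden + Ww @ word_ctx)  # (R,)
    alpha_v = softmax(scores)                 # (R,)

    # Attended visual context fed back to the decoder.
    return alpha_v @ regions                  # (Dv,)
```

The key point the sketch captures is that the visual attention weights `alpha_v` are no longer computed from the image and hidden state alone; the word context biases which regions are attended, which is the mechanism the paper credits with reducing inharmonious word-region alignments.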

Editors: Kee-Eung Kim, Vineeth N Balasubramanian.

* Zhixin Li, [email protected]

1 Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China
2 College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China

Machine Learning

1 Introduction

Image captioning has recently attracted great attention in the field of artificial intelligence, due to the significant progress of machine learning technologies and the release of a number of large-scale datasets (Hossain et al. 2019; Bai and An 2018; Chen et al. 2017c). The gist of the captioning task is to generate a meaningful and natural sentence that describes the most salient objects and their interactions in the given image. Solving this problem has great impact on the human community, as it can help visually impaired people understand various scenes and can serve as an auxiliary means of early childhood education (Jiang et al. 2018a, b). Despite its wide practical applications, image captioning has long been viewed as a challenging research problem, mainly because it needs to explore a suitable alignment between two different modalities: image and text. The popular image captioning approa
