Remote sensing image caption generation via transformer and reinforcement learning


Xiangqing Shen1 · Bing Liu1,2,3 · Yong Zhou1,2 · Jiaqi Zhao2

Received: 18 July 2019 / Revised: 24 June 2020 / Accepted: 29 June 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Image captioning is the task of generating a natural-language description of a given image, and it plays an essential role in enabling machines to understand image content. Remote sensing image captioning is a subfield of this task. Most current remote sensing image captioning models fail to fully utilize the semantic information in images and suffer from overfitting induced by the small size of the datasets. To address this, we propose a new model that uses the Transformer to decode image features into target sentences. To make the Transformer better suited to the remote sensing image captioning task, we additionally employ dropout layers, residual connections, and adaptive feature fusion in the Transformer. Reinforcement learning is then applied to enhance the quality of the generated sentences. We demonstrate the validity of the proposed model on three remote sensing image captioning datasets. Our model obtains higher values for all seven scores on the Sydney dataset and the Remote Sensing Image Caption Dataset (RSICD), and for four scores on the UCM dataset, which indicates that the proposed methods perform better than previous state-of-the-art models in remote sensing image caption generation.

Keywords Transformer · Remote sensing image captioning · Attention mechanisms · Convolutional neural network · Reinforcement learning

Multimedia Tools and Applications

Bing Liu
[email protected]

1 School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, Jiangsu province, China

2 Mine Digitization Engineering Research Center of Ministry of Education of the People’s Republic of China, Xuzhou, China

3 Institute of Electrics, Chinese Academy of Sciences, Beijing, 100190, China

1 Introduction Recently, significant progress has been made in modern remote sensing technologies due to improvements in sensors and computing power. These advances help people access geospatial information more easily. Remote sensing images are used in a variety of areas, such as national census, water conservancy construction, oil exploration, mapping, and railway and highway location [37, 38, 47]. High-resolution remote sensing images are now commonly available, and deep neural networks are gradually being applied to their analysis. Deep neural networks have achieved satisfactory results in scene classification [8, 30, 31] and object detection [16, 66]. Despite the successful application of deep neural networks to remote sensing images, existing research usually attaches more importance to the feature level of remote sensing images and fails to capture semantic-level information and the correlations between different objects in the images, which are also crucial to the bett
