Cross-domain personalized image captioning

Cross-domain personalized image captioning

Cuirong Long¹ · Xiaoshan Yang²,³ · Changsheng Xu¹,²,³

Received: 31 May 2018 / Revised: 22 December 2018 / Accepted: 27 February 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
Image captioning aims to translate an image into a complete and natural sentence; it involves both computer vision and natural language processing. Although image captioning has achieved good results with the rapid development of deep neural networks, excessive pursuit of evaluation scores makes the generated descriptions too conservative in practical applications. It is necessary to increase the diversity of the descriptions and to account for prior knowledge such as the user's favorite vocabulary and writing style. In this paper, we study personalized image captioning, which generates sentences that describe the user's own story and feelings about life using the user's preferred expressions. Moreover, we propose cross-domain personalized image captioning (CDPIC) to learn domain-invariant captioning models that can be applied across different social media platforms. The proposed method can flexibly model user interest by embedding the user ID as an interest vector. To the best of our knowledge, this is the first cross-domain personalized image captioning approach, combining user interest modeling with a simple and effective domain-invariant constraint. The effectiveness of the proposed method is verified on datasets from the Instagram and Lookbook platforms.

Keywords Personalization · Image captioning · Domain adaptation
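To make the two ingredients of the abstract concrete, the following is a minimal PyTorch sketch of a captioner that embeds the user ID as an interest vector and of a simple domain-invariant penalty. This is not the paper's implementation: the class name, dimensions, fusion scheme, and the mean-feature matching penalty are all assumptions, since the excerpt does not specify the exact architecture or constraint.

```python
# Hypothetical sketch: personalized captioning with a user-interest
# embedding plus one possible domain-invariant constraint (PyTorch).
import torch
import torch.nn as nn

class PersonalizedCaptioner(nn.Module):
    def __init__(self, num_users, vocab_size, img_dim=2048, hid_dim=512):
        super().__init__()
        self.user_embed = nn.Embedding(num_users, hid_dim)  # user ID -> interest vector
        self.img_proj = nn.Linear(img_dim, hid_dim)         # CNN feature -> hidden space
        self.word_embed = nn.Embedding(vocab_size, hid_dim)
        self.decoder = nn.LSTM(hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, img_feat, user_ids, captions):
        # Fuse visual content with the user's interest vector to
        # initialize the sentence decoder (one plausible fusion).
        h0 = torch.tanh(self.img_proj(img_feat) + self.user_embed(user_ids))
        h0 = h0.unsqueeze(0)                      # (1, batch, hidden)
        c0 = torch.zeros_like(h0)
        emb = self.word_embed(captions)           # (batch, time, hidden)
        hidden, _ = self.decoder(emb, (h0, c0))
        return self.out(hidden)                   # per-step word logits

def domain_invariance_loss(feat_src, feat_tgt):
    # One simple domain-invariant constraint: match the mean feature
    # statistics of the two platforms. The paper's exact constraint
    # is not given in this excerpt.
    return (feat_src.mean(dim=0) - feat_tgt.mean(dim=0)).pow(2).sum()
```

In this sketch the interest vector is simply added to the projected image feature, so the same image can decode into different sentences for different user IDs; the domain penalty would be added to the captioning loss when training on data from both platforms.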

 Xiaoshan Yang (corresponding author)
[email protected]

Cuirong Long
[email protected]

Changsheng Xu
[email protected]

1 Hefei University of Technology, Hefei, China
2 Institute of Automation, Chinese Academy of Sciences, Beijing, China
3 University of Chinese Academy of Sciences, Beijing, China


1 Introduction

Understanding the visual content of images has been a fundamental challenge in computer vision and multimedia for decades. Previous researchers mainly focused on image classification, object detection and semantic segmentation, tasks that aim to recognize or detect a predefined yet limited set of object or scene classes. Recently, a more challenging task, translating an image into a complete and natural sentence, has attracted considerable attention. This task involves both computer vision and natural language processing and is called image captioning. A generated sentence must capture not only the object classes contained in an image, but also how these objects relate to each other. Though image captioning is very challenging, it could have great impact, for instance by helping children or visually impaired people better understand the content of images on the web. At present, image captioning has achieved good results with the rapid development of deep neural networks. Most image captioning methods are inspired by the sequence-to-sequence model in machine translation.
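To make the sequence-to-sequence framing concrete, here is a hypothetical greedy decoding loop for an encoder-decoder captioner such as the sketch above; the paper's actual inference procedure is not described in this excerpt, and beam search would be a common alternative.

```python
# Hypothetical greedy decoding for the PersonalizedCaptioner sketched
# earlier: emit one word at a time until the end-of-sentence token.
import torch

@torch.no_grad()
def greedy_decode(model, img_feat, user_id, bos_id, eos_id, max_len=20):
    # img_feat: (1, img_dim) image feature; user_id: (1,) long tensor.
    words = [bos_id]
    for _ in range(max_len):
        captions = torch.tensor([words])             # (1, t) tokens so far
        logits = model(img_feat, user_id, captions)  # (1, t, vocab)
        next_word = int(logits[0, -1].argmax())      # most likely next token
        if next_word == eos_id:
            break
        words.append(next_word)
    return words[1:]                                 # caption without BOS
```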
