A TextCNN and WGAN-gp based deep learning frame for unpaired text style transfer in multimedia services



SPECIAL ISSUE PAPER

Mingxuan Hu1 · Min He1 · Wei Su1 · Abdellah Chehri2

Received: 5 July 2020 / Accepted: 28 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
With the rapid growth of big multimedia data, multimedia processing techniques face challenges in knowledge understanding, semantic modeling, and feature representation. Based on TextCNN and WGAN-gp (Wasserstein GAN with gradient penalty, i.e., improved training of Wasserstein GANs), we propose a deep learning framework that better discriminates style-specific features from style-independent content features in unpaired text style transfer for multimedia services. To rewrite a sentence in the requested style while preserving its style-independent content, an encoder-decoder framework is usually adopted. However, because same-content sentence pairs in different styles are not available for training, some existing methods fail to capture the original content or to produce the desired style properties accurately in the transferred sentences. In this paper, we adopt TextCNN to extract the style features of the transferred sentences, and the generator (encoder and decoder) aligns these style features with the target style label. Meanwhile, WGAN-gp is used to preserve the content features of the original sentences. Experiments demonstrate that our framework performs considerably better than previous work on both automatic and human evaluation. It thus provides an effective method for unpaired text style transfer in multimedia services.

Keywords  Big multimedia data · TextCNN · WGAN-gp · Unpaired text style transfer · Multimedia services

* Min He
1 School of Information Science and Engineering, Yunnan University, Kunming 650091, China
2 Applied Sciences Department, Université du Québec à Chicoutimi, Chicoutimi, QC G7H 2B1, Canada

1 Introduction
With the arrival of the new multimedia era, enormous amounts of multimedia data are constantly generated from various sources, such as video surveillance, sensing devices, and massive rich text, which calls for advanced multimedia processing techniques that can process these data more efficiently and precisely. In [1, 2], vehicular ad hoc networks are applied to utilize road information for high-quality services. Edge computing [3] aims at real-time processing. Deep learning is widely explored to recognize and extract latent features [4–6]. Among these techniques, unpaired text style transfer, which transfers the original text into a demanded style, has become an active research topic. For example, businesses may want negative reviews of merchandise transferred into positive ones, news titles should attract public attention, text conveying important information should be brief and clear, and the same text may even need to be rendered in different styles for different audiences. By utilizing th