Enhanced Neural Machine Translation by Joint Decoding with Word and POS-tagging Sequences


Xiaocheng Feng¹ · Zhangyin Feng¹ · Wanlong Zhao² · Bing Qin¹ · Ting Liu¹

¹ Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin 150001, China
² Acoustic Science and Technology Laboratory, Harbin Engineering University, Harbin 150001, China

Corresponding author: Wanlong Zhao

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Machine translation has become an indispensable application on mobile phones. However, current mainstream neural machine translation models rely on continually increasing the number of parameters to achieve better performance, which is impractical on mobile devices. In this paper, we improve the performance of neural machine translation (NMT) with shallow syntax (e.g., POS tags) of the target language, which offers better accuracy and lower latency than deep syntax such as dependency parsing. In particular, our models require fewer parameters and less runtime than more complex machine translation models, making mobile deployment feasible. Concretely, we present three RNN-based NMT decoding models (independent decoder, gates-shared decoder, and fully shared decoder) that jointly predict the target word and POS tag sequences. Experiments on Chinese-English and German-English translation tasks show that the fully shared decoder achieves the best performance, increasing the BLEU score by 1.4 and 2.25 points, respectively, over the attention-based NMT model. In addition, we extend the idea to Transformer-based models, and the experimental results show that the BLEU score improves further.

Keywords Joint decoding · POS-tagging · Neural machine translation · Natural language processing · Artificial intelligence
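To make the joint-decoding idea concrete, the following is a minimal sketch (not the authors' released code) of one step of a fully shared decoder: a single recurrent cell is shared across both tasks, and its hidden state feeds two softmax heads, one over the word vocabulary and one over the POS tag set; training would then sum the two cross-entropy losses. The use of a GRU cell, the embedding and hidden sizes, and feeding back both the previous word and the previous POS tag are illustrative assumptions, not details confirmed by the paper.

```python
# Minimal sketch of a fully shared joint decoder step (assumed details noted above).
import torch
import torch.nn as nn


class FullySharedDecoderStep(nn.Module):
    def __init__(self, word_vocab, pos_vocab, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, emb_dim)
        self.pos_emb = nn.Embedding(pos_vocab, emb_dim)
        # Shared recurrent cell: the step input concatenates the previous word
        # embedding, the previous POS embedding, and the attention context
        # vector computed over the encoder states.
        self.cell = nn.GRUCell(2 * emb_dim + hidden_dim, hidden_dim)
        self.word_out = nn.Linear(hidden_dim, word_vocab)  # word softmax head
        self.pos_out = nn.Linear(hidden_dim, pos_vocab)    # POS-tag softmax head

    def forward(self, prev_word, prev_pos, context, hidden):
        x = torch.cat([self.word_emb(prev_word),
                       self.pos_emb(prev_pos),
                       context], dim=-1)
        hidden = self.cell(x, hidden)          # one hidden state shared by both tasks
        return self.word_out(hidden), self.pos_out(hidden), hidden
```

In this sketch the two prediction heads differ only in their output projections; the independent and gates-shared variants described in the paper would instead duplicate part or all of the recurrent machinery per task.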

1 Introduction

Neural Machine Translation (NMT) plays an important role in the current natural language processing (NLP) community, and its performance is often used as a measure of progress in artificial intelligence [1]. Recently, deep structural representations (e.g., dependency parses) have been applied to NMT as external features on both the encoder and decoder sides, and these new architectures have achieved impressive translation quality on many language pairs [2–4]. However, machine translation models that incorporate such complex information or a multi-pass decoder [5] greatly increase the number of model parameters, which severely slows them down on low-power processors such as those in mobile phones, and the complexity and instability of deep syntactic analysis introduces error propagation into the translation. Compared with deep syntax, in this work we favor shallow structures (e.g., POS tags and chunks), whose analyzers are both more accurate and faster. We believe that an NMT system would benefit from POS tag information of the target language, since implicit patterns of the target language (e.g., word order) can be revealed from the POS tag sequence. For instance, a Chinese POS tagger