A roadmap to neural automatic post-editing: an empirical approach
Dimitar Shterionov (1,4) · Félix do Carmo (2,4) · Joss Moorkens (3) · Murhaf Hossari (4) · Joachim Wagner (4) · Eric Paquin (4) · Dag Schmidtke (5) · Declan Groves (5) · Andy Way (4)
Received: 2 April 2019 / Accepted: 22 July 2020
© The Author(s) 2020
Abstract
In a translation workflow, machine translation (MT) is almost always followed by a human post-editing step, in which the raw MT output is corrected to meet the required quality standards. To reduce the number of errors that human translators need to correct, automatic post-editing (APE) methods have been developed and deployed in such workflows. With advances in deep learning, neural APE (NPE) systems have outperformed more traditional, statistical ones. However, the plethora of options, variables and settings, together with the relation between NPE performance and the training/test data, make it difficult to select the most suitable approach for a given use case. In this article, we systematically analyse these parameters with respect to NPE performance. We build an NPE "roadmap" that traces the different decision points, and we train a set of systems by selecting different options along the roadmap. We also propose a novel approach to APE with data augmentation. We then analyse the performance of 15 of these systems and identify the best ones: these are the systems that follow the newly proposed method. The work presented in this article stems from a collaborative project between Microsoft and the ADAPT Centre. The data provided by Microsoft originates from phrase-based statistical MT (PBSMT) systems employed in production. All tested NPE systems significantly improve translation quality, demonstrating the effectiveness of neural post-editing in a commercial translation workflow that leverages PBSMT.
Keywords: Automatic post-editing · Neural post-editing · Multi-source · Deep learning · Empirical evaluation · Machine translation
At the time of conducting this work, Dimitar Shterionov and Félix do Carmo were employed at the ADAPT Centre, Dublin City University, Dublin, Ireland.
Corresponding author: Dimitar Shterionov
Extended author information is available on the last page of the article.
D. Shterionov et al.
1 Introduction
Machine translation (MT) is widely employed in industrial translation workflows. MT for dissemination is an intermediate step that generates a raw translation of a given source document or sentence. It is followed by a post-editing step that ensures the final translation meets the required quality standards. Automatic post-editing (APE) is an area of research that explores methods for applying editing operations to MT output in order to produce a better translation and thus reduce human effort in the translation workflow. APE covers a wide range of post-editing approaches, from regular expressions applied to the MT output to post-edit simple error patterns, to deep learning techniques that can transform complete sentences,