
RESEARCH ARTICLE

Improving machine translation output of German compound and multiword financial terms: comparison with cross-linguistic data

Christina Valavani 1 & Christina Alexandris 1 & George Mikros 1

Received: 26 March 2020 / Accepted: 17 September 2020
© Springer Nature Switzerland AG 2020

Abstract

The present application translates German financial terms into Greek, aiming to maximize translation quality. The demand for better translations increases the need for further MT systems: journalists and professionals in the banking sector require accurate translations. A novel statistical MT method is presented that uses a probability distribution over sentence pairs from a German-Greek parallel corpus. Both the MT system and the parallel corpus were built from scratch. The application combines parameters in a new way and estimates the probabilities used to select the best translation.

Keywords Statistical machine translation system · Financial terms · German to Greek · Translation application · Noun phrases · Java · Translation models · Alignment models

1 Introduction

The proposed application focuses on the translation of complex economic structures from German into Greek. It performs statistical machine translation using a German-Greek parallel text corpus (Koutsis et al. 2005). The corpus consists mainly of financial texts and the respective terminology (Sager 1990), and it is used to select the optimal translation on the basis of probability. The application is simple to use: the user enters the term to be translated and receives the most likely translation or translations.

The application is intended to improve on the translation results of existing machine translation systems such as Google Translate. Despite recent advances in machine translation, including systems using neural networks, some types of input, such as compound and multiword financial terms, still require the integration of special parameters and processing strategies. The present approach is based on alignment models and their integration into statistical models. By complex economic structures, we mean German compounds and compound phrases like the following:

* Christina Valavani [email protected]

1 National Kapodistrian University of Athens, Athens, Greece

1 (2-gram model)

  • Euro-Nachbarländer
  • Brutto-Außenumsatz
  • Festgeld-Konto
  • Eurogruppen-Chef

2 (3- and 4-gram model)

  • Reise und Konto-Sperren
  • Einzelhandels und Medienaktien
  • EU-Staats und Regierungschefs
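The grouping of the example terms by model order, and the lookup of the most likely translation described in the introduction, can be illustrated with a minimal Java sketch. It is not the authors' implementation. It assumes that the order n of a term is the number of parts obtained by splitting on whitespace and hyphens, and that translation candidates are ranked by relative frequency of aligned pairs. The class name `TermSketch`, its methods, and the placeholder strings `candidate-A` and `candidate-B` (standing in for Greek candidates) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names and the splitting rule are assumptions.
class TermSketch {

    // Assumption: components are separated by whitespace or hyphens,
    // so "Festgeld-Konto" yields 2 parts (a 2-gram term).
    static List<String> components(String term) {
        List<String> parts = new ArrayList<>();
        for (String p : term.trim().split("[\\s-]+")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        return parts;
    }

    // counts.get(source).get(target) = number of aligned occurrences
    // of the pair (source, target) observed in the parallel corpus.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one aligned (source term, target term) pair.
    void observe(String source, String target) {
        counts.computeIfAbsent(source, k -> new HashMap<>())
              .merge(target, 1, Integer::sum);
    }

    // Relative-frequency estimate: count(source, target) / count(source).
    // Returns 0.0 for source terms that were never observed.
    double probability(String source, String target) {
        Map<String, Integer> row =
            counts.getOrDefault(source, Collections.emptyMap());
        int total = 0;
        for (int c : row.values()) {
            total += c;
        }
        return total == 0 ? 0.0 : row.getOrDefault(target, 0) / (double) total;
    }

    // Candidates for a source term, most frequent first
    // (ties broken alphabetically so the order is deterministic).
    List<String> ranked(String source) {
        Map<String, Integer> row =
            counts.getOrDefault(source, Collections.emptyMap());
        List<String> out = new ArrayList<>(row.keySet());
        out.sort((a, b) -> {
            int byCount = Integer.compare(row.get(b), row.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return out;
    }

    public static void main(String[] args) {
        // Model order of two of the example terms.
        System.out.println(components("Festgeld-Konto").size());
        System.out.println(components("Einzelhandels und Medienaktien").size());

        // Hypothetical alignment counts, not data from the paper.
        TermSketch table = new TermSketch();
        table.observe("Festgeld-Konto", "candidate-A");
        table.observe("Festgeld-Konto", "candidate-A");
        table.observe("Festgeld-Konto", "candidate-B");
        for (String t : table.ranked("Festgeld-Konto")) {
            System.out.println(t + " " + table.probability("Festgeld-Konto", t));
        }
    }
}
```

Under this splitting rule, the terms in the first group yield two parts and those in the second group yield three or four. A full system would replace the raw counts with probabilities estimated from the alignment models.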

2 Alignment models

The financial terms of the German-Greek text corpus are processed with respect to their distinctive linguistic features and morphosyntactic structures, especially where compound multiword terms are concerned. Processing is linked to the identification of the distinctive types of linguistic features and morphosyntactic structures of German financial terms derived from previous research (Valavani 2019). The distinctive types of linguistic features and morphosyntactic structures of German financial