The Multilingual Student Translation corpus: a resource for translation teaching and research



Sylviane Granger¹ · Marie-Aude Lefer¹

© Springer Nature B.V. 2020

Abstract The Multilingual Student Translation (MUST) corpus is a corpus of translations produced by foreign language learners or trainee translators collected collaboratively by a large number of partner teams internationally. The corpus represents a prime example of community sourcing, as the data are collected and shared by the members of the MUST network. Two key characteristics of the corpus are that it involves a large number of language pairs and that each text is accompanied by a rich set of standardized metadata related to the source texts, the translation tasks and the students. The web interface on which the corpus is stored allows the data to be aligned and annotated with a purpose-built translation annotation system. The resulting corpus data lend themselves to a range of applications (translator training, materials design, pedagogical lexicography) and can also be used to advance empirical research in corpus-based translation studies.

Keywords Learner translation corpus · Metadata · Annotation · Standardization · Community sourcing · Multilingual

Correspondence: Sylviane Granger
¹ Centre for English Corpus Linguistics, University of Louvain, Louvain-la-Neuve, Belgium

1 Introduction

Learner corpus research (LCR) and corpus-based translation studies (CBTS) are two research strands that arose at approximately the same time, in the late 1980s and early 1990s (Granger 1993, 1994; Baker 1993, 1995). Both fields make use of corpus data to inform theory (second language acquisition theory and translation theory, respectively) and to generate more efficient pedagogical applications. LCR relies on learner corpora, i.e. electronic collections of foreign-/second-language learner writing and/or speech, while CBTS relies on comparable and parallel corpora of translated texts. The two language varieties in focus, learner language and translated language, have one major characteristic in common: they are “mediated” varieties (Gaspari and Bernardini 2010), i.e. they involve a process of mediation between two languages: the learner’s native (L1) and target (L2) language and the translator’s source (SL) and target (TL) language. As a result, LCR and CBTS have overlapping research agendas, and several researchers have called for a rapprochement between the two fields. An early study by Granger (1996, p. 46) states this explicitly: “Far from seeing computerized bilingual corpora as the private ground of translation specialists and typologists and computerized learner corpora as the sole concern of applied linguists, we see the two types of corpora as closely interrelated”. Similarly, Johansson (2007, p. 313) observed that “[n]ew possibilities are afforded by the combined use of learner corpora and multilingual corpora” and Chesterman (2007, p. 63) suggested adding learner texts to the set of reference texts used in tra