USING CORPORA FOR LANGUAGE ASSESSMENT
INTRODUCTION
Since the early 1990s the term ‘corpus’ has been used to refer to a large collection of texts stored in a computer database which can be subjected to various types of linguistic analysis (see Stubbs, 2004). Language corpora containing millions of running words sampled from hundreds or thousands of written or spoken texts are still used for large-scale lexicographic research, although these are now balanced by an increasing number of smaller, more specialised corpora which are used for qualitative research. Corpus content may be selected according to specific sociolinguistic or text-type parameters (e.g., Cambridge Learner Corpus, see Barker, 2004), or it may aim to capture a broad and balanced language sample (e.g., COBUILD Bank of English, see Renouf, 1987). An essential feature of most modern corpora, whatever their size, is that they are computer-readable and can therefore be processed with specially designed software programs such as concordancers, which allow linguistic features in the data to be identified, sorted and analysed. More sophisticated analyses are possible if a corpus has been annotated with additional linguistic information, for example when the content has been part-of-speech (POS) tagged or syntactically parsed (e.g., International Corpus of English, see Greenbaum, 1996). The 1997 edition of the Encyclopedia of Language and Education appears to contain no mention of the use of corpora in language education generally, let alone in testing and assessment.
This suggests that in the early 1990s language corpora and corpus linguistic tools still played a relatively minor role in pedagogy and assessment. This was despite relevant research in the fields of language and education, such as the work of the CHILDES (Child Language Data Exchange System) Project, an extensive speech database with accompanying tools for researching first and second language (L1/L2) acquisition and other areas (see MacWhinney, 1996), and the outcomes of international organisations such as ICAME, the International Computer Archive of Modern and Medieval English at the University of Bergen, Norway. ICAME has collected and distributed corpora and software since the 1970s and shares research findings through its website, journal and annual conferences.
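The introduction notes that concordancers allow linguistic features in corpus data to be identified, sorted and analysed. As an illustration only, the following Python sketch shows the key-word-in-context (KWIC) display that concordancers typically produce. It assumes a deliberately simple regular-expression tokenizer; the functions `tokenize` and `kwic`, the three-word context span and the sample sentences are hypothetical and do not reproduce any particular concordancing program.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of letters/apostrophes as word tokens
    # (a simplifying assumption; real concordancers use richer tokenizers).
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens: list[str], node: str, span: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, node word, right context) for every hit of `node`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            hits.append((left, tok, right))
    # Sort concordance lines alphabetically by right context so that
    # recurring patterns after the node word appear next to each other.
    hits.sort(key=lambda hit: hit[2])
    return hits

sample = ("The test was hard. A hard test can still be fair. "
          "Learners found the test hard but useful.")
lines = kwic(tokenize(sample), "test")
for left, node, right in lines:
    print(f"{left:>20}  [{node}]  {right}")
```

Sorting on the right-hand context is only one common option; sorting on the left-hand context instead would group together the words that precede the node word.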
E. Shohamy and N. H. Hornberger (eds), Encyclopedia of Language and Education, 2nd Edition, Volume 7: Language Testing and Assessment, 241–254. © 2008 Springer Science+Business Media LLC.
LYNDA TAYLOR AND FIONA BARKER
EARLY DEVELOPMENTS
Small-scale, computer-readable corpora existed from the 1960s onwards, for example the Brown University Corpus of American English (Francis and Kučera, 1964) and the Lancaster-Oslo/Bergen (LOB) Corpus of British English (Johansson, Leech and Goodluck, 1978), though there is little evidence that these directly influenced language assessment. Given their original design purpose, this is perhaps not surprising. Both corpora contain 1 million words from 15 genres of edited English prose, including press reportage, biography, memoirs and fiction, from 1961 and