The Good, the Bad, the Difficult, and the Easy: Something Wrong with Information Retrieval Evaluation?
Abstract. TREC-like evaluations do not consider topic ease and difficulty. However, it seems reasonable to reward good effectiveness on difficult topics more than good effectiveness on easy topics, and to penalize bad effectiveness on easy topics more than bad effectiveness on difficult topics. This paper shows how this approach leads to evaluation results that could be more reasonable, and that are different to some extent. I provide a general analysis of this issue, propose a novel framework, and experimentally validate a part of it.

Keywords: Evaluation, TREC, topic ease and difficulty.
1 Introduction
As lecturers, when we try to assess a student's performance during an exam, we distinguish between easy and difficult questions. When we ask easy questions to our students we expect correct answers; therefore, we give a rather mild positive evaluation if the answer to an easy question is correct, and we give a rather strong negative evaluation if the answer is wrong. Conversely, when we ask difficult questions, we are quite keen to presume a wrong answer; therefore, we give a rather mild negative evaluation if the answer to a difficult question is wrong, and we give a rather strong positive evaluation if the answer is correct. The difficulty of a question can be determined a priori (on the basis of the lecturer's knowledge of what has been taught to the students, and how) or a posteriori (e.g., by averaging, in a written exam, the answer evaluations of all the students to the same question). Probably, a mixed approach (both a priori and a posteriori) is the most common choice. During oral examinations, when we have an idea of the student's preparation (e.g., because of a previous written exam, or a term project, or after having asked the first questions), we even do something more: we ask difficult questions to good students, and we ask easy questions to bad students. This sounds quite obvious too: what's the point in asking easy questions to good students? They will almost

C. Macdonald et al. (Eds.): ECIR 2008, LNCS 4956, pp. 642–646, 2008. © Springer-Verlag Berlin Heidelberg 2008
certainly answer correctly, as expected, without providing much information about their preparation. And what's the point in asking difficult questions to bad students? They will almost certainly answer wrongly, without providing much information — and incidentally increasing the examiner's stress level. Therefore we can state the following principles, as "procedures" to be followed during student assessment:

Easy and Difficult Principle. Weight more (less) both (i) errors on easy (difficult) questions and (ii) correct answers on difficult (easy) questions.

Good and Bad Principle. On the basis of an estimate of the student's preparation, ask (i) difficult questions to good students and (ii) easy questions to bad students.

I am not aware of any lecturer/teacher/examiner who would not agree with the two principles, and who would not behave accordingly, once enlightened by them. In Information Retrieval (IR) evaluation we a
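The Easy and Difficult Principle can be made concrete with a small sketch. Here topic difficulty is estimated a posteriori, as in the written-exam analogy above: one minus the mean effectiveness of all systems on that topic. Each system's per-topic score is then weighted so that success on difficult topics, and failure on easy ones, counts more. The function names and the specific linear weighting scheme are illustrative assumptions, not the framework proposed in this paper.

```python
def topic_difficulty(runs):
    """A posteriori difficulty of each topic: 1 - mean effectiveness
    across all systems (runs) on that topic. Scores are in [0, 1]."""
    n_systems = len(runs)
    n_topics = len(runs[0])
    return [
        1 - sum(run[t] for run in runs) / n_systems
        for t in range(n_topics)
    ]

def weighted_effectiveness(run, difficulty):
    """Difficulty-weighted score of one system (illustrative scheme):
    reward success proportionally to topic difficulty, and penalize
    failure proportionally to topic ease."""
    total = 0.0
    for score, d in zip(run, difficulty):
        total += score * d            # correct "answers" on hard topics count more
        total -= (1 - score) * (1 - d)  # "errors" on easy topics cost more
    return total / len(run)

# Hypothetical per-topic effectiveness (e.g., average precision) of two systems.
runs = [
    [0.9, 0.2, 0.8],  # system A
    [0.7, 0.1, 0.9],  # system B
]
difficulty = topic_difficulty(runs)
score_a = weighted_effectiveness(runs[0], difficulty)
```

With these numbers the second topic comes out as the hardest (difficulty 0.85), so system A's 0.2 there hurts it only mildly, while its 0.8 on the easiest topic earns little credit.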