Dimensions and Metrics for Evaluating Recommendation Systems

Recommendation systems support users and developers of various computer and software systems to overcome information overload, perform information discovery tasks, and approximate computation, among others. They have recently become popular and have attra

  • PDF / 470,344 Bytes
  • 29 Pages / 439.36 x 666.15 pts Page_size
  • 68 Downloads / 218 Views

DOWNLOAD

REPORT


Dimensions and Metrics for Evaluating Recommendation Systems Iman Avazpour, Teerat Pitakrat, Lars Grunske, and John Grundy

Abstract Recommendation systems support users and developers of various computer and software systems to overcome information overload, perform information discovery tasks, and approximate computation, among others. They have recently become popular and have attracted a wide variety of application scenarios ranging from business process modeling to source code manipulation. Due to this wide variety of application domains, different approaches and metrics have been adopted for their evaluation. In this chapter, we review a range of evaluation metrics and measures as well as some approaches used for evaluating recommendation systems. The metrics presented in this chapter are grouped under sixteen different dimensions, e.g., correctness, novelty, coverage. We review these metrics according to the dimensions to which they correspond. A brief overview of approaches to comprehensive evaluation using collections of recommendation system dimensions and associated metrics is presented. We also provide suggestions for key future research and practice directions.

10.1 Introduction Due to the complexity of today’s software systems, modern software development environments provide recommendation systems for various tasks. These ease the developers’ decisions or warn them about the implications of their decisions. Examples are code completion, refactoring support, or enhanced search capabilities

I. Avazpour () • J. Grundy () Faculty of ICT, Centre for Computing and Engineering Software and Systems (SUCCESS), Swinburne University of Technology, Hawthorn, Australia e-mail: [email protected]; [email protected] T. Pitakrat () • L. Grunske () Institute of Software Technology, Universität Stuttgart, Stuttgart, Germany e-mail: [email protected]; [email protected] M.P. Robillard et al. (eds.), Recommendation Systems in Software Engineering, DOI 10.1007/978-3-642-45135-5__10, © Springer-Verlag Berlin Heidelberg 2014

245

246

I. Avazpour et al.

during specific maintenance activities. In recent years, research has produced a variety of these recommendation systems and some of them have similar intentions and functionalities [24, 60]. One obvious question is, therefore, how can we assess quality and how can we benchmark different recommendation systems? In this chapter, we provide a practical guide to the commonly used quantitative evaluation techniques used to compare recommendation systems. As a first step, we have identified a set of dimensions, e.g., the correctness or diversity of the results that may serve as a basis for an evaluation of a recommendation system. The different dimensions will be explained in detail and different metrics are presented to measure and quantify each dimension. Furthermore, we explore interrelationships between dimensions and present a guide showing how to use the dimensions in an individual recommendation system validati