Visualizing the decision rules behind the ROC curves: understanding the classification process


Sonia Pérez‑Fernández · Pablo Martínez‑Camblor · Peter Filzmoser · Norberto Corral

Received: 30 September 2019 / Accepted: 14 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
The receiver operating characteristic (ROC) curve is a graphical method commonly used to study the capacity of continuous variables (markers) to properly classify subjects into one of two groups. Each decision ultimately rests on a classification subset of the space where the marker is defined. In this paper, we study graphical representations and propose visual forms to reflect the classification rules that give rise to the construction of the ROC curve. On the one hand, we use static pictures for displaying the classification regions for univariate markers, which are especially convenient when there is no monotone relationship between the marker and the likelihood of belonging to one group. In those cases, there are two options to improve the classification accuracy: to allow for more flexibility in the classification rules (for example, considering two cutoff points instead of one) or to transform the marker by using a function whose resulting ROC curve is optimal. On the other hand, we propose to build videos for visualizing the collection of subsets when several markers are considered simultaneously. A compilation of techniques for finding a rule that maximizes the area under the ROC curve is included, with a focus on linear combinations. We present a tool for the R software which generates those graphics, and we apply it to one real dataset. The R code is provided as Supplementary Material.

Keywords Area under the curve · Classification regions · Graphical animations · Multivariate marker · Receiver operating characteristic curve

The authors gratefully acknowledge support from Grant MTM2015-63971-P of the Spanish Ministerio de Economía y Competitividad, from Grant FC-15-GRUPIN14-101 and Severo Ochoa Grant BP16118 of the Principado de Asturias, and from a grant of the Campus of International Excellence of the University of Oviedo (the last two for Pérez-Fernández).
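As a rough illustration of the abstract's point about non-monotone markers, the Python sketch below compares a one-cutoff rule with a symmetric two-cutoff rule on a small made-up dataset. It is not the authors' R tool: the toy values, the helper names `roc_points` and `auc`, and the choice of centring the two-cutoff rule at 0 (so that it is equivalent to thresholding |x|) are all assumptions made for this example.

```python
# Toy sketch (assumed data, not from the paper): positives lie in both tails
# of the marker, so "larger marker = more likely positive" does not hold.
negatives = [-1.0, -0.5, 0.0, 0.5, 1.0]
positives = [-3.0, -2.0, 0.2, 2.0, 3.0]


def roc_points(neg_scores, pos_scores):
    """Empirical ROC points (FPR, TPR) for rules 'positive if score >= c'."""
    thresholds = sorted(set(neg_scores) | set(pos_scores), reverse=True)
    points = [(0.0, 0.0)]
    for c in thresholds:
        fpr = sum(s >= c for s in neg_scores) / len(neg_scores)
        tpr = sum(s >= c for s in pos_scores) / len(pos_scores)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# One cutoff: classify as positive when x >= c.
auc_one = auc(roc_points(negatives, positives))

# Two symmetric cutoffs: classify as positive when x <= -t or x >= t,
# which is the same as a single cutoff on the transformed marker |x|.
auc_two = auc(roc_points([abs(x) for x in negatives],
                         [abs(x) for x in positives]))

print(f"AUC, one cutoff:  {auc_one:.2f}")
print(f"AUC, two cutoffs: {auc_two:.2f}")
```

On this toy data the two-cutoff rule yields a clearly larger area under the curve than the one-cutoff rule. A general two-cutoff rule need not be symmetric around a known centre; the symmetric form is used here only to keep the sketch short.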
Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10182-020-00385-2) contains supplementary material, which is available to authorized users.

* Sonia Pérez‑Fernández


1 Introduction
As a supervised learning technique, classification is a statistical method whose final objective is to build a grouping rule based on one or several markers collected in a training dataset where the response variable is also known. With that rule, new subjects can be classified on the basis of their marker values (Nielsen et al. 2009). Achieving good classifications is important in many fields such as medical diagnosis, machine learning, data mining or
