On the definition of a concentration function relevant to the ROC curve


Mauro Gasparini1 · Lidia Sacchetto2

Received: 31 December 2019 / Accepted: 7 October 2020 © The Author(s) 2020

Abstract This work provides a definition of the concentration curve alternative to the one presented in this journal by Schechtman and Schechtman (Metron 77:171–178, 2019). Our definition clarifies, at the population level, the relationship between concentration and the omnipresent ROC curve in diagnostic and classification problems.

Keywords Likelihood ratio · Lorenz curve · Length-Biased · Gini

1 Department of Mathematical Sciences “G.L. Lagrange”, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10124 Torino, Italy

2 Department of Mathematical Sciences “G.L. Lagrange”, Politecnico di Torino and Università di Torino, Corso Duca degli Abruzzi 24, 10124 Torino, Italy

1 A critical appraisal of a paper by E. Schechtman and G. Schechtman

In a paper that recently appeared in this journal, Schechtman and Schechtman [6] try to shed some light on the relationship between the Gini Mean Difference (Gini), the Gini Covariance (coGini), the Lorenz curve, the Receiver Operating Characteristic (ROC) curve and a particular definition of concentration function. The purpose of the paper is commendable, since there is a lot of confusion regarding the relationships among these concepts. In particular, we agree that the ROC curve and its functions (such as the Area Under the Curve, AUC), as well as an appropriate definition of relative concentration of a probability distribution with respect to another, are bivariate objects tying together two different distributions, and cannot be reduced to univariate indices such as the Gini.

Schechtman and Schechtman [6] build on the wealth of research reviewed in the monograph by Yitzhaki and Schechtman [7], where a whole technology based on the Gini and the coGini is proposed as a basic tool to study variability, correlation, regression and the like. In particular, the authors try to use certain conditional expectations to establish the connection between concentration and ROC. In this note, we claim their approach is not justified in the diagnostic (classification) setup, where ROC curves typically arise, and propose an alternative, simpler connection between concentration and ROC curves based on first principles, namely the likelihood ratio and the application of the Neyman-Pearson lemma.

Studying how jointly distributed random variables interrelate is a fundamental problem in Statistics and its applications to Economics and the Sciences. However, when turning to the diagnostic (or classification) setup, one typically observes one or more diagnostic variables (called features in the Machine Learning literature) from two populations and tries to set up a rule that discriminates between them. Some special requirements can then be identified:

1. Two probability distributions should be eval
