On the definition of a concentration function relevant to the ROC curve


Mauro Gasparini¹ · Lidia Sacchetto²

Received: 31 December 2019 / Accepted: 7 October 2020 © The Author(s) 2020

Abstract This work provides a definition of the concentration curve that is an alternative to the one presented in this journal by Schechtman and Schechtman (Metron 77:171–178, 2019). Our definition clarifies, at the population level, the relationship between concentration and the omnipresent ROC curve in diagnostic and classification problems.

Keywords Likelihood ratio · Lorenz curve · Length-biased · Gini

¹ Department of Mathematical Sciences “G.L. Lagrange”, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10124 Torino, Italy
² Department of Mathematical Sciences “G.L. Lagrange”, Politecnico di Torino and Università di Torino, Corso Duca degli Abruzzi 24, 10124 Torino, Italy

1 A critical appraisal of a paper by E. Schechtman and G. Schechtman

In a paper that appeared recently in this journal, Schechtman and Schechtman [6] try to shed some light on the relationship between the Gini Mean Difference (Gini), the Gini Covariance (co-Gini), the Lorenz curve, the Receiver Operating Characteristic (ROC) curve and a particular definition of concentration function. The purpose of the paper is commendable, since there is a lot of confusion regarding the relationships among these concepts. In particular, we agree that the ROC curve and its functions (such as the Area Under the Curve, AUC), as well as an appropriate definition of the relative concentration of one probability distribution with respect to another, are bivariate objects tying together two different distributions, and cannot be reduced to univariate indices such as the Gini.

Schechtman and Schechtman [6] build on the wealth of research reviewed in the monograph by Yitzhaki and Schechtman [7], where a whole technology based on the Gini and the co-Gini is proposed as a basic tool to study variability, correlation, regression and the like. In particular, the authors try to use certain conditional expectations to establish the connection between concentration and ROC. In this note, we claim their approach is not justified in the
diagnostic (classification) setup, where ROC curves typically arise, and propose an alternative, simpler connection between concentration and ROC curves based on first principles, namely the likelihood ratio and the application of the Neyman–Pearson lemma.

Studying how jointly distributed random variables interrelate is a fundamental problem in Statistics and its applications to Economics and the Sciences. However, when turning to the diagnostic (or classification) setup, one typically observes one or more diagnostic variables (called features in the Machine Learning literature) from two populations and tries to set up a rule that discriminates between them. Some special requirements can then be identified:

1. Two probability distributions should be eval