Hypothesis Testing when Data Sources are Uncertain



Alaa Elkadry¹ · Gary C. McDonald²
Accepted: 25 September 2020 © Grace Scientific Publishing 2020

Abstract
The Justice Department and the Consumer Financial Protection Bureau used Bayesian Improved Surname Geocoding (BISG) to find Ally Financial and Ally Bank guilty of bias in auto loans. Federal authorities claim to use BISG tools in their investigations but give no details on how these tools are applied. In this article, we develop different ways to use BISG to test for disparity. We offer methods that use only the highest probability provided by BISG, as well as a method that uses all the probabilities it returns.

Keywords: Surname analysis · Geocoding analysis · BISG · Disparity · p Values
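To make the distinction between the two kinds of methods concrete, the following is a minimal sketch, not the tests developed in this article. It assumes each individual comes with a vector of group probabilities from a BISG-style tool plus a numeric outcome such as a loan rate. The group labels `A` and `B`, the numbers, and the function names are all hypothetical and chosen only for illustration.

```python
# Illustrative only: the probabilities and outcomes below are made-up numbers,
# and the group labels "A" and "B" are placeholders.
groups = ["A", "B"]

# Each record: (hypothetical group probabilities from a BISG-style tool, outcome)
records = [
    ({"A": 0.9, "B": 0.1}, 4.0),
    ({"A": 0.6, "B": 0.4}, 5.0),
    ({"A": 0.2, "B": 0.8}, 6.0),
    ({"A": 0.3, "B": 0.7}, 7.0),
]


def max_probability_means(records, groups):
    """Assign each record to its most probable group, then average outcomes per group."""
    totals = {g: 0.0 for g in groups}
    counts = {g: 0 for g in groups}
    for probs, outcome in records:
        label = max(groups, key=lambda g: probs[g])
        totals[label] += outcome
        counts[label] += 1
    return {g: totals[g] / counts[g] if counts[g] else float("nan") for g in groups}


def probability_weighted_means(records, groups):
    """Use all probabilities as weights: every record contributes to every group."""
    means = {}
    for g in groups:
        weight_sum = sum(probs[g] for probs, _ in records)
        weighted_sum = sum(probs[g] * outcome for probs, outcome in records)
        means[g] = weighted_sum / weight_sum if weight_sum else float("nan")
    return means


hard = max_probability_means(records, groups)
soft = probability_weighted_means(records, groups)
print("max-probability group means:", hard)
print("probability-weighted group means:", soft)
```

In this toy data, the highest-probability assignment discards how uncertain each classification is, while the weighted version lets an individual with probabilities 0.6 and 0.4 contribute to both groups. A formal disparity test would still need a sampling distribution for the difference between groups, which this sketch does not attempt.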

¹ Department of Mathematics and Statistics, Marshall University, Huntington, WV 27505, USA
² Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309, USA

Journal of Statistical Theory and Practice (2020) 14:66

1 Introduction
Cases where the source of data is uncertain are common. Randomized response techniques [1, 6, 12, 14] are used to deal with such data. Analysis tools for randomized response techniques are well developed and are not discussed in this article. In a recent study, we considered a randomized response model for continuous data; the methodology is discussed in [3, 4]. This article focuses on another example of data with an uncertain source: data arising from programs used to estimate race/ethnicity. Some of those programs are used to test for discrimination in financial institutions. Current regulations give the Consumer Financial Protection Bureau (CFPB) the power to order a company to pay a non-negotiable fine. This article does not discuss the politics behind the CFPB's power; it aims to give readers an idea of how a probability distribution of race/ethnicity, as provided by existing techniques, can be used to test whether there is any disparity between races.

Publicly available data, such as Census data, have been used to develop programs that estimate the probability of an individual being from a specific race/ethnicity group. Such programs use information like last name or address to estimate the ethnicity of an individual in the USA; for example, someone can input a last name and a zip code and get in return the probabilities of an individual belonging to specific races. The CFPB claims to use such programs to identify disparity in loans given by some companies, but it does not provide any details on how such programs are used. In this article, we do not compare or study the different programs; instead, we aim to develop methods that use the data generated by such programs to identify disparity, so that the reader can better understand how such methods work and how they can be used for statistical inference. Specifically, this article provides