Overview of Maximum Likelihood Estimation

9.1 General Notions—Simple Cases

In ordinary least squares multiple regression, the objective in fitting a model is to find the values of the unknown parameters that minimize the sum of squared errors of prediction. When the response variable is non-normal, polytomous, or not observed completely, one needs a more general objective function to optimize. Maximum likelihood (ML) estimation is a general technique for estimating parameters and drawing statistical inferences in a variety of situations, especially nonstandard ones.

Before laying out the method in general, ML estimation is illustrated with a standard situation, the one-sample binomial problem. Here, independent binary responses are observed and one wishes to draw inferences about an unknown parameter, the probability of an event in a population.

Suppose that in a population of individuals, each individual has the same probability P that an event occurs. We could also say that the event has already been observed, so that P is the prevalence of some condition in the population. For each individual, let Y = 1 denote the occurrence of the event and Y = 0 denote nonoccurrence. Then Prob{Y = 1} = P for each individual. Suppose that a random sample of size 3 from the population is drawn and that the first individual had Y = 1, the second had Y = 0, and the third had Y = 1. The respective probabilities of these outcomes are P, 1 − P, and P. The joint probability of observing the independent events Y = 1, 0, 1 is P(1 − P)P = P^2(1 − P). Now the value of P is unknown, but we can solve for the value of P that makes the observed data (Y = 1, 0, 1) most likely to have occurred. In this case, the value of P that maximizes P^2(1 − P) is P = 2/3. This value for P is the maximum likelihood estimate (MLE) of the population probability.
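As a numerical illustration of this calculation (added here, not part of the original text), the short Python sketch below evaluates the likelihood P^2(1 − P) of the sample Y = 1, 0, 1 over a grid of candidate values of P and confirms that it peaks near 2/3; the grid size and variable names are arbitrary choices for the example.

```python
# Illustrative sketch: the likelihood of the sample Y = 1, 0, 1
# as a function of P, maximized over a fine grid of candidate values.
import numpy as np

y = np.array([1, 0, 1])                    # observed binary responses
p = np.linspace(0.001, 0.999, 999)         # candidate values of P

# Likelihood at each candidate P: the product of P^Yi * (1 - P)^(1 - Yi)
lik = np.prod(p[:, None] ** y * (1 - p[:, None]) ** (1 - y), axis=1)

print(p[np.argmax(lik)])                   # ~0.667, i.e., the MLE 2/3
```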


Let us now study the situation of independent binary trials in general. Let the sample size be n and the observed responses be Y_1, Y_2, ..., Y_n. The joint probability of observing the data is given by

L = \prod_{i=1}^{n} P^{Y_i} (1 - P)^{1 - Y_i}.   (9.1)
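The product in (9.1) can be evaluated directly from a vector of binary responses. The following Python sketch, added as an illustration rather than taken from the text, does so; the function name and test data are hypothetical.

```python
# A minimal sketch of equation (9.1): the joint probability of
# independent binary responses as a product over observations.
import numpy as np

def likelihood(p, y):
    """Return prod_i P^Yi * (1 - P)^(1 - Yi) for binary responses y."""
    y = np.asarray(y)
    return np.prod(p ** y * (1 - p) ** (1 - y))

# Example: four responses with three events, evaluated at P = 0.5
print(likelihood(0.5, [1, 0, 1, 1]))       # 0.5**4 = 0.0625
```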

Now let s denote the sum of the Ys, the number of times that the event occurred (Y_i = 1), that is, the number of "successes." The number of nonoccurrences ("failures") is n − s. The likelihood of the data can be simplified to

L = P^s (1 - P)^{n - s}.   (9.2)

It is easier to work with the log likelihood function, which also has desirable statistical properties. For the one-sample binary response problem, the log likelihood is

\log L = s \log(P) + (n - s) \log(1 - P).   (9.3)

The MLE of P is that value of P that maximizes L or log L. Since log L is a smooth function of P, its maximum value can be found by finding the point at which log L has a slope of 0. The slope or first derivative of log L, with respect to P, is

U(P) = \partial \log L / \partial P = s/P - (n - s)/(1 - P).

Setting U(P) = 0 yields the MLE \hat{P} = s/n, the observed proportion of successes.
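A brief Python sketch, again added for illustration and not drawn from the text, encodes the log likelihood (9.3) and its first derivative, and checks numerically that the log likelihood peaks where the slope U(P) is zero; the helper names, the data (s = 2 events in n = 3 trials), and the grid are assumptions of this example.

```python
# Sketch of equations (9.2)-(9.3) and the score function for the
# one-sample binary response problem.
import numpy as np

def log_likelihood(p, s, n):
    """log L = s*log(P) + (n - s)*log(1 - P)."""
    return s * np.log(p) + (n - s) * np.log(1 - p)

def score(p, s, n):
    """U(P) = d log L / dP = s/P - (n - s)/(1 - P)."""
    return s / p - (n - s) / (1 - p)

s, n = 2, 3
p = np.linspace(0.01, 0.99, 99)
print(p[np.argmax(log_likelihood(p, s, n))])   # ~0.67, close to s/n = 2/3
print(score(s / n, s, n))                      # ~0: the score vanishes at the MLE
```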