Missing Data

There are missing data in the majority of datasets one is likely to encounter. Before discussing some of the problems of analyzing data in which some variables are missing for some subjects, we define some nomenclature.

  • PDF / 215,245 Bytes
  • 17 Pages / 504.581 x 719.997 pts Page_size
  • 101 Downloads / 260 Views

DOWNLOAD

REPORT


Missing Data

3.1 Types of Missing Data There are missing data in the majority of datasets one is likely to encounter. Before discussing some of the problems of analyzing data in which some variables are missing for some subjects, we define some nomenclature.

Missing completely at random (MCAR) Data are missing for reasons that are unrelated to any characteristics or responses for the subject, including the value of the missing value, were it to be known. Examples include missing laboratory measurements because of a dropped test tube (if it was not dropped because of knowledge of any measurements), a study that ran out of funds before some subjects could return for follow-up visits, and a survey in which a subject omitted her response to a question for reasons unrelated to the response she would have made or to any other of her characteristics.

Missing at random (MAR) Data are not missing at random, but the probability that a value is missing depends on values of variables that were actually measured. As an example, consider a survey in which females are less likely to provide their personal income in general (but the likelihood of responding is independent of her actual income). If we know the sex of every subject and have income levels for some of the females, unbiased sex-specific income estimates can be made. That is because the incomes we do have for some of the females are a random sample of all females’ incomes. Another way of saying that a variable is MAR

© Springer International Publishing Switzerland 2015

F.E. Harrell, Jr., Regression Modeling Strategies, Springer Series in Statistics, DOI 10.1007/978-3-319-19425-7 3

45

1

46

3 Missing Data

is that given the values of other available variables, subjects having missing values are only randomly different from other subjects.535 Or to paraphrase Greenland and Finkle,242 for MAR the missingness of a covariable cannot depend on unobserved covariable values; for example whether a predictor is observed cannot depend on another predictor when the latter is missing but it can depend on the latter when it is observed. MAR and MCAR data are also called ignorable non-responses.

Informative missing (IM) The tendency for a variable to be missing is a function of data that are not available, including the case when data tend to be missing if their true values are systematically higher or lower. An example is when subjects with lower income levels or very high incomes are less likely to provide their personal income in an interview. IM is also called nonignorable non-response and missing not at random (MNAR). IM is the most difficult type of missing data to handle. In many cases, there is no fix for IM nor is there a way to use the data to test for the existence of IM. External considerations must dictate the choice of missing data models, and there are few clues for specifying a model under IM. MCAR is the easiest case to handle. Our ability to correctly analyze MAR data depends on the availability of other variables (the sex of the subject in the example above). Mo