Statistical Guideline #4. Describe the Nature and Extent of Missing Data and Impute Where Possible and Prudent

INTEGRATIVE REVIEW

Suzanne C. Segerstrom 1

© International Society of Behavioral Medicine 2019

Abstract From the Editors: This is one in a series of statistical guidelines designed to highlight common statistical considerations in behavioral medicine research. The goal is to briefly discuss appropriate ways to analyze and present data in the International Journal of Behavioral Medicine (IJBM). Collectively the series will culminate in a set of basic statistical guidelines to be adopted by IJBM and integrated into the journal’s official Instructions for Authors, but also to serve as an independent resource. If you have ideas for a future topic, please email the Statistical Editor Suzanne Segerstrom at [email protected].

Keywords Missing data · Imputation · Statistical guidelines

* Suzanne C. Segerstrom
[email protected]

1 Department of Psychology, University of Kentucky, 125 Kastle Hall, Lexington, KY 40506-0044, USA

The Statistics Guru

Unless you are running a simulation study, you are likely to have missing data due to, for example, a skipped item or questionnaire page, a scale added after data collection has begun, a study dropout, or equipment failure. The fourth statistical guideline for IJBM is a recommendation for authors to describe the nature and extent of their missing data and to impute missing data (that is, to replace missing data with a feasible value) where imputation is indicated.

The canonical question in missing data analysis is: what is the cause of missingness? Data can be missing completely at random (MCAR). For example, equipment might fail, causing a loss of heart rate data, or a subset of questionnaires might have been copied incorrectly, leaving out a measure. Because the processes that generated the missing data had nothing to do with the nature of the research participants or their data, MCAR data do not risk biasing the results of analysis.

Data can also be missing at random (MAR). For example, older participants might be more likely to drop out of a longitudinal study. In this case, the process that generated the missing data is related to a measured variable in the study. To reduce bias associated with MAR data, data analysis can account for the process by including the measured variable in the model.

Data that are not missing at random (NMAR) are the most problematic and yield biased estimates. With NMAR data, missingness is a function of the values that are themselves missing (e.g., a person with a history of depression leaving questions about psychiatric history blank).

Many strategies for handling missing data exist, and both instructional articles [1–3] and book-length treatments are available; a good synopsis of books on missing data can be found at https://thestatsgeek.com/stats-books/missing-databooks/. This guideline cannot summarize all the approaches but suggests some reporting guidelines and possible starting points for handling missing data.

Missing Items

It is not unusual for a person to s