Statistical Guideline #1. Avoid Creating Categorical Variables from Continuous Variables


COMMENTARY

Suzanne C. Segerstrom 1

Published online: 28 May 2019
© International Society of Behavioral Medicine 2019

The Statistics Guru

From the Editors: This is the first column from the Statistics Guru. The Statistics Guru will appear in every issue. In these columns, we briefly discuss appropriate ways to analyze and present data in the journal. As such, the Statistics Guru can be seen both as an editorial amuse-bouche and as a set of guidelines for reporting data in the International Journal of Behavioral Medicine. If you have ideas for a column, please email the Statistical Editor, Suzanne Segerstrom, at [email protected].

* Suzanne C. Segerstrom, [email protected]

1 Department of Psychology, University of Kentucky, 125 Kastle Hall, Lexington, KY 40506-0044, USA

Frequently, editors and reviewers see continuous data artificially categorized for statistical analysis, for example, above or below a clinical cutoff or the sample median. The first statistical guideline for IJBM is to keep continuous variables as continuous variables whenever possible. A continuous variable contains the most information about a data point. Categorization, particularly dichotomization, loses much of this information. Consider the example of a normally distributed variable with a mean of 50 and a range of 0–100. If I dichotomize the variable at 50, creating two categories ("high" and "low"), then I essentially claim that two data points with values 1 and 49 are more alike than two data points with values 49 and 51. That claim is almost certainly false. Furthermore, dichotomization leads to underestimates of effect size and loss of measurement reliability [1].

Why does this practice persist? One candidate explanation is the influence of the medical model. Medicine dichotomizes people: you are either hypertensive or not, obese or not, diabetic or not. This practice was sensible when medicine dealt mostly with infectious disease, because infected or not is a true dichotomy. (Also, famously, you cannot be a little bit pregnant.) A dichotomous diagnosis does provide clinical guidance on whether to treat or not. However, in research and in statistical models, there is rarely a good reason to treat blood pressure, BMI, or blood glucose as dichotomous. The same is true for psychological variables. Even psychiatric disorders may be best conceptualized as continuous [2].

Finally, median, quartile, or other sample-specific splits create the additional problem of idiosyncrasy. Two samples are unlikely to have exactly the same median even when drawn from the same population, so the same score can be classified as "high" in one study and "low" in another. Sample-specific splits therefore work against the goal of cumulative science.

Perhaps one does want information about a specific level of, for example, scores on the Beck Depression Inventory (BDI), which has cutoff scores for minimal, mild, moderate, and severe depression. To test the BDI as an explanatory variable for C-reactive protein (CRP), for example, one could create four categories of BDI scores an