David K. Park and Andrew Gelman

American Statistical Association, November 2008

Abstract

A linear regression of y on x can be approximated by a simple difference: the average value of y corresponding to the highest quarter or third of x, minus the average value of y corresponding to the lowest quarter or third of x. A simple theoretical analysis, similar to analyses that have been done in psychometrics, shows this comparison to perform reasonably well, with 80%–90% efficiency compared to the regression if the predictor is uniformly or normally distributed. By discretizing x into three categories, we claw back about half the efficiency lost by the commonly used strategy of dichotomizing the predictor. We illustrate with the example that motivated our research: an analysis of income and voting that we had originally performed for a scholarly journal but then wanted to communicate to a general audience.
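
The comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes NumPy, the function names `ols_slope` and `split_difference` are hypothetical, and dividing the difference in y by the corresponding difference in mean x (to put the comparison on the same scale as the regression slope) is an assumption of this sketch.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

def split_difference(x, y, frac=1/3):
    """Mean of y in the top `frac` of x minus mean of y in the bottom `frac`.

    Returns the raw difference and (an assumption of this sketch) that
    difference divided by the gap in mean x between the two groups, which
    puts it on the same scale as a regression slope. Ties in x are broken
    arbitrarily by argsort.
    """
    order = np.argsort(x)
    k = int(round(frac * len(x)))
    if k < 1:
        raise ValueError("frac too small for this sample size")
    low, high = order[:k], order[-k:]
    diff_y = y[high].mean() - y[low].mean()
    diff_x = x[high].mean() - x[low].mean()
    return float(diff_y), float(diff_y / diff_x)

# Usage on simulated data (hypothetical example): true slope is 2.
rng = np.random.default_rng(0)
x = rng.normal(size=3000)
y = 2 * x + rng.normal(size=3000)
print(ols_slope(x, y), split_difference(x, y)[1], split_difference(x, y, 0.25)[1])
```

Calling `split_difference` with `frac=0.5` gives the dichotomized (median-split) comparison, so repeating the simulation many times and comparing the spread of the three estimates is one rough way to see the efficiency ordering the abstract describes.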

View the paper here: Splitting a Predictor at the Upper Quarter or Third and the Lower Quarter or Third

CU Global Thought

The Committee on Global Thought. 91 Claremont, Suite 513, New York, NY 10027. (212) 851-7293