## Saturday, June 28, 2014

### Linear Probability Models for Skewed Distributions with High Mass Points

The literature offers many methods for modeling skewed distributions with high mass points, including log transformations, two-part models, and GLMs. In previous posts I have discussed linear probability models in the context of causal inference, as well as quantile regression as a strategy for modeling highly skewed continuous and count data. Mullahy (2009) also alludes to quantile regression:

"Such concerns should translate into empirical strategies that target the high-end parameters of particular interest, e.g. models for Prob(y ≥ k | x) or quantile regression models"

Angrist and Pischke (2009) likewise discuss targeting high-end parameters with linear probability models:

"COP [conditional-on-positive] effects are sometimes motivated by a researcher's sense that when the outcome distribution has a mass point-that is, when it piles up on a particular value, such as zero-or has a heavily skewed distribution, or both, then an analysis of effects on averages misses something. Analysis of effects on averages indeed miss some things, such as changes in the probability of specific values or a shift in quantiles away from the median. But why not look at these distribution effects directly? Distribution outcomes include the likelihood that annual medical expenditures exceed zero, 100 dollars, 200 dollars, and so on. In other words, put 1[Yi > c] for different choices of c on the left hand side of the regression of interest...the idea of looking directly at distribution effects with linear probability models is illustrated by Angrist (2001),...Alternatively, if quantiles provide a focal point, we can use quantile regressions to model them."
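To make the "put 1[Yi > c] on the left hand side" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the data are simulated, and the names `lpm_threshold_effects`, `d`, `y`, and the chosen cutoffs are hypothetical, not taken from Angrist and Pischke or Mullahy. For each cutoff c it fits an OLS regression of the indicator 1[y > c] on an intercept and a binary regressor, so each slope estimates how the probability of exceeding c differs between the two groups.

```python
import numpy as np

def lpm_threshold_effects(y, d, thresholds):
    """For each cutoff c, regress 1[y > c] on an intercept and d by OLS.

    Returns a dict mapping each cutoff to the estimated slope on d.
    """
    X = np.column_stack([np.ones_like(d, dtype=float), d])
    effects = {}
    for c in thresholds:
        indicator = (y > c).astype(float)  # the LPM outcome 1[y > c]
        beta, *_ = np.linalg.lstsq(X, indicator, rcond=None)
        effects[c] = beta[1]
    return effects

# Simulated (hypothetical) expenditure data: a mass point at zero
# plus a right-skewed lognormal amount for those with any spending.
rng = np.random.default_rng(0)
n = 5000
d = rng.integers(0, 2, size=n)                     # binary regressor
any_spend = rng.random(n) < (0.6 + 0.1 * d)        # shifts P(y > 0)
amount = rng.lognormal(mean=5.0 + 0.2 * d, sigma=1.0, size=n)
y = np.where(any_spend, amount, 0.0)

effects = lpm_threshold_effects(y, d, thresholds=[0, 100, 200, 1000])
for c, b in effects.items():
    print(f"Estimated change in P(y > {c}) when d = 1: {b:.3f}")
```

With a single binary regressor, each slope is simply the difference in the share of observations above c between the two groups. In applied work you would add covariates and use heteroskedasticity-robust standard errors, which this sketch omits.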

References:

Angrist, J.D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.

Angrist, J.D. (2001). Estimation of Limited Dependent Variable Models With Dummy Endogenous Regressors: Simple Strategies for Empirical Practice. Journal of Business & Economic Statistics, Vol. 19, No. 1.

Mullahy, J. (2009). Econometric Modeling of Health Care Costs and Expenditures: A Survey of Analytical Issues and Related Policy Considerations.