## Friday, May 31, 2013

### Selection Bias and the Rubin Causal Model and Potential Outcomes Framework

The problem of selection bias is best characterized within the Rubin Causal Model, or potential outcomes framework (Angrist and Pischke, 2008; Rubin, 1974; Imbens and Wooldridge, 2009; Klaiber and Smith, 2009).

Suppose Yi is the measured outcome of interest. This can be written in terms of potential outcomes as:

Yi = { Y1i if di = 1; Y0i if di = 0 }

= Y0i + (Y1i - Y0i)di

The causal effect of interest for individual i is Y1i - Y0i, but it is unobservable because we never see both potential outcomes for the same individual. Reality forces us to compare outcomes across different individuals (those treated vs. those untreated).

Let di = indicator of choice, selection, or treatment (1 if treated, 0 if not)
Y0i = baseline potential outcome (the outcome without treatment)
Y1i = potential outcome under treatment
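
To make the notation concrete, here is a minimal Python sketch of the switching equation Yi = Y0i + (Y1i - Y0i)di. The five individuals and all of their values are hypothetical, made up purely for illustration; in real data only one of the two potential outcomes is ever available for each person.

```python
# Hypothetical potential outcomes for five individuals (illustrative numbers only).
y0 = [10, 12, 9, 11, 8]    # Y0i: outcome without treatment
y1 = [13, 14, 10, 15, 9]   # Y1i: outcome with treatment
d = [1, 0, 1, 0, 0]        # di: 1 if treated, 0 otherwise

# Switching equation: Yi = Y0i + (Y1i - Y0i) * di
y_obs = [a + (b - a) * t for a, b, t in zip(y0, y1, d)]

# Individual causal effects Y1i - Y0i. These can be computed here only because
# the data are invented; a real data set contains y_obs and d, not y0 and y1.
effects = [b - a for a, b in zip(y0, y1)]

print(y_obs)    # [13, 12, 10, 11, 8]
print(effects)  # [3, 2, 1, 4, 1]
```

Each observed value is Y1i for the treated and Y0i for the untreated, so the list of individual effects cannot be recovered from the observed data alone.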

What we actually measure is E[Yi|di=1] - E[Yi|di=0], the observed effect, or observed difference in means between the treated and untreated groups. The problem of non-random treatment selection can be characterized as follows:

E[Yi|di=1] - E[Yi|di=0] = E[Y1i - Y0i|di=1] + {E[Y0i|di=1] - E[Y0i|di=0]}

This follows from writing the observed difference as E[Y1i|di=1] - E[Y0i|di=0] and then adding and subtracting the unobserved counterfactual mean E[Y0i|di=1]. The observed effect or difference is therefore equal to the average treatment effect on the treated, E[Y1i - Y0i|di=1], plus the bracketed term for selection bias. The first term coincides with the population average treatment effect (ATE), E[Y1i - Y0i], when the treatment effect is the same for everyone or is otherwise unrelated to selection into treatment. If the baseline potential outcomes Y0i of those who are treated (di=1) differ from the baseline potential outcomes Y0i of those who are not treated or don't self-select (di=0), then the term {E[Y0i|di=1] - E[Y0i|di=0]} takes a positive or negative value, creating selection bias. When we calculate the observed difference between treated and untreated groups, selection bias becomes confounded with the actual treatment effect. Note that if the potential outcomes of the treated and control groups were the same on average, then the selection bias term would equal zero, and the observed difference would represent the population average treatment effect.
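
The decomposition can be checked numerically. The sketch below simulates self-selection under assumptions that are entirely hypothetical: an unobserved trait called `ability` raises both the baseline outcome and the chance of choosing treatment, and the treatment effect is a constant 2 for everyone. None of these numbers come from the references; they only illustrate the identity.

```python
import random
from statistics import mean

random.seed(0)  # arbitrary seed, used only for reproducibility

rows = []
for _ in range(10_000):
    ability = random.gauss(0, 1)                    # hypothetical unobserved trait
    y0 = 50 + 5 * ability + random.gauss(0, 1)      # Y0i: baseline potential outcome
    y1 = y0 + 2                                     # Y1i: constant effect of 2 (assumption)
    d = 1 if ability + random.gauss(0, 1) > 0 else 0  # self-selection on ability
    y = y0 + (y1 - y0) * d                          # observed outcome Yi
    rows.append((y0, y1, d, y))

treated = [r for r in rows if r[2] == 1]
untreated = [r for r in rows if r[2] == 0]

# E[Yi|di=1] - E[Yi|di=0]: the only quantity available in real data
observed_diff = mean(r[3] for r in treated) - mean(r[3] for r in untreated)
# E[Y1i - Y0i|di=1]: treatment effect among the treated
effect_on_treated = mean(r[1] - r[0] for r in treated)
# E[Y0i|di=1] - E[Y0i|di=0]: selection bias
selection_bias = mean(r[0] for r in treated) - mean(r[0] for r in untreated)

print(round(observed_diff, 3))
print(round(effect_on_treated + selection_bias, 3))
```

The two printed values agree because the identity holds exactly within the sample. Since higher-ability individuals select into treatment here, the selection bias term is positive and the observed difference overstates the effect of 2.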

If the term {E[Y0i|di=1] - E[Y0i|di=0]} representing selection bias is large enough, it can overpower the actual treatment effect and lead the naïve researcher to conclude, based on the observed effect E[Yi|di=1] - E[Yi|di=0], that the intervention or treatment was ineffectual, or to under- or overestimate the true treatment effect, depending on the direction of the bias.
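
A small hypothetical example shows how this masking can happen. Assume a training program whose participants start from lower baseline outcomes than non-participants, and assume every participant gains exactly 3 from the program. All numbers are invented for illustration.

```python
from statistics import mean

# Hypothetical training-program data (illustrative numbers only).
treated_y0 = [4, 5, 6]     # Y0i for participants: never observed in practice
treated_y1 = [7, 8, 9]     # Y1i for participants: observed, each gains 3
untreated_y0 = [7, 8, 9]   # Y0i for non-participants: observed

# Observed difference in means between treated and untreated groups
observed_diff = mean(treated_y1) - mean(untreated_y0)
# True effect among the treated, E[Y1i - Y0i|di=1]
true_effect = mean(b - a for a, b in zip(treated_y0, treated_y1))
# Selection bias, E[Y0i|di=1] - E[Y0i|di=0]
selection_bias = mean(treated_y0) - mean(untreated_y0)

print(observed_diff, true_effect, selection_bias)
```

Here the selection bias of -3 exactly offsets the true effect of 3, so the observed difference is 0 and a naïve comparison of means would suggest the program did nothing.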

References:
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701.

Angrist, J. D. & Pischke, J.-S. (2008). Mostly harmless econometrics: An empiricist's companion. Princeton University Press.

Imbens, G. W. & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47(1), 5–86.

Klaiber, H. A. & Smith, V. K. (2009). Evaluating Rubin's causal model for measuring the capitalization of environmental amenities. NBER Working Paper No. 14957. National Bureau of Economic Research.