The problem of selection bias is best characterized within the Rubin Causal Model, or potential outcomes framework (Angrist and Pischke, 2008; Rubin, 1974; Imbens and Wooldridge, 2009; Klaiber and Smith, 2009).

Suppose $Y_i$ is the measured outcome of interest. This can be written in terms of potential outcomes as:

$$Y_i = \begin{cases} Y_{1i} & \text{if } d_i = 1 \\ Y_{0i} & \text{if } d_i = 0 \end{cases} \;=\; Y_{0i} + (Y_{1i} - Y_{0i})\,d_i$$
The causal effect of interest is $Y_{1i} - Y_{0i}$, but it is unobservable because we never see both outcomes for a single individual. Reality forces us to compare outcomes for different individuals (those treated vs. untreated).
Let:

$d_i$ = choice, selection, or treatment indicator (1 if treated, 0 if not)

$Y_{0i}$ = baseline potential outcome

$Y_{1i}$ = potential treatment outcome
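To make the notation concrete, here is a minimal Python sketch of how the observed outcome is assembled from the two potential outcomes. The four individuals and their numbers are purely hypothetical, and the helper name `observed` is an illustrative choice, not part of any library.

```python
# Hypothetical potential outcomes for four individuals (illustrative numbers only).
people = [
    {"name": "A", "y0": 5.0, "y1": 7.0, "d": 1},
    {"name": "B", "y0": 6.0, "y1": 8.0, "d": 1},
    {"name": "C", "y0": 9.0, "y1": 11.0, "d": 0},
    {"name": "D", "y0": 10.0, "y1": 12.0, "d": 0},
]


def observed(y0, y1, d):
    """Observed outcome: Y = Y0 + (Y1 - Y0) * d."""
    return y0 + (y1 - y0) * d


for p in people:
    y = observed(p["y0"], p["y1"], p["d"])
    # The potential outcome that is NOT realized is never seen in real data.
    unseen = "y0" if p["d"] == 1 else "y1"
    print(f"{p['name']}: d={p['d']}, observed Y={y}, unobserved potential outcome: {unseen}")
```

In real data only `d` and the observed `Y` are available; the sketch can compute the individual effect `y1 - y0` only because both potential outcomes were made up by hand.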
What we actually measure is $E[Y_i \mid d_i = 1] - E[Y_i \mid d_i = 0]$, the observed effect, or the observed difference in means between the treated and untreated groups. The problem of non-random treatment selection can be characterized as follows:
$$E[Y_i \mid d_i = 1] - E[Y_i \mid d_i = 0] = E[Y_{1i} - Y_{0i} \mid d_i = 1] + \{E[Y_{0i} \mid d_i = 1] - E[Y_{0i} \mid d_i = 0]\}$$

The observed effect or difference is equal to the average treatment effect among the treated, $E[Y_{1i} - Y_{0i} \mid d_i = 1]$, plus the bracketed term for selection bias. When the treatment effect is the same for every individual, the first term is simply the population average treatment effect (ATE), $E[Y_{1i} - Y_{0i}]$. If the baseline potential outcomes $Y_{0i}$ of those who are treated ($d_i = 1$) differ from the baseline potential outcomes $Y_{0i}$ of those who are not treated or do not self-select ($d_i = 0$), then the term $\{E[Y_{0i} \mid d_i = 1] - E[Y_{0i} \mid d_i = 0]\}$ can take a positive or negative value, creating selection bias. When we calculate the observed difference between the treated and untreated groups, selection bias becomes confounded with the actual treatment effect. Note that if the potential outcomes of the treated and control groups were the same on average, as they are under random assignment, the selection bias term would equal zero and the observed difference would represent the population average treatment effect.
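The decomposition can be checked numerically. The following is a minimal simulation sketch, not an analysis of real data: it assumes a constant treatment effect of 2.0, normally distributed baseline outcomes, and a hypothetical selection rule in which individuals with low baseline outcomes are the ones who opt into treatment. The function name `simulate` and all parameter values are illustrative assumptions.

```python
import random
from statistics import fmean


def simulate(n=100_000, effect=2.0, seed=0):
    """Simulate potential outcomes with self-selection on the baseline outcome."""
    rng = random.Random(seed)
    y0 = [rng.gauss(10.0, 3.0) for _ in range(n)]  # baseline potential outcome Y0
    y1 = [v + effect for v in y0]                  # potential treatment outcome Y1
    # Assumed selection rule: individuals whose (noisily perceived) baseline
    # falls below 10 choose treatment, so d is correlated with Y0.
    d = [1 if v + rng.gauss(0.0, 1.0) < 10.0 else 0 for v in y0]
    # Observed outcome: Y = Y0 + (Y1 - Y0) * d
    y = [b + (a - b) * t for a, b, t in zip(y1, y0, d)]

    treated = [i for i in range(n) if d[i] == 1]
    control = [i for i in range(n) if d[i] == 0]

    # E[Y | d=1] - E[Y | d=0]: what a naive comparison of means reports
    naive = fmean(y[i] for i in treated) - fmean(y[i] for i in control)
    # E[Y1 - Y0 | d=1]: treatment effect among the treated
    att = fmean(y1[i] - y0[i] for i in treated)
    # E[Y0 | d=1] - E[Y0 | d=0]: selection bias (computable only in a simulation)
    bias = fmean(y0[i] for i in treated) - fmean(y0[i] for i in control)
    return naive, att, bias


naive, att, bias = simulate()
print(f"observed difference: {naive:.3f}")
print(f"treatment effect:    {att:.3f}")
print(f"selection bias:      {bias:.3f}")
print(f"effect + bias:       {att + bias:.3f}")
```

Under these assumptions the selection bias term is negative, and the observed difference comes out negative as well even though every individual gains 2.0 from treatment. Replacing the selection rule with a coin flip that ignores `y0` would drive the bias term toward zero.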
If the term $\{E[Y_{0i} \mid d_i = 1] - E[Y_{0i} \mid d_i = 0]\}$ representing selection bias is large enough, it can overpower the actual treatment effect and lead the naïve researcher to conclude (based on the observed effect $E[Y_i \mid d_i = 1] - E[Y_i \mid d_i = 0]$) that the intervention or treatment was ineffectual, or to under- or overestimate the true treatment effect, depending on the direction of the bias.

**References:**

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. *Journal of Educational Psychology*, 66(5), 688–701.

Angrist, J. D. & Pischke, J.-S. (2008). *Mostly harmless econometrics: An empiricist's companion*. Princeton University Press.

Imbens, G. W. & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. *Journal of Economic Literature*, 47(1), 5–86.

Klaiber, H. A. & Smith, V. K. (2009). Evaluating Rubin's causal model for measuring the capitalization of environmental amenities. NBER Working Paper No. 14957. National Bureau of Economic Research.