The problem of selection bias is best characterized within the Rubin Causal Model, or potential outcomes framework (Rubin, 1974; Angrist and Pischke, 2008; Imbens and Wooldridge, 2009; Klaiber and Smith, 2009).
Suppose Yi is the measured outcome of interest for individual i. Let:
di = choice, selection, or treatment indicator (di = 1 if treated, di = 0 if not)
Y0i = baseline potential outcome
Y1i = potential treatment outcome
The measured outcome can then be written in terms of potential outcomes as:
Yi = { Y1i if di = 1; Y0i if di = 0 }
= Y0i + (Y1i - Y0i)di
The causal effect of interest is Y1i - Y0i, but it is
unobservable because we never see both potential outcomes for a single individual.
Reality forces us to compare outcomes for different individuals (those treated
vs. untreated).
What we actually measure is E[Yi|di=1] - E[Yi|di=0],
the observed effect, or observed difference in means between the treated and
untreated groups. The problem of non-random treatment selection can be
characterized as follows:
E[Yi|di=1] - E[Yi|di=0]
= E[Y1i|di=1] - E[Y0i|di=0]
= E[Y1i-Y0i|di=1] + {E[Y0i|di=1] - E[Y0i|di=0]}
The second line follows because we observe Y1i for the treated and Y0i for the
untreated; the third adds and subtracts E[Y0i|di=1]. The observed difference is
therefore equal to the average treatment effect on the treated (ATT),
E[Y1i-Y0i|di=1], plus the bracketed term for selection bias. (If the treatment
effect is the same for everyone, the ATT coincides with the population average
treatment effect (ATE), E[Y1i-Y0i].) If the baseline potential outcomes Y0i of
those who are treated (di=1) differ from the baseline potential outcomes Y0i of
those who are not treated or don't self-select (di=0), then the
term {E[Y0i|di=1] - E[Y0i|di=0]}
takes a positive or negative value, creating selection bias. When we
calculate the observed difference between treated and untreated groups, selection
bias becomes confounded with the actual treatment effect. Note that if the
baseline potential outcomes of the treated and control groups were the same on
average, the selection bias term would equal zero, and the observed difference
would represent the average treatment effect on the treated.
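To make the decomposition concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than anything taken from the cited papers: the data-generating process, the variable name `ability`, the sample size, and the constant treatment effect of 1.0 (chosen so that the effect among the treated and the population average effect are the same number). Because the data are simulated, both potential outcomes are visible, which is never the case with real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical unobserved trait that drives both selection and baseline outcomes.
ability = rng.normal(0.0, 1.0, n)

# Potential outcomes: Y0i (baseline) and Y1i (treated), with an assumed constant effect.
y0 = 10.0 + 2.0 * ability + rng.normal(0.0, 1.0, n)
effect = 1.0
y1 = y0 + effect

# Non-random selection: individuals with higher ability are more likely to take treatment.
d = (ability + rng.normal(0.0, 1.0, n) > 0).astype(int)

# Observed outcome: Yi = Y0i + (Y1i - Y0i) * di
y = y0 + (y1 - y0) * d

# Observed difference in means, E[Yi|di=1] - E[Yi|di=0]
observed_diff = y[d == 1].mean() - y[d == 0].mean()

# Treatment effect among the treated, E[Y1i - Y0i | di=1]
att = (y1 - y0)[d == 1].mean()

# Selection bias, E[Y0i|di=1] - E[Y0i|di=0]
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()

print(f"observed difference: {observed_diff:.3f}")
print(f"treatment effect:    {att:.3f}")
print(f"selection bias:      {selection_bias:.3f}")
print(f"effect + bias:       {att + selection_bias:.3f}")
```

Under these assumptions the selection bias is positive, so the observed difference comes out well above the true effect of 1.0, and the last two printed lines add up to the first.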
If the term {E[Y0i|di=1] - E[Y0i|di=0]} representing selection bias is large enough,
it can overpower the actual treatment effect and lead the naïve researcher to
conclude (based on the observed effect E[Yi|di=1] - E[Yi|di=0])
that the intervention or treatment was ineffectual, or to under- or overestimate
the true treatment effect, depending on the direction of the bias.
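As a rough illustration of bias overpowering a real effect, the sketch below simulates a program that helps everyone by the same amount but is taken up mostly by people with poor baseline outcomes. The setup is entirely hypothetical (the `need` variable, the coefficients, and the helper `naive_difference` are assumptions made for this example). For contrast, it also computes the same naive comparison when treatment is assigned by coin flip, where the baseline outcomes of the two groups are the same on average.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical unobserved "need": higher need means a worse baseline outcome.
need = rng.normal(0.0, 1.0, n)
y0 = 5.0 - 2.0 * need + rng.normal(0.0, 1.0, n)

# Assumed constant, positive treatment effect.
true_effect = 1.0
y1 = y0 + true_effect


def naive_difference(d):
    """Observed difference in means, E[Yi|di=1] - E[Yi|di=0], for assignment d."""
    y = y0 + (y1 - y0) * d
    return y[d == 1].mean() - y[d == 0].mean()


# Self-selection: those with higher need are more likely to enroll.
d_selected = (need + rng.normal(0.0, 1.0, n) > 0).astype(int)

# Random assignment: enrollment is unrelated to need.
d_random = rng.integers(0, 2, n)

naive_selected = naive_difference(d_selected)
naive_random = naive_difference(d_random)

print(f"true effect:                    {true_effect:.3f}")
print(f"naive difference (selection):   {naive_selected:.3f}")
print(f"naive difference (randomized):  {naive_random:.3f}")
```

With these assumed numbers the naive comparison under self-selection is negative even though the treatment helps every individual, while the randomized comparison lands close to the true effect.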
References:
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688-701.
Angrist, J. D. & Pischke, J.-S. (2008). Mostly harmless econometrics: An empiricist's companion. Princeton University Press.
Imbens, G. W. & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47(1), 5–86.
Klaiber, H. A. & Smith, V. K. (2009). Evaluating Rubin's causal model for measuring the capitalization of environmental amenities. NBER Working Paper No. 14957. National Bureau of Economic Research.