A while back I was presenting some research that involved propensity score matching, and I was asked why I did not use a Heckman model. My response was that I viewed my selection issues through the lens of the Rubin causal model and a selection on observables framework. And truthfully, I was not that familiar with Heckman. It is interesting that in Angrist and Pischke's Mostly Harmless Econometrics, Heckman is given scant attention. However, here are some of the basics:
Some statistical pre-requisites:
Incidental truncation: we do not observe y due to the effect of another variable z. This results in a truncated distribution of y:

f(y|z > a) = f(y,z)/Prob(z > a)  (1)

A ratio of this form, a density over a cumulative probability, is referred to as the inverse Mills ratio or selection hazard.
Of major interest is the expected value of a truncated normal variable:
E(y|z > a) = µ + ρσλ  (2)

where µ and σ are the mean and standard deviation of y, ρ is the correlation between y and z, and λ is the inverse Mills ratio evaluated at the truncation point.
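To make equation (2) concrete, here is a quick simulation check (a sketch, assuming y and z are bivariate normal with correlation ρ and z is standard normal; all parameter values are illustrative): the mean of y over the observations where z > a should match µ + ρσλ, with λ = φ(a)/(1 − Φ(a)).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# Hypothetical parameters: y ~ N(mu, sigma^2), z ~ N(0, 1), corr(y, z) = rho
mu, sigma, rho = 2.0, 1.5, 0.6
n = 1_000_000
z = rng.standard_normal(n)
y = mu + sigma * (rho * z + np.sqrt(1 - rho**2) * rng.standard_normal(n))

a = 0.5                                  # truncation point: y observed only when z > a
lam = norm.pdf(a) / (1 - norm.cdf(a))    # inverse Mills ratio at a
analytic = mu + rho * sigma * lam        # equation (2)
empirical = y[z > a].mean()              # mean of the incidentally truncated sample
print(analytic, empirical)               # the two should agree closely
```

With ρ > 0, truncating on z > a pulls the observed mean of y above µ, which is exactly the selection effect the Heckman model corrects for.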
Application: The Heckman model is often used in the context of incidental truncation or selection, where we only observe an outcome conditional on a decision to participate or self-select into a program or treatment. A popular example is the observation of wages only for people who choose to work, or outcomes only for people who choose to participate in a job training or coaching program.
Estimation:
Estimation is a two-step process.
Step 1: Selection Equation

z = wγ + µ  (3)

From the estimates of the selection equation, compute for each observation:

λ_hat = φ(wγ)/Φ(wγ)  (4)
Step 2: Outcome Equation

y|z > 0 = xβ + β_λ λ_hat + v  (5)

where β_λ = ρσ (note the similarity to (2))
One way to think of λ is as capturing the correlation between the treatment variable and the error term, in the context of omitted variable bias and endogeneity.
Y = a + xc + bs + e  (6)

where s = selection or treatment indicator.

If the conditional independence or selection on observables assumption does not hold, i.e. there are factors related to selection not controlled for by x, then we have omitted variable bias and correlation between 's' and the error term 'e'. This results in endogeneity and biased estimates of the treatment effect 'b'.
If we characterize the correlation between e and s as

λ = E(e | s,x)  (7)

then the Heckman model consists of deriving an estimate of λ and including it in the regression as previously illustrated:

Y = a + xc + bs + hλ + e  (8)
As stated (paraphrasing somewhat) in Briggs (2004):
"The
Heckman model goes from specifying a selection model to getting an
estimate for the bias term E(e | s,x) by estimating the expected value of a truncated normal random
variable. This estimate is known in the literature as the Mills ratio
or hazard function, and can be expressed as the ration of the
standard normal density function to the cumulative distribution."
The Heckman model is powerful because it addresses selection bias under both selection on observables and selection on unobservables. There are, however, a number of assumptions involved that can limit its use. For more details I recommend the article by Briggs in the references below.
References:
Briggs, Derek C. (2004). Causal Inference and the Heckman Model. Journal of Educational and Behavioral Statistics, Winter 2004, Vol. 29, No. 4, pp. 397-420.
Veilleux, Gaétan (Valen). Selection Bias: What You Don't Know Can Hurt Your Bottom Line. Casualty Actuarial Society presentation.