Friday, July 29, 2016

Heckman...what the heck?


A while back I was presenting some research I did that involved propensity score matching, and I was asked why I did not use a Heckman model. My response was that I viewed my selection issues in the context of the Rubin causal model and a selection on observables framework. And truthfully, I was not that familiar with Heckman. It is interesting that in Angrist and Pischke's Mostly Harmless Econometrics, Heckman is given scant attention. However, here are some of the basics:

Some statistical pre-requisites:
Incidental truncation: we do not observe y because of the effect of another variable z. This results in a truncated distribution of y:



f(y,z|z > a) = f(y,z)/Prob(z > a)  (1)

This is a ratio of a density to a probability taken from a cumulative distribution function. For the standard normal, the analogous ratio φ/Φ is referred to as the inverse Mills ratio or selection hazard.

Of major interest is the expected value of a truncated normal variable:

E(y|z > a) = µ + ρσλ  (2)
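
As a quick sanity check on (2), here is a minimal simulation sketch. It assumes y and z are bivariate normal and reads λ as φ(α)/(1 − Φ(α)) with α = (a − µ_z)/σ_z, which is the usual textbook form for truncation from below. The function names and the parameter values are my own illustrative choices, not anything from the references.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: supplies pdf (phi) and cdf (Phi)

def inverse_mills(alpha):
    """lambda(alpha) = phi(alpha) / (1 - Phi(alpha)), for truncation from below."""
    return STD.pdf(alpha) / (1.0 - STD.cdf(alpha))

def truncated_mean_formula(mu_y, sigma_y, rho, mu_z, sigma_z, a):
    """E(y | z > a) = mu_y + rho * sigma_y * lambda(alpha), as in (2)."""
    alpha = (a - mu_z) / sigma_z
    return mu_y + rho * sigma_y * inverse_mills(alpha)

def truncated_mean_simulated(mu_y, sigma_y, rho, mu_z, sigma_z, a,
                             n=200_000, seed=0):
    """Average y over simulated bivariate normal draws that satisfy z > a."""
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(n):
        u, e = rng.gauss(0, 1), rng.gauss(0, 1)
        z = mu_z + sigma_z * u
        # y is built so that corr(y, z) = rho
        y = mu_y + sigma_y * (rho * u + math.sqrt(1 - rho ** 2) * e)
        if z > a:
            total += y
            kept += 1
    return total / kept

if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration
    args = dict(mu_y=1.0, sigma_y=2.0, rho=0.5, mu_z=0.0, sigma_z=1.0, a=0.0)
    print("formula:   ", truncated_mean_formula(**args))
    print("simulation:", truncated_mean_simulated(**args))
```

With ρ = 0 the correction term drops out and the truncated mean is just µ, which is the case where selection on z tells us nothing about y.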


Application: The Heckman model is often used in the context of truncation, incidental truncation, or sample selection, where we only observe some outcome conditional on a decision to participate in, or self-select into, a program or treatment. A popular example is the observation of wages only for people who choose to work, or outcomes only for people who choose to participate in a job training or coaching program.



Estimation: Estimation is a two-step process.


Step 1: Selection Equation

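Z = wγ + µ  (3)

where µ here is the selection equation's error term (not the mean µ in (2)). For each observation compute: λ_hat = φ(wγ_hat)/Φ(wγ_hat)  (4) from the estimates in the selection equation (typically a probit), where φ and Φ are the standard normal density and cumulative distribution functions.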

Step 2: Outcome Equation

y|z > 0 = xβ + βλ λ_hat + v  (5) where βλ = ρσ (note the similarity to (2))
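
The two steps above can be sketched on simulated data. This is a rough illustration under my own assumptions, not a reference implementation: the selection equation is fit as a probit by Fisher scoring, the data-generating coefficients are made up, and all function names are hypothetical. It uses only the Python standard library, so the small linear solver is hand-rolled.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal pdf (phi) and cdf (Phi)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def probit(W, s, iters=15):
    """Step 1: probit maximum likelihood for the selection equation (3)."""
    k = len(W[0])
    g = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for w, si in zip(W, s):
            idx = sum(wi * gi for wi, gi in zip(w, g))
            p = min(max(STD.cdf(idx), 1e-10), 1 - 1e-10)
            d = STD.pdf(idx)
            score = d * (si - p) / (p * (1 - p))
            weight = d * d / (p * (1 - p))  # expected information weight
            for i in range(k):
                grad[i] += score * w[i]
                for j in range(k):
                    info[i][j] += weight * w[i] * w[j]
        g = [gi + st for gi, st in zip(g, solve(info, grad))]
    return g

def simulate(n=5000, rho=0.7, sigma=1.0, seed=1):
    """Hypothetical data: errors u and e are correlated, so selection is not ignorable."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, w, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        e = sigma * (rho * u + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))
        selected = 1 if 0.5 * x + 1.0 * w + u > 0 else 0  # selection rule
        rows.append((x, w, selected, 1.0 + 2.0 * x + e))   # outcome equation
    return rows

def heckman_two_step(rows):
    W = [[1.0, x, w] for x, w, _, _ in rows]
    gamma = probit(W, [r[2] for r in rows])
    X, y = [], []
    for (x, _, sel, yi), wrow in zip(rows, W):
        if sel:  # y is only observed for selected cases
            idx = sum(a * b for a, b in zip(wrow, gamma))
            X.append([1.0, x, STD.pdf(idx) / STD.cdf(idx)])  # lambda_hat, eq. (4)
            y.append(yi)
    return gamma, ols(X, y)  # Step 2: outcome equation (5)

def naive_ols(rows):
    """OLS on the selected sample only, ignoring selection."""
    return ols([[1.0, x] for x, _, s, _ in rows if s],
               [yi for _, _, s, yi in rows if s])

if __name__ == "__main__":
    rows = simulate()
    gamma, beta = heckman_two_step(rows)
    print("probit gamma:", gamma)
    print("two-step [intercept, x, lambda]:", beta)
    print("naive OLS [intercept, x]:", naive_ols(rows))
```

In this setup the coefficient on λ_hat should land near ρσ, and the naive regression on the selected sample should show a noticeably inflated intercept. The sketch only produces point estimates: the ordinary OLS standard errors from step 2 are not valid as they stand, since λ_hat is itself estimated.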

One way to think of λ is as capturing the correlation between the treatment variable and the error term in the context of omitted variable bias and endogeneity:

Y= a + xc + bs  + e  (6)

where s = selection or treatment indicator

If the conditional independence or selection on observables assumption does not hold, i.e. there are factors related to selection that are not controlled for by x, then we have omitted variable bias and correlation between 's' and the error term 'e'. This results in endogeneity and biased estimates of the treatment effect 'b'.

If we characterize the correlation between e and s as

λ = E(e | s,x)     (7)

then the Heckman model consists of deriving an estimate of λ and including it in a regression as previously illustrated:
 

 
Y= a + xc + bs  + hλ + e  (8)


As stated (paraphrasing somewhat) in Briggs (2004):



"The Heckman model goes from specifying a selection model to getting an estimate for the bias term E(e | s,x) by estimating the expected value of a truncated normal random variable. This estimate is known in the literature as the Mills ratio or hazard function, and can be expressed as the ratio of the standard normal density function to the cumulative distribution."

The Heckman model is powerful because it addresses selection bias arising from both observables and unobservables. There are, however, a number of assumptions involved that could limit its use. For more details I recommend the article by Briggs in the references below.

References:

 Derek C. Briggs. Causal Inference and the Heckman Model. Journal of Educational and Behavioral Statistics, Winter 2004, Vol. 29, No. 4, pp. 397-420.

 Gaétan Veilleux, Valen. Selection Bias - What You Don't Know Can Hurt Your Bottom Line. Casualty Actuarial Society presentation.