## Saturday, January 15, 2011

### Instrumental Variables (IVs)

In a previous post I discussed how IVs and 2SLS can be used to correct for simultaneity bias in systems of simultaneous equations. 2SLS and IVs may also be useful in other contexts, such as the case of measurement error. Greene gives the following example:

Suppose you are estimating a consumption function with income as an explanatory variable:

C = f(I) + e

In general, income may not be accurately reported. As a result, the estimated effect of income on consumption will suffer from measurement-error bias. In the example given in his text, Greene proposes that if we had data on the number of checks written per household, this variable would likely be correlated with income but uncorrelated with the measurement error. Rather than relying on reported income alone, we could then use the number of checks written as an instrumental variable for income.
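To see why misreported income is a problem, here is a minimal simulation sketch. Everything in it is an assumption made for illustration (the linear form of f, the true slope of 0.8, the noise levels, and the variable names); none of it comes from Greene's text. It compares the OLS slope using true income with the slope using noisy reported income.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_b = 0.8  # assumed slope of consumption on income (illustrative only)

income = rng.normal(50.0, 10.0, n)             # true income, unobserved in practice
reported = income + rng.normal(0.0, 10.0, n)   # reported income = truth + reporting noise
consumption = true_b * income + rng.normal(0.0, 5.0, n)


def slope(x, y):
    """OLS slope of y on x (with an intercept): Cov(x, y) / Var(x)."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]


b_ols_true = slope(income, consumption)     # close to true_b
b_ols_noisy = slope(reported, consumption)  # shrunk toward zero

print(f"OLS with true income:     {b_ols_true:.3f}")
print(f"OLS with reported income: {b_ols_noisy:.3f}")
```

With these assumed variances the noise is as large as the signal, so the slope on reported income is pulled roughly halfway toward zero. This is the attenuation bias that an instrument is meant to correct.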

IV Estimation via 2SLS:

In the context of the example above, let y be consumption (C) and x be income. Our regression equation then becomes:

y = b x + e

An instrumental variable ‘z’ is one that is correlated with ‘x’ but not with ‘e’. Taking expectations conditional on z:

E(y|z) = b E(x|z) + E(e|z), and by assumption E(e|z) = 0

Stage 1: Regress x on z to get x*

Stage 2: Regress y on x*

This gives us $b_{IV} = (z'x)^{-1} z'y$ and the result $y = b_{IV} x^* + e$.

An important concept in IVs is the exclusion restriction (the exclusion principle), which states that the only way z impacts y is through z’s effect on x. In other words, the causal model is:

z → x → y
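The two stages and the closed-form estimator above can be sketched in a few lines of NumPy. This is only an illustration on simulated data: the data-generating process, the true slope of 0.8, and all variable names are assumptions, and the variables are demeaned so that no intercept is needed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true_b = 0.8  # assumed true slope (illustrative only)

income = rng.normal(50.0, 10.0, n)           # true income, unobserved
x = income + rng.normal(0.0, 10.0, n)        # mismeasured regressor (reported income)
z = 0.5 * income + rng.normal(0.0, 3.0, n)   # hypothetical instrument (checks written)
y = true_b * income + rng.normal(0.0, 5.0, n)

# Demean everything so the regressions need no intercept term.
x_c, z_c, y_c = x - x.mean(), z - z.mean(), y - y.mean()

# Stage 1: regress x on z and keep the fitted values x*.
a_hat = (z_c @ x_c) / (z_c @ z_c)
x_star = a_hat * z_c

# Stage 2: regress y on x*.
b_2sls = (x_star @ y_c) / (x_star @ x_star)

# Closed form for a single instrument: b_IV = (z'x)^(-1) z'y.
b_iv = (z_c @ y_c) / (z_c @ x_c)

# Naive OLS of y on the mismeasured x, for comparison.
b_ols = (x_c @ y_c) / (x_c @ x_c)

print(f"OLS: {b_ols:.3f}  2SLS: {b_2sls:.3f}  IV closed form: {b_iv:.3f}")
```

With one instrument and one regressor, the two-stage estimate and the closed form coincide up to floating-point error. Note that x_star contains only the part of x that moves with z. The standard errors from a hand-rolled second stage are not correct, so a dedicated IV routine is the safer choice for inference.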

Related to this, I believe, is a statement you will find in 'Mostly Harmless Econometrics' by Angrist and Pischke (where s is the independent variable being instrumented):

"Intuitively, conditional on covariates, 2SLS retains only the variation in s that is generated by quasi-experimental variation- that is generated by the instrument z"

References:

Angrist and Pischke, Mostly Harmless Econometrics, 2009

Greene, Econometric Analysis, 5th Edition

See also this great blog post from Dr. Andrew Gelman with comments from Hal Varian: How to think about instrumental variables when you get confused

“Suppose z is your instrument, T is your treatment, and y is your outcome. So the causal model is z -> T -> y……. when I get stuck, I find it extremely helpful to go back and see what I've learned from separately thinking about the correlation of z with T, and the correlation of z with y. Since that's ultimately what instrumental variables analysis is doing.”

"You have to assume that the only way that z affects Y is through the treatment, T. So the IV model is
T = az + e
y = bT + d

It follows that
E(y|z) = b E(T|z) + E(d|z)
Now if we
1) assume E(d|z) = 0
2) verify that E(T|z) != 0
we can solve for b by division. Of course, assumption 1 is untestable.
An extreme case is a purely randomized experiment, where e=0 and z is a coin flip."
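The "solve for b by division" step in the comment above can be illustrated with a binary instrument, where the ratio becomes the difference in mean outcomes divided by the difference in mean treatment (often called the Wald estimator). The sketch below is a simulation under assumed values: the true effect of 2.0, the confounder u, and the coefficients are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
true_b = 2.0  # assumed true treatment effect (illustrative only)

z = rng.integers(0, 2, n)                    # instrument: a coin flip
u = rng.normal(0.0, 1.0, n)                  # unobserved confounder of T and y
T = 0.7 * z + u + rng.normal(0.0, 1.0, n)    # treatment: T = a z + e
y = true_b * T + 3.0 * u + rng.normal(0.0, 1.0, n)  # outcome: y = b T + d

# E(y|z) = b E(T|z) + E(d|z), with E(d|z) = 0 because z is randomized.
dy = y[z == 1].mean() - y[z == 0].mean()     # effect of z on y
dT = T[z == 1].mean() - T[z == 0].mean()     # effect of z on T (must not be zero)
b_wald = dy / dT                             # solve for b by division

# Naive regression of y on T is biased because u drives both.
b_naive = np.cov(T, y)[0, 1] / np.var(T, ddof=1)

print(f"naive OLS: {b_naive:.3f}  Wald/IV: {b_wald:.3f}")
```

In this setup the naive slope is pushed well above the assumed effect by the confounder, while the ratio of the two differences in means recovers something close to it. If dT were near zero (a weak instrument), the division would become unstable.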