Let's suppose we estimate the following:

Y = β_{0} + β_{1}X_{1} + e (1)

When we estimate a regression such as (1) above and leave out an important variable such as X_{2}, our estimate of β_{1} can become biased and inconsistent. In fact, to the extent that X_{1} and X_{2} are correlated, X_{1} becomes correlated with the error term, violating a basic assumption of regression. The omitted information in X_{2} is referred to in econometrics as *‘unobserved heterogeneity.’* Heterogeneity is simply variation across individual units of observation, and since we can’t observe this variation or heterogeneity as it relates to X_{2}, we have *unobserved heterogeneity*. Correlation between an explanatory variable and the error term is referred to as *endogeneity*. So in econometrics, when we have an omitted variable (as is often the case with causal inference and selection bias), we say we have *endogeneity* caused by *unobserved heterogeneity*.
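To make the source of that correlation concrete, here is a short sketch. It assumes (this is not stated above) that the true model includes X_{2} with a coefficient β_{2} and an error u that is uncorrelated with X_{1}. Both β_{2} and u are notation introduced only for this illustration.

```latex
\begin{align*}
\text{True model (assumed):}\quad Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u,
  \qquad \mathrm{COV}(u, X_1) = 0 \\
\text{Error in (1) when } X_2 \text{ is omitted:}\quad e &= \beta_2 X_2 + u \\
\mathrm{COV}(e, X_1) &= \beta_2\,\mathrm{COV}(X_2, X_1) + \mathrm{COV}(u, X_1)
  = \beta_2\,\mathrm{COV}(X_1, X_2)
\end{align*}
```

Under these assumptions, X_{1} is correlated with the error term whenever β_{2} is nonzero and X_{1} and X_{2} are correlated.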
How do we characterize the impacts of this on our estimate of β_{1}?
We know from basic econometrics that our estimate of β is

b = (X’X)^{-1}X’Y or COV(Y,X)/VAR(X) (2)

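The two forms in (2) can be checked numerically. The sketch below is only an illustration: the simulated data, the seed, and the coefficient values 2.0 and 3.0 are assumptions chosen for the example, not anything from the post. It computes the slope once with the matrix formula (including a column of ones for the intercept) and once as COV(Y,X)/VAR(X).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n = 1_000

# Illustrative data: Y = 2 + 3*X + noise (coefficients are assumptions).
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

# Matrix form: b = (X'X)^{-1} X'Y, with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

# Ratio form for the slope: COV(Y, X) / VAR(X), same ddof in both terms.
slope_ratio = np.cov(y, x)[0, 1] / np.var(x, ddof=1)

print("slope from (X'X)^{-1}X'Y:", b[1])
print("slope from COV/VAR:      ", slope_ratio)
```

With a single regressor plus an intercept, the two slopes agree up to floating-point error, which is why the derivation that follows can work with the simpler ratio form.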
If we substitute Y = β_{0} + β_{1}X_{1} + e into (2), writing X for X_{1}, we get:

COV(β_{0} + β_{1}X + e, X)/VAR(X) =

COV(β_{0},X)/VAR(X) + COV(β_{1}X,X)/VAR(X) + COV(e,X)/VAR(X) (3)

= 0 + β_{1}VAR(X)/VAR(X) + COV(e,X)/VAR(X) (4)

= β_{1} + COV(e,X)/VAR(X) (5)

We can see from (5) that if we leave out a variable in (1), i.e. we have unobserved heterogeneity, then the correlation that results between X and the error term will not be zero, and our estimate of β_{1} will be biased by the term COV(e,X)/VAR(X). If (1) were correctly specified, the term COV(e,X)/VAR(X) would be zero and we would get an unbiased estimate of β_{1}.

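A small simulation can illustrate (5). This is a sketch under stated assumptions, not part of the original argument: it assumes a hypothetical true model Y = b0 + b1*X1 + b2*X2 + u, with made-up coefficient values and a made-up 0.6 link between X1 and X2. It estimates the regression with X2 left out and compares the resulting gap to COV(e,X)/VAR(X).

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible
n = 100_000

# Hypothetical true model (an assumption): Y = b0 + b1*X1 + b2*X2 + u
b0, b1, b2 = 1.0, 2.0, 1.5

x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)  # X1 is correlated with X2 by construction
u = rng.normal(size=n)
y = b0 + b1 * x1 + b2 * x2 + u


def slope(y, x):
    """Single-regressor OLS slope, COV(Y, X) / VAR(X), as in (2)."""
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)


# Short regression (1): X2 is omitted, so the error is e = b2*X2 + u.
b1_short = slope(y, x1)
e = b2 * x2 + u
bias_term = np.cov(e, x1)[0, 1] / np.var(x1, ddof=1)  # COV(e,X)/VAR(X) in (5)

# Long regression: X2 is included, so the specification is correct.
X = np.column_stack([np.ones(n), x1, x2])
b_long = np.linalg.lstsq(X, y, rcond=None)[0]

print("true b1:                  ", b1)
print("estimate omitting X2:     ", b1_short)
print("b1 + COV(e,X)/VAR(X):     ", b1 + bias_term)
print("estimate including X2:    ", b_long[1])
```

In this setup the estimate that omits X2 equals b1 plus the bias term, exactly as (5) says, while the estimate that includes X2 lands close to the true b1. With these assumed values the bias in the population is 1.5 × 0.6 / 1.36, roughly 0.66.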
Thank you for this. Have you also discussed difference-in-differences estimation, especially the variety that takes on multiple time periods for a recurring treatment (as opposed to the common variety, which involves only two periods)?

good stuff

however, there is a typo in the first sentence: "When we estimate a regression such as (1) above and leave out an important variable such as X2 then our estimate of β1 can become unbiased and inconsistent."

should be "our estimate of β1 can become BIASED and inconsistent."

YES! THANK YOU! It should say BIASED. I need to correct that.

thank you for the reply. i really like your post; it helps clarify the difficult jargon in an intuitive way.

i have a quick, related question. is the phrase "correlated unobservables" referring to the same phenomenon as "unobserved heterogeneity"?

thanks again