Let's suppose we estimate the following:
Y = β0 + β1 X1 + e    (1)
When we estimate a regression such as (1) above and leave out an important variable such as X2, our estimate of β1 can become biased and inconsistent. In fact, to the extent that X1 and X2 are correlated (and X2 belongs in the model), X1 becomes correlated with the error term, violating a basic assumption of regression. The omitted information in X2 is referred to in econometrics as ‘unobserved heterogeneity.’ Heterogeneity is simply variation across individual units of observation, and since we can’t observe this variation as it relates to X2, we have unobserved heterogeneity. Correlation between an explanatory variable and the error term is referred to as endogeneity.
So in econometrics, when we have an omitted variable (as is often the case with causal inference and selection bias), we say we have endogeneity caused by unobserved heterogeneity.
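To make this concrete, here is a minimal simulation sketch. The data-generating process is an assumption chosen purely for illustration (the coefficient values, the 0.5 loading of X1 on X2, and the helper name `ols` are all hypothetical). It fits the regression once without X2 and once with it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical "true" model: Y = beta0 + beta1*X1 + beta2*X2 + u
beta0, beta1, beta2 = 1.0, 2.0, 3.0
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)  # X1 is correlated with X2 by construction
u = rng.normal(size=n)
y = beta0 + beta1 * x1 + beta2 * x2 + u


def ols(y, *regressors):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


short_fit = ols(y, x1)      # X2 omitted, as in equation (1)
long_fit = ols(y, x1, x2)   # X2 included

print("slope on X1, X2 omitted: ", short_fit[1])
print("slope on X1, X2 included:", long_fit[1])
```

Under these assumed values the slope from the short regression should settle near 3.2 rather than the true 2.0, while the regression that includes X2 should recover a value close to 2.0.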
How do we characterize the impact of this on our estimate of β1?
We know from basic econometrics that our estimate of β is
b = (X’X)^(-1) X’Y, or, for the slope in a simple regression with an intercept, COV(Y,X)/VAR(X)    (2)
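As a quick check that the two expressions in (2) agree for the slope of a simple regression, the sketch below computes both on simulated data. The sample size and coefficient values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)  # illustrative values only

# Matrix form: b = (X'X)^(-1) X'Y, with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

# Covariance form: slope = COV(Y, X) / VAR(X), using the same ddof in both
slope_cov = np.cov(y, x)[0, 1] / np.var(x, ddof=1)

print(b[1], slope_cov)
```

The two numbers match up to floating-point error. The equivalence holds for the slope only, and only when the regression includes an intercept.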
If we substitute Y = β0 + β1 X1 + e into (2), writing X for X1, we get:
COV(β0 + β1 X + e, X)/VAR(X) = COV(β0,X)/VAR(X) + COV(β1 X,X)/VAR(X) + COV(e,X)/VAR(X)    (3)
= 0 + β1 VAR(X)/VAR(X) + COV(e,X)/VAR(X)    (4)
= β1 + COV(e,X)/VAR(X)    (5)
We can see from (5) that if we leave a relevant variable out of (1), i.e. we have unobserved heterogeneity, then the covariance between X and the error term will not be zero, and our estimate of β1 will be biased by the term COV(e,X)/VAR(X). If (1) is correctly specified, then COV(e,X) = 0, the term COV(e,X)/VAR(X) drops out, and we get an unbiased estimate of β1.
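The decomposition in (5) can also be checked numerically. In the sketch below the omitted variable X2 is folded into the error term, so e = β2 X2 + u. The coefficient values and the helper `cov` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical values, chosen only for illustration
beta0, beta1, beta2 = 1.0, 2.0, 3.0
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
u = rng.normal(size=n)

e = beta2 * x2 + u           # error term of (1) once X2 is left out
y = beta0 + beta1 * x1 + e


def cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return np.cov(a, b)[0, 1]


slope = cov(y, x1) / cov(x1, x1)      # left-hand side: COV(Y,X)/VAR(X)
bias_term = cov(e, x1) / cov(x1, x1)  # COV(e,X)/VAR(X) from (5)

print(slope, beta1 + bias_term)
```

In the sample, the estimated slope equals β1 plus the bias term exactly (up to floating-point error), which is what (5) states. The bias term is nonzero here only because X1 was constructed to be correlated with X2.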
Thank you for this. Have you also discussed difference-in-differences estimation, especially the variety that covers multiple time periods for a recurring treatment (as opposed to the common version, which involves only two periods)?
good stuff
however, there is a typo in the first sentence: "When we estimate a regression such as (1) above and leave out an important variable such as X2 then our estimate of β1 can become unbiased and inconsistent."
should be "our estimate of β1 can become BIASED and inconsistent"
YES! THANK YOU! It should say BIASED. I need to correct that.
thank you for the reply. i really like your post; it helps clarify the difficult jargon in an intuitive way.
i have a quick, related question. is the phrase, "correlated unobservables" referring to the same phenomenon as "unobserved heterogeneity"?
thanks again