Econometric Sense: Unobserved Heterogeneity and Endogeneity

Tuesday, June 18, 2013

Unobserved Heterogeneity and Endogeneity

Let's suppose we estimate the following:

Y =β₀ + β₁ X₁+ e (1)

When we estimate a regression such as (1) above and leave out an important variable such as X₂, our estimate of β₁ can become biased and inconsistent. In fact, to the extent that X₁ and X₂ are correlated, X₁ becomes correlated with the error term, violating a basic assumption of regression. The omitted information in X₂ is referred to in econometrics as ‘unobserved heterogeneity.’ Heterogeneity is simply variation across individual units of observation, and since we can’t observe this variation or heterogeneity as it relates to X₂, we have unobserved heterogeneity. Correlation between an explanatory variable and the error term is referred to as endogeneity. So in econometrics, when we have an omitted variable (as is often the case in problems of causal inference and selection bias), we say we have endogeneity caused by unobserved heterogeneity.
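
To make the link between the omitted variable and the error term concrete, here is a short worked sketch under an assumed true model. The coefficient β₂ and the well-behaved error u are notation introduced only for this illustration, and u is assumed to be uncorrelated with X₁.

```latex
\begin{align*}
% Assumed true model; \beta_2 and u are illustrative notation
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u, \qquad \mathrm{COV}(u, X_1) = 0 \\
% Leaving X_2 out of (1) pushes it into the error term
e &= \beta_2 X_2 + u \\
% Covariance between the regressor and the error term of (1)
\mathrm{COV}(e, X_1) &= \beta_2\,\mathrm{COV}(X_2, X_1) + \mathrm{COV}(u, X_1)
                      = \beta_2\,\mathrm{COV}(X_2, X_1)
\end{align*}
```

Under these assumptions the covariance is nonzero whenever X₂ actually matters for Y (β₂ ≠ 0) and X₁ and X₂ are correlated, which is exactly the situation described above.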

How do we characterize the impacts of this on our estimate of β₁ ?

We know from basic econometrics that the OLS estimate of β is:

b = (X’X)^-1 X’Y or, for the slope in a regression with a single explanatory variable, COV(Y,X)/VAR(X) (2)
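
As a quick sanity check on (2), the sketch below computes the slope both ways on simulated data. The data-generating values (intercept 1, slope 2, 1,000 observations) and the helper names `mean` and `cov` are assumptions made for this example only.

```python
import random

random.seed(0)

# Simulated data from Y = 1 + 2*X + e, with e independent of X (assumed values).
n = 1000
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance with an n - 1 denominator; cov(a, a) is the variance."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


# Slope via COV(Y, X) / VAR(X).
slope_cov = cov(y, x) / cov(x, x)

# Slope via b = (X'X)^-1 X'Y, where X has a column of ones and a column of x.
# X'X = [[n, sx], [sx, sxx]] is 2x2, so its inverse is written out by hand.
sx, sxx = sum(x), sum(xi * xi for xi in x)
sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
det = n * sxx - sx * sx
intercept_mat = (sxx * sy - sx * sxy) / det
slope_mat = (n * sxy - sx * sy) / det

print("slope from COV/VAR:      ", slope_cov)
print("slope from (X'X)^-1 X'Y: ", slope_mat)
```

The two slopes agree up to floating-point rounding, because with an intercept in the model the second element of (X’X)^-1 X’Y is algebraically the same ratio as COV(Y,X)/VAR(X).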

If we substitute Y = β₀ + β₁ X₁ + e into (2), writing X for X₁, we get:

COV(β₀ + β₁ X+ e,X)/VAR(X) =

COV(β₀,X)/VAR(X) + COV(β₁ X,X)/VAR(X) + COV(e,X)/VAR(X) (3)

= 0 + β₁ VAR(X)/VAR(X) + COV(e,X)/VAR(X) (4)

= β₁ + COV(e,X)/VAR(X) (5)

We can see from (5) that if we leave out a variable in (1), i.e. we have unobserved heterogeneity, then the covariance between X and the error term will not be zero, and our estimate of β₁ will be biased by the term COV(e,X)/VAR(X). If (1) is correctly specified, so that COV(e,X) = 0, then the term COV(e,X)/VAR(X) drops out and we get an unbiased estimate of β₁.
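
The result in (5) can be checked numerically. The sketch below simulates an assumed true model in which X₂ affects Y and is correlated with X₁, then compares the slope from regressing Y on X₁ alone with β₁ plus the bias term. All numeric values, the coefficient `beta2`, and the helper names are assumptions for illustration. The slope that controls for X₂ is obtained by partialling X₂ out of X₁ first (the Frisch-Waugh-Lovell approach), used here only to avoid a matrix library.

```python
import random

random.seed(1)

# Assumed true model (illustrative values): Y = 1 + 2*X1 + 3*X2 + u,
# with X1 constructed so that it is correlated with X2.
n = 5000
beta0, beta1, beta2 = 1.0, 2.0, 3.0
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * b + random.gauss(0, 1) for b in x2]
u = [random.gauss(0, 1) for _ in range(n)]
y = [beta0 + beta1 * a + beta2 * b + ui for a, b, ui in zip(x1, x2, u)]


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance with an n - 1 denominator; cov(a, a) is the variance."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


# Error term of the short regression (1): everything in Y not explained by X1.
e = [yi - beta0 - beta1 * a for yi, a in zip(y, x1)]

# Slope from regressing Y on X1 alone, and the bias term from (5).
b_short = cov(y, x1) / cov(x1, x1)
bias_term = cov(e, x1) / cov(x1, x1)

# Slope on X1 once X2 is controlled for: remove the part of X1 explained
# by X2, then regress Y on what is left of X1.
g = cov(x1, x2) / cov(x2, x2)
m1, m2 = mean(x1), mean(x2)
r1 = [a - m1 - g * (b - m2) for a, b in zip(x1, x2)]
b_long = cov(y, r1) / cov(r1, r1)

print("slope omitting X2:      ", b_short)
print("beta1 + COV(e,X)/VAR(X):", beta1 + bias_term)
print("slope controlling X2:   ", b_long)
```

The first two printed numbers match up to rounding, since (5) holds as an identity in the sample as well as in the population, while the third should land close to the assumed β₁ of 2.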

4 comments:

Unknown, July 28, 2014 at 11:36 AM
Thank you for this. Have you also discussed difference-in-differences estimation, especially the variety that covers multiple time periods for a recurring treatment (as opposed to the common case, which involves only two periods)?
Anonymous, August 1, 2014 at 3:07 AM
good stuff

however, there is a typo in the first sentence: "When we estimate a regression such as (1) above and leave out an important variable such as X2 then our estimate of β1 can become unbiased and inconsistent."

should be "our estimate of β1 can become BIASED and inconsistent"
Anonymous, August 5, 2014 at 6:16 AM
thank you for the reply. i really like your post; it helps clarify the difficult jargon in an intuitive way.

i have a quick, related question. is the phrase "correlated unobservables" referring to the same phenomenon as "unobserved heterogeneity"?

thanks again