Linear regression is a powerful empirical tool for the social sciences. Its robustness is often underrated, while at other times its use and interpretation are mischaracterized. Andrew Gelman and Angrist and Pischke are two great sources for learning about regression in an applied context.
I particularly like Gelman's comment here:
"It's all about
comparisons, nothing about how a variable "responds to change." Why?
Because, in its most basic form, regression tells you nothing at all about
change. It's a structured way of computing average comparisons in data."
This 'computing average comparisons in data' interpretation is why regression works as a sort of matching estimator, as Angrist and Pischke argue:
“Our view is that
regression can be motivated as a particular sort of weighted matching
estimator, and therefore the differences between regression and matching
estimates are unlikely to be of major
empirical importance” (Chapter 3 p. 70)
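To make the weighted-matching idea concrete, here is a minimal sketch on simulated data (the variable names, cell structure, and effect sizes are all made up for illustration, not taken from Angrist and Pischke). With a binary treatment `d` and a discrete covariate `x`, the OLS coefficient on `d`, controlling for a full set of `x` dummies, should equal an average of the within-cell treated-versus-control gaps, weighted by cell size times the within-cell variance of treatment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.integers(0, 4, size=n)                 # discrete covariate with 4 cells
p_treat = np.array([0.2, 0.4, 0.6, 0.8])[x]    # treatment probability differs by cell
d = (rng.random(n) < p_treat).astype(float)    # binary treatment indicator
effect = np.array([1.0, 2.0, 3.0, 4.0])[x]     # assumed cell-specific gaps
y = 0.5 * x + effect * d + rng.normal(size=n)

# Regression of y on d plus one dummy per cell (the dummies absorb the intercept).
dummies = (x[:, None] == np.arange(4)).astype(float)
X = np.column_stack([d, dummies])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0][0]

# Matching-style calculation: compare like with like inside each cell,
# then average the gaps with weights n_x * p_x * (1 - p_x).
gaps, weights = [], []
for c in range(4):
    in_cell = x == c
    p_c = d[in_cell].mean()                    # share treated in this cell
    gaps.append(y[in_cell & (d == 1)].mean() - y[in_cell & (d == 0)].mean())
    weights.append(in_cell.sum() * p_c * (1 - p_c))
beta_matching = np.average(gaps, weights=weights)

print(beta_ols, beta_matching)
```

The two numbers should agree up to floating-point error. A matching estimator aimed at the effect on the treated would instead weight each cell by its number of treated units, so the two approaches differ only in the weights, not in the underlying like-with-like comparisons.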
In
further discussion Gelman goes on to say, I think in a very appropriate
interpretation:
“They're
saying (Angrist and Pischke) that regression, like matching, is a way of
comparing-like-with-like in estimating a comparison. This point seems
commonplace from a statistical standpoint but may be news to some economists
who might think that regression relies on the linear model being true.”
This brings up a very important point, one
also in line with Angrist and Pischke regarding the use of regression as an
empirical tool in the social sciences:
"In fact, the validity of linear regression as an
empirical tool does not turn on linearity either...The statement that
regression approximates the CEF lines up with our view of empirical work as an
effort to describe the essential features of statistical relationships, without
necessarily trying to pin them down exactly." - Mostly Harmless Econometrics,
p. 26 & 29
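A small simulation can illustrate what "approximates the CEF" means in practice. This is only a sketch under assumed data: the regressor takes the values 1 through 8 and the true conditional mean is `log(x)`, both chosen purely for illustration. Regressing the individual-level outcome on `x` should give the same line as regressing the cell means of `y` (the sample CEF) on `x`, weighting each cell by its size, even though the CEF itself is not linear.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
x = rng.integers(1, 9, size=n).astype(float)     # discrete regressor, values 1..8
y = np.log(x) + rng.normal(scale=0.5, size=n)    # assumed nonlinear CEF: log(x)

def ols(X, y, w=None):
    """Least-squares coefficients, optionally weighted by w."""
    if w is not None:
        sw = np.sqrt(w)
        X, y = X * sw[:, None], y * sw
    return np.linalg.lstsq(X, y, rcond=None)[0]

# (a) Regress individual-level y on a constant and x.
coef_micro = ols(np.column_stack([np.ones(n), x]), y)

# (b) Regress the cell means of y on x, weighting by the number of
#     observations in each cell.
values, counts = np.unique(x, return_counts=True)
cell_means = np.array([y[x == v].mean() for v in values])
coef_cef = ols(
    np.column_stack([np.ones(len(values)), values]),
    cell_means,
    w=counts.astype(float),
)

print(coef_micro, coef_cef)
```

Both fits should return the same intercept and slope. The line does not pin down `log(x)` exactly, but it is the best linear summary of the conditional means, which is the sense in which regression describes the essential features of the relationship.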
Regression users can seem at odds with each other at times. At one extreme, they get caught up in very
clinical assumptions about linearity
(see the somewhat related discussions of linear probability models here and here); at the
other, they take robustness to extremes by failing to consider
questions of unobserved heterogeneity, endogeneity, selection bias, and
identification.
Cellini (2008) discusses this issue
in an analysis of the impact of financial aid on college enrollment:
“While simple ordinary least squares estimates of the impact of
aid on college-going can reveal a correlation between financial aid policies
and enrollment, these estimates are likely to suffer from omitted variable bias
due to self-selection, potentially overestimating or underestimating the causal
impact of these policies on enrollment.”
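The omitted variable bias described in this passage can be sketched with simulated data. Everything here is hypothetical: `ability` stands in for an unobserved trait that drives both aid and enrollment, `enroll` is a continuous enrollment index rather than a real binary outcome, and the coefficients are invented. The sketch checks the textbook identity that the short-regression coefficient equals the long-regression coefficient plus the omitted variable's coefficient times the slope from regressing the omitted variable on the included one.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10000
ability = rng.normal(size=n)                  # hypothetical unobserved confounder
aid = 0.8 * ability + rng.normal(size=n)      # aid is correlated with the confounder
# Assumed data-generating process: the true effect of aid is 1.0.
enroll = 1.0 * aid + 2.0 * ability + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)

# Short regression: the confounder is left out.
b_short = ols(np.column_stack([const, aid]), enroll)[1]

# Long regression: the confounder is included.
b_long, gamma = ols(np.column_stack([const, aid, ability]), enroll)[1:]

# Auxiliary regression of the omitted variable on the included one.
delta = ols(np.column_stack([const, aid]), ability)[1]

print(b_short, b_long + gamma * delta)
```

In this setup the short regression overstates the effect of aid because `gamma` and `delta` are both positive; flipping the sign of either one would produce an underestimate, which matches the point that the bias can go in either direction.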
“The discussion above has outlined several methods for
addressing the problem of omitted variable bias in financial aid research…
proxy variable, fixed effects, and difference-in-differences approaches are
becoming quite common. Indeed, these approaches have replaced basic
multivariate regression as the new standard for education research in the
economics literature”
This is where quasi-experimental methods come into play.
References
Stephanie Riegg Cellini. "Causal Inference and Omitted Variable Bias in Financial Aid Research: Assessing Solutions." The Review of Higher Education, Spring 2008, Volume 31, No. 3, pp. 329–354.