Tuesday, September 10, 2013

Regression as an Empirical Tool

Linear regression is a powerful empirical tool for the social sciences. Its robustness is often underrated, while at other times its use and interpretation are mischaracterized. Andrew Gelman and Angrist and Pischke are two great sources for learning about regression in an applied context.

I particularly like Gelman's comment here:
"It's all about comparisons, nothing about how a variable 'responds to change.' Why? Because, in its most basic form, regression tells you nothing at all about change. It's a structured way of computing average comparisons in data."
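To make the "average comparisons" reading concrete, here is a minimal sketch on simulated data (the variable names and numbers are made up for illustration). With a single binary regressor, the OLS slope is exactly the difference between the two group means, with no appeal to how y "responds to change."

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
d = rng.integers(0, 2, size=n)            # hypothetical binary group indicator
y = 2.0 + 1.5 * d + rng.normal(size=n)    # simulated outcome

# OLS of y on an intercept and the group indicator
X = np.column_stack([np.ones(n), d])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# The same comparison computed directly as a difference in group means
diff_means = y[d == 1].mean() - y[d == 0].mean()

print(beta[1], diff_means)  # identical up to floating-point error
```

The intercept is likewise just the mean of the d = 0 group; nothing in the calculation depends on a linear model being "true."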

This "computing average comparisons in data" interpretation is why regression works as a sort of matching estimator, as Angrist and Pischke argue:

“Our view is that regression can be motivated as a particular sort of weighted matching estimator, and therefore the differences between regression and matching estimates are unlikely to be of major empirical importance” (Chapter 3, p. 70).
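A rough numerical illustration of the weighted-matching view, assuming a single discrete covariate and simulated data (all cell probabilities and effects are hypothetical). With a full set of dummies for the covariate cells, the OLS coefficient on treatment equals a weighted average of within-cell treated-minus-control differences, with weights proportional to the within-cell variance of treatment, n_x * p_x * (1 - p_x). A matching estimator of the effect on the treated averages the same within-cell differences but weights them by the number of treated units in each cell.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
x = rng.integers(0, 3, size=n)                 # discrete covariate with 3 cells
p_treat = np.array([0.2, 0.5, 0.8])[x]         # assumed treatment probability by cell
d = (rng.random(n) < p_treat).astype(float)
effect = np.array([1.0, 2.0, 3.0])[x]          # assumed heterogeneous effect by cell
y = x + effect * d + rng.normal(size=n)

# Regression saturated in x: one dummy per cell (no separate intercept)
cells = np.eye(3)[x]
X = np.column_stack([d, cells])
beta_reg = np.linalg.lstsq(X, y, rcond=None)[0][0]

# Within-cell treated-minus-control comparisons, cell sizes, treated shares
delta, n_x, p_x = [], [], []
for k in range(3):
    m = x == k
    delta.append(y[m & (d == 1)].mean() - y[m & (d == 0)].mean())
    n_x.append(m.sum())
    p_x.append(d[m].mean())
delta, n_x, p_x = map(np.array, (delta, n_x, p_x))

# Regression weights: within-cell variance of treatment times cell size
w_reg = n_x * p_x * (1 - p_x)
beta_weighted = (w_reg * delta).sum() / w_reg.sum()

# Matching estimate of the effect on the treated: weights = treated count per cell
w_att = n_x * p_x
att_match = (w_att * delta).sum() / w_att.sum()

print(beta_reg, beta_weighted, att_match)
```

Both estimators average the same cell-level comparisons and differ only in the weights. When the effect is similar across cells they land close together; this example builds in strongly varying effects on purpose, so the two can diverge.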

In further discussion, Gelman offers what I think is a very apt interpretation:

“They [Angrist and Pischke] are saying that regression, like matching, is a way of comparing-like-with-like in estimating a comparison. This point seems commonplace from a statistical standpoint but may be news to some economists who might think that regression relies on the linear model being true.”

This brings up a very important point, one also in line with Angrist and Pischke, regarding the use of regression as an empirical tool in the social sciences:

"In fact, the validity of linear regression as an empirical tool does not turn on linearity either...The statement that regression approximates the CEF lines up with our view of empirical work as an effort to describe the essential features of statistical relationships, without necessarily trying to pin them down exactly." - Mostly Harmless Econometrics, pp. 26 and 29
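The CEF-approximation point can be checked directly. In this sketch (simulated, deliberately nonlinear data; the setup is an assumption for illustration), OLS on the individual observations gives exactly the same intercept and slope as a weighted regression of the cell means of y (the sample CEF) on x, with each cell weighted by its size. The fitted line is the best linear approximation to the CEF whether or not the CEF is linear.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.integers(0, 10, size=n).astype(float)     # hypothetical discrete regressor
y = np.log1p(x) ** 2 + rng.normal(size=n)         # deliberately nonlinear CEF

# OLS on the micro data
X = np.column_stack([np.ones(n), x])
b_micro = np.linalg.lstsq(X, y, rcond=None)[0]

# Collapse to the sample CEF: mean of y at each value of x
vals, counts = np.unique(x, return_counts=True)
cef = np.array([y[x == v].mean() for v in vals])

# Weighted least squares of the cell means on x (weights = cell sizes),
# implemented by scaling rows by sqrt(weight)
w = np.sqrt(counts)
Xc = np.column_stack([np.ones_like(vals), vals])
b_cef = np.linalg.lstsq(Xc * w[:, None], cef * w, rcond=None)[0]

print(b_micro, b_cef)  # same coefficients
```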

Regression users can seem at odds with each other. At one extreme, they get caught up in rigid assumptions about linearity (somewhat related to the debates over linear probability models); at the other, they take robustness too far, failing to consider questions of unobserved heterogeneity, endogeneity, selection bias, and identification.

Cellini (2008) discusses this issue in an analysis of the impact of financial aid on college enrollment:

“While simple ordinary least squares estimates of the impact of aid on college-going can reveal a correlation between financial aid policies and enrollment, these estimates are likely to suffer from omitted variable bias due to self-selection, potentially overestimating or underestimating the causal impact of these policies on enrollment.”
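The omitted variable bias Cellini describes follows a simple accounting identity. A minimal sketch with simulated data (the names `aid` and `ability` and all coefficients are hypothetical): the coefficient from the "short" regression that omits the confounder equals the "long" coefficient plus the confounder's coefficient times the slope from regressing the confounder on the included variable.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
ability = rng.normal(size=n)                     # unobserved confounder (hypothetical)
aid = 0.5 * ability + rng.normal(size=n)         # correlated with ability by construction
outcome = 1.0 * aid + 2.0 * ability + rng.normal(size=n)

def ols(y, *regressors):
    """OLS coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols(outcome, aid)              # omits ability
b_long = ols(outcome, aid, ability)      # includes ability
delta = ols(ability, aid)[1]             # slope of omitted variable on included one

# OVB identity: short = long + (coefficient on omitted) * delta
print(b_short[1], b_long[1] + b_long[2] * delta)
```

Here the confounder raises both the regressor and the outcome, so the short regression overstates the effect; flip the sign of exactly one of those relationships and the bias runs the other way.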

“The discussion above has outlined several methods for addressing the problem of omitted variable bias in financial aid research… proxy variable, fixed effects, and difference-in-differences approaches are becoming quite common. Indeed, these approaches have replaced basic multivariate regression as the new standard for education research in the economics literature.”

This is where quasi-experimental methods come into play.
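As one example of these designs, here is a minimal two-group, two-period difference-in-differences sketch on simulated data (the policy, the groups, and the 0.3 effect are all assumptions). The difference in before-after changes between the exposed and unexposed groups is identical to the interaction coefficient in a regression of y on a group dummy, a period dummy, and their product.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4_000
g = rng.integers(0, 2, size=n).astype(float)     # 1 = group exposed to a hypothetical policy
t = rng.integers(0, 2, size=n).astype(float)     # 1 = period after the policy
# Fixed group gap, common time trend, and an assumed policy effect of 0.3
y = 1.0 + 0.8 * g + 0.5 * t + 0.3 * g * t + rng.normal(size=n)

def cell_mean(gv, tv):
    """Mean outcome for one group-period cell."""
    return y[(g == gv) & (t == tv)].mean()

# Difference-in-differences from the four cell means
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# The same estimate as the interaction coefficient in a regression
X = np.column_stack([np.ones(n), g, t, g * t])
did_reg = np.linalg.lstsq(X, y, rcond=None)[0][3]

print(did_means, did_reg)
```

The arithmetic is the same either way; whether the estimate is causal rests on the parallel-trends assumption, which this simulation builds in by construction.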


Cellini, Stephanie Riegg. 2008. “Causal Inference and Omitted Variable Bias in Financial Aid Research: Assessing Solutions.” The Review of Higher Education 31 (3, Spring): 329–354.
