Wednesday, March 9, 2016

What's the difference between difference-in-difference models in a linear vs nonlinear context?

A while back I discussed a powerful methodology for identifying causal effects under both selection on observables and selection on unobservables, namely combining propensity score matching with difference-in-differences.

But recently I ran across a tweet from Felix Bethke (https://twitter.com/F_Bethke) sharing a blog post by Tom Pepinsky on plug-and-play models. At the risk of oversimplifying, the takeaway was that we can't take a methodology like DID from a standard linear regression context, 'plug it into' a nonlinear context, and expect the same results. (Often we see the argument going the other way around, that we can't use linear models in a nonlinear context, but that is a different battle for another day.) I highly recommend Tom's post for more details; he links to a number of papers that clarify the issues in a very technical sense.

In a linear difference-in-difference (DID) analysis, identification of causal effects hinges on a common trend assumption and on the interpretation of the estimated regression coefficient on the time x treatment interaction term, where x indicates the treatment group and t the post-treatment period:

y = b0 + b1 x + b2 t + b3 x*t + e
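
To make the role of b3 concrete, here is a minimal sketch with made-up numbers (the data and variable names are my own illustration, not from any of the papers discussed). With a binary x and a binary t the model above is saturated, so OLS simply reproduces the four cell means, and b3 is the difference-in-differences of those means.

```python
from statistics import mean

# Hypothetical outcomes keyed by (x, t):
# x = 1 for the treated group, t = 1 for the post-treatment period.
cells = {
    (0, 0): [10.0, 12.0, 11.0],
    (0, 1): [13.0, 15.0, 14.0],
    (1, 0): [20.0, 22.0, 21.0],
    (1, 1): [28.0, 30.0, 29.0],
}

# Cell means: in a saturated model these are exactly the OLS fitted values.
m = {cell: mean(values) for cell, values in cells.items()}

b0 = m[(0, 0)]                                          # control group, pre period
b1 = m[(1, 0)] - m[(0, 0)]                              # group difference, pre period
b2 = m[(0, 1)] - m[(0, 0)]                              # time trend in the control group
b3 = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])  # difference-in-differences


def predict(x, t):
    """Fitted value from y = b0 + b1 x + b2 t + b3 x*t."""
    return b0 + b1 * x + b2 * t + b3 * x * t


print("b3 (DID estimate):", b3)  # 5.0 for these made-up numbers
```

Note that the level difference between the groups (b1) drops out of b3 entirely. That is the feature of the linear model that the nonlinear case puts at risk.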

In Tom's post, and in some of the papers, specific attention is given to how the interpretation of the interaction term (our estimated treatment effect, or b3 in a specification like the one above) changes in a logit or probit context, where it becomes something quite different from the causal effect of interest.
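
A small sketch of my own (hypothetical coefficient values, not taken from Tom's post or the papers) shows the problem. Here the same index b0 + b1 x + b2 t + b3 x*t is passed through a logistic function and b3 is set to zero, yet the difference-in-differences on the probability scale is not zero, and its sign depends on the baseline b0.

```python
import math


def logistic(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def prob(x, t, b0, b1, b2, b3):
    """P(y = 1 | x, t) in a logit model with a linear index."""
    return logistic(b0 + b1 * x + b2 * t + b3 * x * t)


def did_on_probability_scale(b0, b1, b2, b3):
    """Change over time for the treated group minus change for the controls."""
    treated_change = prob(1, 1, b0, b1, b2, b3) - prob(1, 0, b0, b1, b2, b3)
    control_change = prob(0, 1, b0, b1, b2, b3) - prob(0, 0, b0, b1, b2, b3)
    return treated_change - control_change


# Hypothetical coefficients; the interaction coefficient b3 is zero in both cases.
did_low_baseline = did_on_probability_scale(b0=-2.0, b1=1.5, b2=1.0, b3=0.0)
did_high_baseline = did_on_probability_scale(b0=2.0, b1=1.5, b2=1.0, b3=0.0)

print("b3 = 0, low baseline:  DID in probabilities =", round(did_low_baseline, 4))
print("b3 = 0, high baseline: DID in probabilities =", round(did_high_baseline, 4))
```

With these values the first difference-in-differences is positive and the second is negative, even though the interaction coefficient is identical (zero) in both. Reading b3 off a logit as "the" treatment effect is therefore not safe.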

I was specifically interested in knowing whether this is an issue just for probit and logit models or for other nonlinear models as well, such as GLMs in general. For instance, in the healthcare economics literature it's very common to use probit or logit models in a two-part modeling context, where the second part of the two-part model is a GLM with a log link and gamma distribution. And I have seen some papers using difference-in-differences across the board with these models.

I took a look at a couple of papers and it appears that these issues are a concern for any GLM model.

In a Health Services Research paper, Karaca-Mandic et al. discuss these issues, and in the abstract they imply that this would apply to the log-transformed models often used in healthcare economics:

"We discuss the motivation for including interaction terms in multivariate analyses. We then explain how the straightforward interpretation of interaction terms in linear models changes in nonlinear models, using graphs and equations. We extend the basic results from logit and probit to difference‐in‐differences models, models with higher powers of explanatory variables, other nonlinear models (including log transformation and ordered models), and panel data models."

After pointing out several issues, they state:

"It is important to understand that the issues about interaction terms discussed here apply to all nonlinear models, including log transformation models"

More specifically, what are these issues, at least at a high level? Recall that difference-in-difference models are a special case of fixed effects panel data models, where unobserved differences and individual specific effects essentially cancel out, providing clean identification of causal effects. For this to work in the DID framework, a common trends assumption is required. In the paper referenced below, Lechner points out (quite rigorously, in the context of the potential outcomes framework):

"We start with a “natural” nonlinear model with a linear index structure which is transformed by a link function, G(·), to yield the conditional expectation of the potential outcome.....The common trend assumption relies on differencing out specific terms of the unobservable potential outcome, which does not happen in this nonlinear specification... Whereas the linear specification requires the group specific differences to be time constant, the nonlinear specification requires them to be absent. Of course, this property of this nonlinear specification removes the attractive feature that DiD allows for some selection on unobservable group and individual specific differences. Thus, we conclude that estimating a DiD model with the standard specification of a nonlinear model would usually lead to an inconsistent estimator if the standard common trend assumption is upheld. In other words, if the standard DiD assumptions hold, this nonlinear model does not exploit them (it will usually violate them). Therefore, estimation based on this model does not identify the causal effect "
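
Lechner's point about group differences can be illustrated with a small sketch (my own, with hypothetical parameter values, not code from the paper). Suppose the untreated mean outcome is G(a + g x + d t), where g is a time-constant group difference in the index. With an identity link the two groups share the same trend regardless of g. With a log link, as in the gamma GLMs mentioned above, the trends are parallel only if g is zero.

```python
import math


def untreated_mean(x, t, a, g, d, inverse_link):
    """Expected untreated outcome: inverse link applied to a linear index."""
    return inverse_link(a + g * x + d * t)


def trend_gap(a, g, d, inverse_link):
    """Treated-group trend minus control-group trend, absent any treatment.

    A value of zero means common trends hold on the outcome scale.
    """
    treated_trend = (untreated_mean(1, 1, a, g, d, inverse_link)
                     - untreated_mean(1, 0, a, g, d, inverse_link))
    control_trend = (untreated_mean(0, 1, a, g, d, inverse_link)
                     - untreated_mean(0, 0, a, g, d, inverse_link))
    return treated_trend - control_trend


def identity(z):
    return z


# Hypothetical values: a = baseline, g = group difference, d = time effect.
gap_linear = trend_gap(a=1.0, g=0.5, d=0.2, inverse_link=identity)
gap_log_link = trend_gap(a=1.0, g=0.5, d=0.2, inverse_link=math.exp)
gap_log_link_no_group_diff = trend_gap(a=1.0, g=0.0, d=0.2, inverse_link=math.exp)

print("identity link, g = 0.5:", gap_linear)
print("log link, g = 0.5:     ", gap_log_link)
print("log link, g = 0:       ", gap_log_link_no_group_diff)
```

The gap is zero (up to floating point rounding) in the linear case, clearly positive under the log link with a nonzero g, and zero again once the group difference is removed. That mirrors the statement in the quote: the linear specification only needs group differences to be time constant, while the nonlinear specification needs them to be absent.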

Because they demonstrate that this applies to any GLM specification/link function, this seems to strike a blow to using DID in the context of a lot of the modeling approaches used in healthcare economics or any other field relying on similar GLM specifications.

So, as Angrist and Pischke might ask, what is an applied guy to do? One approach, even in the context of skewed distributions with high mass points (as is common in the healthcare econometrics space), is to specify a linear model. For dichotomous outcomes (utilization measures like ER visits or hospital admissions are often dichotomized and modeled with logit or probit models) you can just use a linear probability model (LPM). For skewed distributions with heavy mass points, dichotomization with an LPM may also be an attractive alternative.

References:

Special thanks to tweets and additional input from Tom Pepinsky and Marc Bellemare.

Interaction Terms in Nonlinear Models
Pinar Karaca-Mandic, Edward C. Norton, and Bryan Dowd
HSR: Health Services Research 47:1, Part I (February 2012)

The Estimation of Causal Effects by Difference-in-Difference Methods
Michael Lechner
Foundations and Trends in Econometrics, Vol. 4, No. 3 (2010) 165–224