Tuesday, March 22, 2016

Identification and Common Trend Assumptions in Difference-in-Differences for Linear vs GLM Models


In a previous post I discussed a conclusion from Lechner’s paper 'The Estimation of Causal Effects by Difference-in-Difference Methods': difference-in-difference models in a nonlinear or GLM context fail to satisfy the common trend assumption, and therefore fail to identify treatment effects in a selection-on-unobservables setting.

In that post I noted that Lechner points out (quite rigorously in the context of the potential outcomes framework):

"We start with a “natural” nonlinear model with a linear index structure which is transformed by a link function, G(·), to yield the conditional expectation of the potential outcome.....The common trend assumption relies on differencing out specific terms of the unobservable potential outcome, which does not happen in this nonlinear specification... Whereas the linear specification requires the group specific differences to be time constant, the nonlinear specification requires them to be absent. Of course, this property of this nonlinear specification removes the attractive feature that DiD allows for some selection on unobservable group and individual specific differences. Thus, we conclude that estimating a DiD model with the standard specification of a nonlinear model would usually lead to an inconsistent estimator if the standard common trend assumption is upheld. In other words, if the standard DiD assumptions hold, this nonlinear model does not exploit them (it will usually violate them). Therefore, estimation based on this model does not identify the causal effect”

I wanted to review at a high level exactly how he arrives at this result, simplifying as much as possible and starting with some basic concepts. In a basic regression model, the population conditional expectation function (the conditional mean of Y given X) can be written as:

Regression and Expected Value Notation:

$$E[Y \mid X] = \beta_0 + \beta_1 X \tag{1}$$

and we estimate this with the regression on observed data:

$$y = b_0 + b_1 X + e \tag{2}$$

where b1 is our estimate of the population parameter of interest, β1.

If E[b1] = β1 then we say our estimator is unbiased.
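To make unbiasedness concrete, here is a small Monte Carlo sketch: draw many samples from a hypothetical population with β0 = 1 and β1 = 2, estimate b1 by OLS in each sample, and check that the estimates average out near β1. The parameter values, sample sizes, and normally distributed errors are illustrative assumptions.

```python
import random

def ols_slope(x, y):
    """Slope b1 = cov(x, y) / var(x) from a simple regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical population: E[Y|X] = 1 + 2 X, i.e. beta0 = 1, beta1 = 2
BETA0, BETA1 = 1.0, 2.0
random.seed(42)

estimates = []
for _ in range(500):                      # 500 repeated samples of size 50
    x = [random.gauss(0, 1) for _ in range(50)]
    y = [BETA0 + BETA1 * xi + random.gauss(0, 1) for xi in x]
    estimates.append(ols_slope(x, y))

mean_b1 = sum(estimates) / len(estimates)
print(f"average b1 over samples: {mean_b1:.3f}")  # should be close to beta1 = 2
```

Any single b1 misses β1 by sampling error; unbiasedness is a statement about the average over repeated samples.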

Potential Outcomes Notation:

In experimental and quasi-experimental designs we are interested in counterfactuals: what value of an outcome would a treatment or program participant have in the absence of treatment (the baseline potential outcome) versus if they participated or were treated? We specify these 'potential outcomes' as follows:

Y0 = baseline potential outcome
Y1 = potential treatment outcome

We can characterize the treatment effect as:

E[Y1 - Y0], the expected difference between the treated and baseline potential outcomes. This is referred to as the average treatment effect, or ATE. Sometimes we are interested in (or a model estimates) the average treatment effect on the treated, or ATT: E[Y1 - Y0 | d = 1],

where d is an indicator for treatment (d = 1) vs control or untreated (d =0).
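The difference between ATE and ATT can be seen with a toy dataset in which, unlike any real study, both potential outcomes of every unit are known. The four units and their values below are made up for illustration.

```python
# Hypothetical units where, unlike real data, BOTH potential outcomes are known.
# Each tuple: (Y0 baseline outcome, Y1 treated outcome, d treatment indicator)
units = [
    (10.0, 12.0, 0),
    (8.0, 9.0, 0),
    (5.0, 11.0, 1),
    (7.0, 12.0, 1),
]

effects = [y1 - y0 for y0, y1, _ in units]            # unit-level Y1 - Y0
ate = sum(effects) / len(effects)                     # E[Y1 - Y0]
treated = [e for e, (_, _, d) in zip(effects, units) if d == 1]
att = sum(treated) / len(treated)                     # E[Y1 - Y0 | d = 1]

print(ate, att)  # 3.5 5.5
```

Here the ATT exceeds the ATE because the units with the largest gains happen to be the treated ones, a simple form of selection.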

Difference-in-Difference Analysis:

Difference-in-difference (DD) estimators assume that in the absence of treatment the difference between control (B) and treatment (A) groups would be constant or ‘fixed’ over time. Treatment effects in DD estimators are derived by taking the pre/post difference within the treatment group and within the control group, and then taking the difference of these two differences. Unobservable effects that are constant or fixed over time 'difference out', allowing us to identify treatment effects while controlling for these unobservable characteristics without explicitly measuring them. This is what is referred to as a 'selection on unobservables' framework.
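The two-step subtraction can be written out directly on group means. The four means below are hypothetical; the second call adds a time-constant gap to every treatment-group mean to show that such a gap cancels.

```python
# Hypothetical group means of the observed outcome Y
means = {
    ("treatment", "pre"): 10.0,
    ("treatment", "post"): 16.0,
    ("control", "pre"): 8.0,
    ("control", "post"): 11.0,
}

def diff_in_diff(m):
    """(post - pre) for the treatment group minus (post - pre) for the control group."""
    treat_change = m[("treatment", "post")] - m[("treatment", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treat_change - control_change

print(diff_in_diff(means))  # 3.0

# A time-constant group gap (+5 on every treatment-group mean) differences out.
shifted = {k: v + (5.0 if k[0] == "treatment" else 0.0) for k, v in means.items()}
print(diff_in_diff(shifted))  # still 3.0
```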


This can also be estimated using linear regression with an interaction term:

$$y = b_0 + b_1 d + b_2 t + b_3 (d \cdot t) + e \tag{3}$$

where d indicates the treatment group (d = 1) vs. the control group (d = 0), t indicates the post period (t = 1) vs. the pre period (t = 0), and the estimated coefficient b3 on the time-by-treatment interaction term is our estimate of the treatment effect.
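With two groups and two periods, regression (3) is saturated, so the OLS coefficient b3 reproduces the difference-in-differences of the four cell means exactly. The sketch below simulates data from a hypothetical linear model (all parameter values are assumptions), fits (3) with a small hand-written least-squares solver to stay dependency-free, and compares b3 with the means-based DD.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical values: intercept, group gap, common trend, treatment effect
ALPHA, GAMMA, TREND, EFFECT = 1.0, 2.0, 0.5, 3.0
random.seed(0)

X, y = [], []
for _ in range(2000):
    d, t = random.randint(0, 1), random.randint(0, 1)
    X.append([1.0, d, t, d * t])          # intercept, d, t, d*t as in (3)
    y.append(ALPHA + GAMMA * d + TREND * t + EFFECT * d * t + random.gauss(0, 1))

b0, b1, b2, b3 = ols(X, y)

def cell_mean(dv, tv):
    """Mean of y in the cell with d == dv and t == tv."""
    vals = [yi for row, yi in zip(X, y) if row[1] == dv and row[2] == tv]
    return sum(vals) / len(vals)

dd = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(f"b3 = {b3:.4f}, DD of cell means = {dd:.4f}")
```

The two numbers agree up to floating-point error, and both land near the simulated effect of 3 even though the groups differ by a fixed γ = 2.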


Lechner and Potential Outcomes Framework:

In an attempt to present the issues with GLM DD models described in Lechner (2010) using the simplest notation possible (abusing notation slightly and perhaps at a cost of precision), we can depict the framework for difference-in-difference analysis using expectations:

$$\text{DID} = [E(Y_1 \mid D=1) - E(Y_0 \mid D=1)] - [E(Y_1 \mid D=0) - E(Y_0 \mid D=0)] \tag{4}$$


DID = [pre/post differences for treatment group] – [pre/post differences for control group]

where Y represents observed outcomes, subscripted by period: pre (0) and post (1). Note that here the subscript indexes time, not the potential outcome as in Y0 and Y1 above.

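We can represent potential outcomes in the regression framework as follows. The potential outcome in period t (t = 0 pre, t = 1 post) carries a second index for treatment status (1 = treated, 0 = baseline), and d = 1 for the treatment group, 0 for the control group: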

$$E(Y_t^1 \mid D=d) = \alpha + t\delta_1 + d\gamma \quad \text{(potential outcome if treated)} \tag{5}$$

$$E(Y_t^0 \mid D=d) = \alpha + t\delta_0 + d\gamma \quad \text{(potential baseline outcome)} \tag{6}$$

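$$\text{ATET} = E(Y_1^1 - Y_1^0 \mid D=1) = \theta_1 = \delta_1 - \delta_0 \equiv \delta \tag{7}$$

i.e. “the difference-in-difference of potential outcomes across time if treated”: at t = 0 the two potential outcomes coincide (both equal α + dγ), so the effect in the post period is δ1 - δ0, which we label δ.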

We can estimate δ with a regression on observed data of the form:

$$y = b_0 + b_1 d + b_2 t + b_3 (d \cdot t) + e \tag{3'}$$

where b3 is our estimator for δ.
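The identification argument can be checked with a few lines of arithmetic: build the four observed cell expectations implied by (5) and (6), where only the treatment group in the post period receives the treated potential outcome, take the DID in (4), and compare it with the ATET in (7). The parameter values are hypothetical and G is the identity, i.e. the linear case.

```python
def identity(x):
    """Linear model: the index is the conditional mean itself."""
    return x

def cell_expectations(G, alpha, gamma, delta0, delta1):
    """Observed E(Y_t | D=d) for the 2x2 design implied by (5)-(6).

    Only the treatment group (d=1) in the post period (t=1) is treated;
    every other cell shows the baseline potential outcome.
    """
    return {
        (1, 1): G(alpha + delta1 + gamma),  # treatment group, post: Y^1
        (1, 0): G(alpha + gamma),           # treatment group, pre:  Y^0
        (0, 1): G(alpha + delta0),          # control group, post:   Y^0
        (0, 0): G(alpha),                   # control group, pre:    Y^0
    }

def did(e):
    """Equation (4) applied to the cell expectations."""
    return (e[(1, 1)] - e[(1, 0)]) - (e[(0, 1)] - e[(0, 0)])

alpha, gamma, delta0, delta1 = 1.0, 2.0, 0.5, 3.5   # hypothetical values
e = cell_expectations(identity, alpha, gamma, delta0, delta1)
atet = identity(alpha + delta1 + gamma) - identity(alpha + delta0 + gamma)

print(did(e), atet, delta1 - delta0)  # 3.0 3.0 3.0
```

The group gap γ = 2 never appears in the result: in the linear model it is time-constant and differences out.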

Common Trend Assumption:
As noted above, DD estimators assume that, absent treatment, the gap between control (B) and treatment (A) groups stays constant over time. In a linear model this can be represented geometrically by 'parallel trends' in outcome levels between treatment and control groups in the absence of treatment:
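
[Figure: parallel trends in outcome Y for the control group (BB) and the treatment group's counterfactual (AA), with A'A marking the treatment effect]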
As depicted above, BB represents the trend in outcome Y for the control group, and AA represents the counterfactual (parallel, or common) trend the treatment group would follow in the absence of treatment. The distance A'A is the departure from the parallel trend in response to treatment; it is the DD treatment effect, estimated by b3, our estimator for δ.

Following Lechner, the common trend assumption can be expressed in terms of the baseline potential outcomes:

$$E(Y_1^0 \mid D=1) - E(Y_0^0 \mid D=1) = (\alpha + \delta_0 + \gamma) - (\alpha + \gamma) = \delta_0 \tag{8}$$

$$E(Y_1^0 \mid D=0) - E(Y_0^0 \mid D=0) = (\alpha + \delta_0) - \alpha = \delta_0 \tag{9}$$

i.e. the pre/post change in baseline outcomes is the same (δ0) regardless of whether individuals are in the treatment group (D = 1) or the control group (D = 0). The group-specific difference γ is time-constant, so it cancels within each group.
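Equations (8) and (9) can be checked numerically for any group gap γ. The sketch evaluates the pre/post change in the baseline outcome under the linear model (6) for both groups and several hypothetical values of γ.

```python
def baseline_change(alpha, gamma, delta0, d):
    """E(Y_1^0 | D=d) - E(Y_0^0 | D=d) under the linear model (6)."""
    post = alpha + delta0 + d * gamma
    pre = alpha + d * gamma
    return post - pre

alpha, delta0 = 1.0, 0.5  # hypothetical values
for gamma in (-3.0, 0.0, 2.0, 10.0):
    treat = baseline_change(alpha, gamma, delta0, d=1)
    control = baseline_change(alpha, gamma, delta0, d=0)
    print(gamma, treat, control)  # both equal delta0 = 0.5 whatever gamma is
```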

Nonlinear Models:

In a GLM framework with a specific link function G(·), the DD model can be expressed in terms of potential outcomes as follows:

$$E(Y_t^1 \mid D=d) = G(\alpha + t\delta_1 + d\gamma) \quad \text{(potential outcome if treated)} \tag{10}$$

$$E(Y_t^0 \mid D=d) = G(\alpha + t\delta_0 + d\gamma) \quad \text{(potential baseline outcome)} \tag{11}$$

The DID model can then be estimated by a GLM regression on observed outcomes:

$$E(y \mid d, t) = G(b_0 + b_1 d + b_2 t + b_3 (d \cdot t)) \tag{12}$$
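One consequence of (12) is that b3 is a difference-in-differences on the index scale, not on the outcome scale. The sketch assumes a logit model (G = logistic CDF) and hypothetical cell probabilities that follow a common trend on the probability scale with no treatment effect at all. In a saturated 2x2 logit the fitted probabilities equal the cell probabilities, so b3 is simply the DD of the cell logits.

```python
import math

def logit(p):
    """Inverse of the logistic CDF."""
    return math.log(p / (1.0 - p))

# Hypothetical P(y=1 | d, t): both groups rise by 0.1 (common trend on the
# probability scale) and there is NO treatment effect.
p = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.6}

dd_prob = (p[(1, 1)] - p[(1, 0)]) - (p[(0, 1)] - p[(0, 0)])

# Saturated 2x2 logit: the interaction coefficient is the DD of cell logits.
b3_logit = (logit(p[(1, 1)]) - logit(p[(1, 0)])) - (logit(p[(0, 1)]) - logit(p[(0, 0)]))

print(f"DD on probability scale: {dd_prob:.4f}")
print(f"logit interaction b3:    {b3_logit:.4f}")
```

The probability-scale DD is essentially zero, while b3 comes out around -0.13: with these groups sitting at different points of the logistic curve, the same +0.1 change corresponds to different index shifts.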

Common Trend Assumption:

$$E(Y_1^0 \mid D=1) - E(Y_0^0 \mid D=1) = G(\alpha + \delta_0 + \gamma) - G(\alpha + \gamma) \tag{13}$$

$$E(Y_1^0 \mid D=0) - E(Y_0^0 \mid D=0) = G(\alpha + \delta_0) - G(\alpha) \tag{14}$$

It turns out that in a GLM framework the common trend assumption generally holds only if the group-specific differences are zero, i.e. γ = 0. Equating (13) and (14) requires G(α + δ0 + γ) - G(α + γ) = G(α + δ0) - G(α); when G is nonlinear, the change in G produced by the same index shift δ0 depends on where the index starts, so for δ0 ≠ 0 the two sides generally coincide only when γ = 0. The common trend assumption relies on differencing out specific terms of the unobservable potential outcome, the individual-specific effects we are trying to control for in the selection-on-unobservables scenario, but in a GLM scenario we have to assume that these effects are zero or absent. In essence, the attractive feature of DD models, their ability to control for unobservable effects, is lost in a GLM scenario.
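The gap between (13) and (14) can be computed directly. The sketch assumes a logistic G (the logit mean function, one possible choice) and hypothetical values of α and δ0, and compares it with the identity G of the linear model.

```python
import math

def logistic(x):
    """Logistic CDF, the mean function of a logit model (one choice of G)."""
    return 1.0 / (1.0 + math.exp(-x))

def identity(x):
    """Linear model: G is the identity."""
    return x

def trend_gap(G, alpha, gamma, delta0):
    """(13) minus (14); zero exactly when the common trend assumption holds."""
    treated = G(alpha + delta0 + gamma) - G(alpha + gamma)
    control = G(alpha + delta0) - G(alpha)
    return treated - control

alpha, delta0 = -1.0, 0.5   # hypothetical values
for gamma in (0.0, 0.5, 1.0, 2.0):
    print(gamma,
          round(trend_gap(identity, alpha, gamma, delta0), 6),   # linear: 0 for every gamma
          round(trend_gap(logistic, alpha, gamma, delta0), 6))   # logit: nonzero for these gamma != 0
```

Isolated values of γ can make the logistic gap vanish by coincidence (because the logistic curve is symmetric, here at γ = -2α - δ0 = 1.5), which is consistent with the hedged "usually" in Lechner's statement; the general point is that γ no longer cancels.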
References:

Lechner, Michael (2010). "The Estimation of Causal Effects by Difference-in-Difference Methods." Foundations and Trends in Econometrics, Vol. 4, No. 3, 165–224.

"Program Evaluation and the Difference-in-Difference Estimator." Course Notes, Education Policy and Program Evaluation, Vanderbilt University, October 4, 2008.

Evans, William N. "Difference in Difference Models." Course Notes, ECON 47950: Methods for Inferring Causal Relationships in Economics, University of Notre Dame, Spring 2008.