In a previous post I discussed the conclusion from Lechner’s paper 'The
Estimation of Causal Effects by Difference-in-Difference Methods' that
difference-in-difference models in a nonlinear or GLM context fail to
meet the common trend assumption, and therefore fail to identify
treatment effects in a selection on unobservables context. In that post
I noted that Lechner points out (quite rigorously, in the context of the
potential outcomes framework):

*“We start with a “natural” nonlinear model with a linear index
structure which is transformed by a link function, G(·), to yield
the conditional expectation of the potential outcome... The common
trend assumption relies on differencing out specific terms of the
unobservable potential outcome, which does not happen in this
nonlinear specification... Whereas the linear specification requires
the group specific differences to be time constant, the nonlinear
specification requires them to be absent. Of course, this property of
this nonlinear specification removes the attractive feature that DiD
allows for some selection on unobservable group and individual
specific differences. Thus, we conclude that estimating a DiD model
with the standard specification of a nonlinear model would usually
lead to an inconsistent estimator if the standard common trend
assumption is upheld. In other words, if the standard DiD assumptions
hold, this nonlinear model does not exploit them (it will usually
violate them). Therefore, estimation based on this model does not
identify the causal effect”*

I want to review, at a high level, how he gets to this result, while
simplifying as much as possible and starting with some basic concepts.
Starting with a basic regression model, the population conditional
expectation function, or conditional mean of Y given X, can be written
as:

**Regression and Expected Value Notation:**

E[Y|X] = β_{0} + β_{1}X (1)

and we estimate this with a regression on observed data:

y = b_{0} + b_{1}X + e (2)

where b_{1} is our estimate of the population parameter of interest, β_{1}.

If E[b_{1}] = β_{1}, then we say our estimator is unbiased.
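Unbiasedness can be illustrated with a small simulation. This is only a sketch under assumed conditions: the parameter values, sample size, and normal distributions below are hypothetical choices, and the error term is generated so that E[e|X] = 0. Averaging the slope estimate b_{1} over many simulated samples approximates E[b_{1}].

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
beta0, beta1 = 1.0, 2.0          # hypothetical population parameters
n, reps = 200, 2000              # hypothetical sample size and number of samples


def ols_slope(x, y):
    """Slope b1 from regressing y on a constant and x."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))


estimates = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    e = rng.normal(size=n)       # error drawn independently of x, so E[e|X] = 0
    y = beta0 + beta1 * x + e
    estimates[r] = ols_slope(x, y)

mean_b1 = estimates.mean()       # Monte Carlo approximation to E[b1]
print(f"average b1 over {reps} samples: {mean_b1:.3f} (beta1 = {beta1})")
```

Any single b_{1} misses β_{1} by some sampling error; it is the average across repeated samples that sits close to β_{1}.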

**Potential Outcomes Notation:**

When it comes to experimental designs, we are interested in
counterfactuals: what value of an outcome would a treatment or program
participant have in the absence of treatment (the baseline potential
outcome), versus if they participated or were treated? We specify these
'potential outcomes' as follows:

Y^{0} = baseline potential outcome

Y^{1} = potential treatment outcome

We can characterize the treatment effect as:

E[Y^{1} - Y^{0}]

or the difference between potential treated and baseline outcomes. This
is referred to as the average treatment effect, or ATE. Sometimes we
are interested in, or some models estimate, the average treatment
effect on the treated, or ATT:

E[Y^{1} - Y^{0} | d = 1]

where d is an indicator for treatment (d = 1) vs. control or untreated
(d = 0).
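To make the notation concrete, here is a toy sketch with made-up numbers in which both potential outcomes are visible for every unit, something we never get in real data. It takes the effect as treated minus baseline (Y^{1} - Y^{0}), and all values are hypothetical.

```python
import numpy as np

# Hypothetical potential outcomes for six units.
y0 = np.array([10.0, 12.0, 9.0, 11.0, 14.0, 13.0])   # baseline outcome Y^0
y1 = np.array([13.0, 13.0, 12.0, 12.0, 18.0, 17.0])  # treated outcome Y^1
d = np.array([0, 0, 0, 0, 1, 1])                     # treatment indicator

unit_effect = y1 - y0                 # unit-level treatment effects
ate = unit_effect.mean()              # E[Y^1 - Y^0]
att = unit_effect[d == 1].mean()      # E[Y^1 - Y^0 | d = 1]

# What we actually observe: Y^1 for the treated, Y^0 for everyone else.
y_obs = np.where(d == 1, y1, y0)
naive = y_obs[d == 1].mean() - y_obs[d == 0].mean()

print(f"ATE = {ate:.3f}, ATT = {att:.3f}, naive comparison = {naive:.3f}")
```

In this made-up example the treated units also have higher baseline outcomes, so the naive treated-versus-untreated comparison of observed outcomes overstates both the ATE and the ATT.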

**Difference-in-Difference Analysis:**

Difference-in-difference (DD) estimators assume that in the absence of
treatment the difference between control (B) and treatment (A) groups
would be constant or ‘fixed’ over time. Treatment effects in DD
estimators are derived by taking the difference between pre and post
values within the treatment and control groups, and then taking the
difference in those differences between the treatment and control
groups. The unobservable effects that are constant or fixed over time
'difference out', allowing us to identify treatment effects while
controlling for these unobservable characteristics without explicitly
measuring them. This characterizes what is referred to as a 'selection
on unobservables' framework.

This can also be estimated using linear regression with an interaction
term:

y = b_{0} + b_{1}d + b_{2}t + b_{3}d*t + e (3)

where d indicates treatment (d = 1 vs. d = 0), t indicates the post
period (t = 1 vs. t = 0), and the estimated coefficient b_{3} on the
time-by-treatment interaction term gives us our estimate of the
treatment effect.
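As a quick check on the equivalence between the two descriptions, the sketch below computes the difference in differences from the four group-by-period means and then from the interaction regression. The eight outcome values are hypothetical, and NumPy's least-squares solver stands in for whatever regression software you would normally use.

```python
import numpy as np

# Hypothetical observed outcomes; d = treatment group, t = post period.
d = np.array([0, 0, 0, 0, 1, 1, 1, 1])
t = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y = np.array([10.0, 12.0, 13.0, 15.0, 14.0, 16.0, 21.0, 23.0])


def cell_mean(group, period):
    """Mean outcome for one group in one period."""
    return y[(d == group) & (t == period)].mean()


# Difference in differences from the four cell means.
dd_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Same quantity from the regression y = b0 + b1*d + b2*t + b3*d*t + e.
X = np.column_stack([np.ones_like(y), d, t, d * t])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b3 = b[3]

print(f"DD from means: {dd_means:.3f}, b3 from regression: {b3:.3f}")
```

With two groups and two periods the regression is saturated (four coefficients for four cells), which is why b_{3} reproduces the difference in cell means exactly rather than approximately.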

**Lechner and Potential Outcomes Framework:**

In an attempt to present the issues with GLM DD models described in
Lechner (2010) using the simplest notation possible (abusing notation
slightly, and perhaps at a cost of precision), we can depict the
framework for difference-in-difference analysis using expectations:

DID = [E(Y_{1}|D=1) - E(Y_{0}|D=1)] - [E(Y_{1}|D=0) - E(Y_{0}|D=0)] (4)

DID = [pre/post difference for treatment group] - [pre/post difference for control group]

where Y represents the observed outcome, subscripted by pre (0) and
post (1) periods.

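We can represent potential outcomes in the regression framework as
follows:

E(Y_{t}^{1}|D) = α + tδ^{1} + dγ “potential outcome if treated” (5)

E(Y_{t}^{0}|D) = α + tδ^{0} + dγ “potential baseline outcome” (6)

ATET: E(Y_{1}^{1} - Y_{1}^{0}|D=1) = θ_{1} = δ^{1} - δ^{0} = δ (7)

“difference-in-difference of potential outcomes across time if treated”

That is, evaluating (5) and (6) in the post period (t = 1) for the
treated group, α and γ cancel, and δ denotes the remaining difference
δ^{1} - δ^{0}.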

We can estimate δ with a regression on observed data of the form:

y = b_{0} + b_{1}d + b_{2}t + b_{3}d*t + e (3')

where b_{3} is our estimator for δ.
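The link between the potential outcome equations and the regression can be sketched by generating data directly from (5) and (6). The parameter values are hypothetical, and the data are deliberately noise-free so that the coefficients can be read off exactly. The treated group reveals Y^{1} and the control group reveals Y^{0}.

```python
import numpy as np

alpha, gamma = 2.0, 1.5        # hypothetical intercept and group difference
delta0, delta1 = 0.5, 1.25     # hypothetical time effects without / with treatment

# All four (d, t) cells, repeated 25 times each.
d = np.tile([0, 0, 1, 1], 25).astype(float)
t = np.tile([0, 1, 0, 1], 25).astype(float)

y0 = alpha + t * delta0 + d * gamma   # baseline potential outcome, as in (6)
y1 = alpha + t * delta1 + d * gamma   # potential outcome if treated, as in (5)

# Observed outcome: Y^1 for the treatment group, Y^0 for the control group.
y = d * y1 + (1 - d) * y0

# Regression y = b0 + b1*d + b2*t + b3*d*t + e.
X = np.column_stack([np.ones_like(y), d, t, d * t])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print("b0, b1, b2, b3 =", np.round(b, 4))
print("delta1 - delta0 =", delta1 - delta0)
```

Under these assumptions b_{0}, b_{1}, and b_{2} line up with α, γ, and δ^{0}, while b_{3} picks up δ^{1} - δ^{0}. The group difference γ is absorbed by b_{1} and never contaminates b_{3}.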

**Common Trend Assumption:**

Difference-in-difference (DD) estimators assume that in the absence of
treatment the difference between control (B) and treatment (A) groups
would be constant or ‘fixed’ over time. This can be represented
geometrically in a linear modeling context by 'parallel trends' in
outcome levels between treatment and control groups in the absence of
treatment:

As depicted above, BB represents the trend in outcome Y for a control
group. AA represents the counterfactual trend, or the parallel or common
trend for the treatment group that would occur in the absence of
treatment. The distance A'A represents a departure from the parallel
trend in response to treatment, and would be our DD treatment effect,
or the value of b_{3}, our estimator for δ.

The common trend assumption, following Lechner, can be expressed in
terms of potential outcomes:

E(Y_{1}^{0}|D=1) - E(Y_{0}^{0}|D=1) = α + δ^{0} + γ - α - γ = δ^{0} (8)

E(Y_{1}^{0}|D=0) - E(Y_{0}^{0}|D=0) = α + δ^{0} - α = δ^{0} (9)

i.e., the pre/post difference in baseline outcomes is the same (δ^{0})
regardless of whether individuals are assigned to the treatment group
(D=1) or the control group (D=0).

**Nonlinear Models:**

In a GLM framework with a specific link function G(.), a DD framework
can be expressed in terms of potential outcomes as follows:

E(Y_{t}^{1}|D) = G(α + tδ^{1} + dγ) “potential outcome if treated” (10)

E(Y_{t}^{0}|D) = G(α + tδ^{0} + dγ) “potential baseline outcome” (11)

DID can be estimated by a regression on observed outcomes of the form:

E(y|d,t) = G(b_{0} + b_{1}d + b_{2}t + b_{3}d*t) (12)
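Lechner's point that the standard nonlinear specification "does not exploit" the usual DiD assumptions can be illustrated with a stylized example. Assume, hypothetically, a binary outcome whose rate rises by 0.10 in both groups and no treatment effect at all, so the common trend holds in levels. The sketch relies on the fact that in a saturated logit model (four coefficients for four cells) the fitted probabilities equal the cell rates, so b_{3} is simply the difference in differences of the log-odds. The logit link is my choice for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


# Hypothetical outcome rates keyed by (d, t): both groups rise by 0.10,
# so the common trend holds in levels and the true treatment effect is zero.
p = {(0, 0): 0.20, (0, 1): 0.30, (1, 0): 0.60, (1, 1): 0.70}

# Difference in differences on the outcome scale.
did_levels = (p[1, 1] - p[1, 0]) - (p[0, 1] - p[0, 0])

# Interaction coefficient of the saturated logit: DID of the log-odds.
b3_logit = (logit(p[1, 1]) - logit(p[1, 0])) - (logit(p[0, 1]) - logit(p[0, 0]))

print(f"DID in levels: {did_levels:.4f}")
print(f"logit interaction coefficient b3: {b3_logit:.4f}")
```

The DID in levels is zero, as it should be with no treatment effect, while the logit interaction coefficient is not. The nonzero b_{3} reflects only the different starting levels of the two groups.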

**Common Trend Assumption:**

E(Y_{1}^{0}|D=1) - E(Y_{0}^{0}|D=1) = G(α + δ^{0} + γ) - G(α + γ) (13)

E(Y_{1}^{0}|D=0) - E(Y_{0}^{0}|D=0) = G(α + δ^{0}) - G(α) (14)

In the linear case γ cancels within each group's difference. Here it
stays inside G(.), so (13) and (14) are in general equal only when γ is
absent. It turns out that in a GLM framework, for the common trend
assumption to hold, group specific differences must be zero, or γ = 0.
The common trend assumption relies on differencing out specific terms
of the unobservable potential outcome, or the individual specific
effects we are trying to control for in the selection on unobservables
scenario, but in a GLM scenario we have to assume that these effects
are zero or absent. In essence, the attractive feature of DD models,
the ability to control for unobservable effects, is not a feature of DD
models in a GLM scenario.

**References:**

Lechner, Michael (2010). The Estimation of Causal Effects by Difference-in-Difference Methods. Foundations and Trends in Econometrics, Vol. 4, No. 3, 165–224.

Program Evaluation and the Difference-in-Difference Estimator. Course Notes, Education Policy and Program Evaluation, Vanderbilt University, October 4, 2008.

Evans, William N. Difference in Difference Models, Course Notes. ECON 47950: Methods for Inferring Causal Relationships in Economics, University of Notre Dame, Spring 2008.