Tuesday, September 10, 2013

Causal Inference and Quasi-experimental Design Roundup

I’ve dedicated several posts recently to the subject of quasi-experimental designs and causal inference. Below I’ve tried to organize the related links to give a bigger picture.

First off, regression is often mischaracterized through an overly clinical view of its assumptions about linearity. However, as Angrist and Pischke state:

"In fact, the validity of linear regression as an empirical tool does not turn on linearity either...The statement that regression approximates the CEF lines up with our view of empirical work as an effort to describe the essential features of statistical relationships, without necessarily trying to pin them down exactly."  - Mostly Harmless Econometrics, p. 26 & 29

For more discussion see:

In Mostly Harmless Econometrics, not only is linear regression rigorously developed and discussed, but quasi-experimental designs are given very heavy emphasis. As discussed in Cellini (2008):

“… proxy variable, fixed effects, and difference-in-differences approaches are becoming quite common. Indeed, these approaches have replaced basic multivariate regression as the new standard for education research in the economics literature.”

The links below attempt to highlight, at least in a heuristic sense, these methods and the issues they attempt to address:

Time Series Methods:

Regression as an Empirical Tool

Linear regression is a powerful empirical tool for the social sciences. Its robustness is often underrated, while at other times its use and interpretation are mischaracterized. Andrew Gelman and Angrist and Pischke are two great sources for learning about regression in an applied context.

I particularly like Gelman's comment here:
"It's all about comparisons, nothing about how a variable "responds to change." Why? Because, in its most basic form, regression tells you nothing at all about change. It's a structured way of computing average comparisons in data."

This 'computing average comparisons in data' interpretation is why regression works as a sort of matching estimator, as Angrist and Pischke argue:

“Our view is that regression can be motivated as a particular sort of weighted matching estimator, and therefore the differences between regression and matching estimates are unlikely to be of major empirical importance” (Chapter 3, p. 70)
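A small simulation can make the weighted-matching view concrete. The sketch below, in Python with NumPy, uses hypothetical data with a single discrete covariate `x`; it is an illustration under those assumptions, not Angrist and Pischke's own code. When the regression includes a full set of dummies for `x`, the OLS coefficient on the treatment dummy `d` equals a weighted average of the within-cell treated-minus-control differences, with weights proportional to each cell's size times p(1 - p), where p is the share treated in that cell.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data: a discrete covariate x with 4 cells, a treatment
# probability that varies by cell, and a treatment effect that varies by cell.
x = rng.integers(0, 4, size=n)
p_by_cell = np.array([0.2, 0.4, 0.6, 0.8])
effect_by_cell = np.array([1.0, 2.0, 3.0, 4.0])
d = (rng.random(n) < p_by_cell[x]).astype(float)
y = 0.5 * x + effect_by_cell[x] * d + rng.normal(size=n)

# OLS of y on d plus a full set of cell dummies (saturated in x).
dummies = (x[:, None] == np.arange(4)).astype(float)
design = np.column_stack([d, dummies])
beta_d = np.linalg.lstsq(design, y, rcond=None)[0][0]

# The same number as a weighted average of within-cell treated-control
# differences, with weights n_x * p_x * (1 - p_x) using sample proportions.
diffs, weights = [], []
for cell in range(4):
    in_cell = x == cell
    p_hat = d[in_cell].mean()
    diffs.append(y[in_cell & (d == 1)].mean() - y[in_cell & (d == 0)].mean())
    weights.append(in_cell.sum() * p_hat * (1.0 - p_hat))
matching_style = np.average(diffs, weights=weights)

print(beta_d, matching_style)
```

Because the weights peak where p is near one half, this saturated regression emphasizes cells with a balanced mix of treated and control units, whereas an average treatment effect would weight cells by their share of the sample.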

In further discussion, Gelman offers what I think is a very appropriate interpretation:

“They're saying (Angrist and Pischke) that regression, like matching, is a way of comparing-like-with-like in estimating a comparison. This point seems commonplace from a statistical standpoint but may be news to some economists who might think that regression relies on the linear model being true.”

This brings up a very important point, one also in line with Angrist and Pischke, regarding the use of regression as an empirical tool in the social sciences:

"In fact, the validity of linear regression as an empirical tool does not turn on linearity either...The statement that regression approximates the CEF lines up with our view of empirical work as an effort to describe the essential features of statistical relationships, without necessarily trying to pin them down exactly."  - Mostly Harmless Econometrics, p. 26 & 29

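One way to see the CEF-approximation point: with a discrete regressor, OLS on the individual observations gives exactly the same coefficients as a regression of the cell means E[y | x] on x, weighted by cell counts. The Python/NumPy sketch below uses simulated data with a deliberately nonlinear CEF (E[y | x] = x squared), an assumption chosen for illustration; the fitted line is still the best linear approximation to that CEF even though the CEF itself is not linear.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data with a nonlinear CEF: E[y | x] = x**2.
x = rng.integers(0, 6, size=n).astype(float)
y = x**2 + rng.normal(scale=2.0, size=n)

# 1) OLS of y on [1, x] using the individual observations.
X = np.column_stack([np.ones(n), x])
beta_micro = np.linalg.lstsq(X, y, rcond=None)[0]

# 2) Weighted least squares of the cell means E[y | x] on [1, x],
#    weighting each distinct x value by its cell count (via sqrt weights).
values, counts = np.unique(x, return_counts=True)
cef = np.array([y[x == v].mean() for v in values])
Xg = np.column_stack([np.ones_like(values), values])
sw = np.sqrt(counts)
beta_cef = np.linalg.lstsq(Xg * sw[:, None], cef * sw, rcond=None)[0]

print(beta_micro, beta_cef)
```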
Regression users can seem at odds with each other at times. At one extreme, they can get caught up in very clinical assumptions about linearity (see somewhat related discussions of linear probability models here and here); at the other, they take robustness to extremes by failing to consider questions of unobserved heterogeneity, endogeneity, selection bias, and identification.

Cellini (2008) discusses this issue in an analysis of the impact of financial aid on college enrollment:

“While simple ordinary least squares estimates of the impact of aid on college-going can reveal a correlation between financial aid policies and enrollment, these estimates are likely to suffer from omitted variable bias due to self-selection, potentially overestimating or underestimating the causal impact of these policies on enrollment.”

“The discussion above has outlined several methods for addressing the problem of omitted variable bias in financial aid research… proxy variable, fixed effects, and difference-in-differences approaches are becoming quite common. Indeed, these approaches have replaced basic multivariate regression as the new standard for education research in the economics literature.”

This is where quasi-experimental methods come into play.
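To make the self-selection problem concrete, here is a minimal Python/NumPy simulation under stated assumptions: a hypothetical unobserved trait (`ability`) drives both enrollment and outcomes, it does not change over time, and treated and comparison groups share a common time trend. A naive post-period comparison (what simple OLS of the outcome on treatment alone would give) absorbs the selection, while a 2x2 difference-in-differences removes the time-invariant trait.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
tau = 1.0  # true effect of the hypothetical program

# Hypothetical two-period data: an unobserved, time-invariant trait
# raises outcomes and also drives self-selection into treatment.
ability = rng.normal(size=n)
d = (ability + rng.normal(size=n) > 0).astype(float)
y_pre = 1.0 + 2.0 * ability + rng.normal(size=n)
# Common time trend of 0.5 for everyone: parallel trends hold by construction.
y_post = 1.5 + 2.0 * ability + tau * d + rng.normal(size=n)

# Naive post-period comparison (same as OLS of y_post on d alone).
naive = y_post[d == 1].mean() - y_post[d == 0].mean()

# 2x2 difference-in-differences: before/after change for the treated
# minus before/after change for the comparison group.
change = y_post - y_pre
did = change[d == 1].mean() - change[d == 0].mean()

print(f"naive: {naive:.2f}, DiD: {did:.2f}, true: {tau}")
```

If the trait changed over time, or trends differed between groups, the DiD estimate would be biased as well; the example only shows what the method removes when its assumptions hold.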


Cellini, S. R. (2008). Causal Inference and Omitted Variable Bias in Financial Aid Research: Assessing Solutions. The Review of Higher Education, 31(3), Spring 2008, 329–354.

Friday, September 6, 2013

IPTW Regression

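An alternative to direct matching or matching on propensity scores is to use the inverse of the propensity scores as weights in a weighted regression framework (Horvitz and Thompson, 1952), known as inverse probability of treatment weighted (IPTW) regression, where each observation receives the weight:

$$w_i = \frac{d_i}{p(x_i)} + \frac{1 - d_i}{1 - p(x_i)}$$

so that treated units are weighted by $1/p(x_i)$ and control units by $1/(1 - p(x_i))$.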

IPTW regression (with weights specified as above) estimates the average treatment effect (ATE) (Austin, 2011):

$$\text{ATE} = E[Y_{1i} - Y_{0i}]$$

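A minimal Python/NumPy sketch of IPTW regression on simulated data. Assumptions: one confounder `x`, a propensity score estimated by logit through a hand-written helper (`fit_logit`, hypothetical, not a library function), and ATE weights of 1/p(x) for treated units and 1/(1 - p(x)) for controls. The treatment effect is made heterogeneous (1 + x), so the ATE of 1 differs from the effect on the treated.

```python
import numpy as np

def fit_logit(X, d, iters=25):
    """Hypothetical helper: maximum-likelihood logit via Newton-Raphson.
    X must include a constant column."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (d - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        b += np.linalg.solve(hess, grad)
    return b

rng = np.random.default_rng(3)
n = 20_000

# Simulated data: x confounds treatment and outcome; the effect is 1 + x,
# so the ATE is 1 while the effect on the treated is larger.
x = rng.normal(size=n)
d = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
y = 2.0 * x + (1.0 + x) * d + rng.normal(size=n)

# Stage 1: estimated propensity scores p(x_i).
X = np.column_stack([np.ones(n), x])
ps = 1.0 / (1.0 + np.exp(-X @ fit_logit(X, d)))

# IPTW weights targeting the ATE: 1/p for treated, 1/(1-p) for controls.
w = d / ps + (1.0 - d) / (1.0 - ps)

# Stage 2: weighted least squares of y on [1, d]; the coefficient on d is
# the IPTW estimate (a difference of weighted group means).
Z = np.column_stack([np.ones(n), d])
sw = np.sqrt(w)
ate_iptw = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)[0][1]

naive = y[d == 1].mean() - y[d == 0].mean()
print(f"naive difference: {naive:.2f}, IPTW ATE: {ate_iptw:.2f}")
```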
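Inverse probability of treatment weighting (IPTW) uses weights derived from the propensity scores to create a pseudo-population in which the distribution of covariates is independent of treatment assignment (Austin, 2011). This is an appeal to the CIA and Rosenbaum and Rubin’s propensity score theorem discussed before. The weighting scheme essentially ‘weights up’ units that are under-represented in their treatment group: with the ATE weights above, treated and control units are each reweighted to resemble the full sample, while the related weights for the effect of treatment on the treated ‘weight up’ control units to look like treated units (Stuart, 2011).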
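The pseudo-population idea can be checked with a covariate balance diagnostic. In the hedged Python/NumPy sketch below, the true propensity score is assumed known for simplicity (in practice it would be estimated), and `smd` is a hypothetical helper computing a standardized mean difference between treated and control groups, with and without IPTW weights.

```python
import numpy as np

def smd(x, d, w):
    """Hypothetical helper: weighted standardized mean difference in x
    between treated (d == 1) and control (d == 0) units."""
    t, c = d == 1, d == 0
    m1 = np.average(x[t], weights=w[t])
    m0 = np.average(x[c], weights=w[c])
    v1 = np.average((x[t] - m1) ** 2, weights=w[t])
    v0 = np.average((x[c] - m0) ** 2, weights=w[c])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2.0)

rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-x))          # true propensity score, assumed known here
d = (rng.random(n) < ps).astype(float)
w = d / ps + (1.0 - d) / (1.0 - ps)    # ATE weights

before = smd(x, d, np.ones(n))
after = smd(x, d, w)
print(f"SMD before weighting: {before:.3f}, after: {after:.3f}")
```

Under these ATE weights both weighted group means of x move toward the full-sample mean. The diagnostic covers only observed covariates, so it cannot confirm the CIA itself.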


Austin, P. (2011). An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivariate Behavioral Research, 46(3), 399–424.

Horvitz, D. G. & Thompson, D. J. (1952). A Generalization of Sampling Without Replacement From a Finite Universe. Journal of the American Statistical Association, 47(260), 663–685.

Maciejewski, M. L. & Brookhart, M. A. (2011). Propensity score workshop. Retrieved January 19, 2013, from http://ahrqplexnet.sharepointspace.com/Webinars/PS_webinar_followup.pdf

Stuart, E. (2011). Propensity score methods for estimating causal effects: The why, when, and how. Johns Hopkins Bloomberg School of Public Health, Department of Mental Health, Department of Biostatistics. Retrieved January 19, 2013, from www.biostat.jhsph.edu/estuart

Thursday, September 5, 2013

Propensity Score Matching

See previous: http://econometricsense.blogspot.com/2013/05/selection-bias-and-rubin-causal-model.html
The problem of selection bias can be well characterized within the Rubin causal model, or potential outcomes framework (Angrist and Pischke, 2008; Rubin, 1974; Imbens and Wooldridge, 2009; Klaiber and Smith, 2009). In a previous post I explained how selection bias can overpower the actual treatment effect and lead the naïve researcher to conclude that the intervention or treatment was ineffectual, or to under- or overestimate the true treatment effect, depending on the direction of the bias.
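How selection bias can overpower a real effect can be shown with simulated potential outcomes, which are never both observable in real data. In this hypothetical Python/NumPy sketch, students with greater `need` are more likely to enroll in a program that raises everyone's outcome by 1, but need also lowers the untreated outcome. The observed difference in means splits exactly into the effect on the treated plus a selection-bias term, E[Y0i | di=1] - E[Y0i | di=0], following the decomposition in Angrist and Pischke (2008).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000

# Hypothetical remedial program: higher need -> more likely to enroll,
# and need also lowers the untreated outcome y0.
need = rng.normal(size=n)
d = need + rng.normal(size=n) > 0
y0 = 10.0 - 3.0 * need + rng.normal(size=n)
y1 = y0 + 1.0                  # the program raises every outcome by 1
y = np.where(d, y1, y0)        # only one potential outcome is observed

naive = y[d].mean() - y[~d].mean()
att = (y1[d] - y0[d]).mean()                    # E[Y1 - Y0 | d=1]
selection_bias = y0[d].mean() - y0[~d].mean()   # E[Y0 | d=1] - E[Y0 | d=0]

print(f"naive: {naive:.2f} = ATT {att:.2f} + selection bias {selection_bias:.2f}")
```

Here the naive comparison is negative even though the program helps every participant, which is exactly the "ineffectual intervention" trap described above.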

However, under the conditional independence assumption (CIA) (Rubin, 1973; Angrist and Pischke, 2008; Rosenbaum and Rubin, 1983; Angrist and Hahn, 2004), comparisons conditional on covariates may remove selection bias, giving us the estimate of the treatment effect we need:

$$E[Y_i \mid x_i, d_i = 1] - E[Y_i \mid x_i, d_i = 0] = E[Y_{1i} - Y_{0i} \mid x_i] \quad \text{given} \quad \{Y_{1i}, Y_{0i}\} \perp d_i \mid x_i$$
The last term states that treatment assignment ($d_i$) and the potential outcomes ($Y_{1i}, Y_{0i}$) are conditionally independent given the covariates $x_i$. This provides the justification and motivation for using matched comparisons to estimate treatment effects. Matched comparisons imply balance on observed covariates, which ‘recreates’ a situation similar to a randomized experiment, where all subjects are essentially the same except for the treatment (Thoemmes and Kim, 2011).

However, matching on covariates can be complicated and cumbersome. An alternative is to match on an estimate of the probability of receiving treatment, or selection. This probability is referred to as a propensity score. Given estimates of the propensity, or probability, of receiving treatment, comparisons can then be made between observations matched on propensity scores. This is in effect a two-stage process: first, specify and estimate a model used to derive the propensity scores; then, implement matched comparisons on the basis of those scores.

Rosenbaum and Rubin’s propensity score theorem (1983) states that if the CIA holds, then matching or conditioning on propensity scores (denoted $p(x_i)$) will also eliminate selection bias; that is, treatment assignment ($d_i$) and the potential outcomes ($Y_{1i}, Y_{0i}$) are conditionally independent given the propensity score $p(x_i)$:
$$\{Y_{1i}, Y_{0i}\} \perp d_i \mid x_i \;\Longrightarrow\; \{Y_{1i}, Y_{0i}\} \perp d_i \mid p(x_i)$$
In fact, propensity score matching can, in some settings, provide a more efficient estimator of treatment effects than covariate matching (Angrist and Hahn, 2004).
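The propensity score theorem can be illustrated with a discrete covariate in which two values share the same propensity score. In this hypothetical Python/NumPy sketch, stratifying on `x` and stratifying on the coarser p(x) both recover the constant treatment effect of 2, while the raw difference in means does not. The data-generating process is an assumption chosen so that the CIA holds given x.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 40_000

# Hypothetical covariate with four values; x=1 and x=2 share the same
# propensity score, so p(x) is a coarser balancing score than x itself.
x = rng.integers(0, 4, size=n)
p_of_x = np.array([0.2, 0.5, 0.5, 0.8])[x]
d = rng.random(n) < p_of_x
y = 3.0 * x + 2.0 * d + rng.normal(size=n)   # constant effect of 2

def stratified_effect(strata):
    """Within-stratum treated-minus-control differences, weighted by stratum share."""
    total = 0.0
    for s in np.unique(strata):
        m = strata == s
        total += m.mean() * (y[m & d].mean() - y[m & ~d].mean())
    return total

naive = y[d].mean() - y[~d].mean()
on_x = stratified_effect(x)
on_pscore = stratified_effect(p_of_x)
print(f"naive: {naive:.2f}, on x: {on_x:.2f}, on p(x): {on_pscore:.2f}")
```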
So the idea is to first generate propensity scores by specifying a model that predicts the probability of receiving treatment given the covariates $x_i$:
$$p(x_i) = P(d_i = 1 \mid x_i)$$
There are many possible functional forms for estimating propensity scores. Logit and probit models with the binary treatment indicator as the dependent variable are commonly used. Hirano et al. find that an efficient estimator can be achieved by weighting by a nonparametrically estimated propensity score (Hirano et al., 2003). Millimet and Tchernis find evidence that more flexible, overspecified estimators perform better in propensity score applications (Millimet and Tchernis, 2009). A comparative study of propensity score estimators using logistic regression, support vector machines, decision trees, and boosting algorithms can be found in Westreich et al. (2010).
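As a sketch of the first stage, the Python/NumPy code below fits a logit propensity model by Newton-Raphson on simulated data. The standardized covariates `age` and `income` and the coefficient values are hypothetical, and `fit_logit` is a hand-written helper; in practice a statistics package's logit routine would normally be used instead.

```python
import numpy as np

def fit_logit(X, d, iters=25, tol=1e-10):
    """Hypothetical helper: maximum-likelihood logit via Newton-Raphson.
    X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)                            # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])     # negative Hessian
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(7)
n = 10_000
age = rng.normal(size=n)
income = rng.normal(size=n)
X = np.column_stack([np.ones(n), age, income])
true_beta = np.array([-0.5, 0.8, -1.2])
d = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = fit_logit(X, d)
pscore = 1.0 / (1.0 + np.exp(-X @ beta_hat))   # p(x_i) = P(d_i = 1 | x_i)
print(beta_hat)
```

A probit or a more flexible specification (interactions, splines, or the machine-learning methods compared by Westreich et al.) would change only this first stage; the matching or weighting step that follows uses `pscore` the same way.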
Once these probabilities, or ‘propensity scores,’ are generated for each individual, matching is accomplished by identifying individuals in the control group whose propensity scores are similar to those of individuals in the treated group. Types of matching algorithms include 1:1 and nearest-neighbor methods. Differences between matched cases are calculated and then combined to estimate an average treatment effect.

Another method that implements matching based on propensity scores is stratification. In this case, treatment and control groups are divided into strata, or bins, of propensity scores. Comparisons are then made within each stratum and combined across strata to estimate an average treatment effect. Matched comparisons based on propensity score strata are discussed in Rosenbaum and Rubin (1984). This method can remove up to 90% of the bias due to factors related to selection using as few as five strata (Rosenbaum and Rubin, 1984).
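Both approaches described above can be sketched in a few lines of Python/NumPy, assuming simulated data and a known propensity score (estimated scores would slot in the same way). The matching part pairs each treated unit with the control unit whose score is closest (1:1 nearest neighbor, with replacement), which targets the effect on the treated; the stratification part splits the sample into propensity score quintiles, compares treated and control units within each quintile, and averages using stratum shares. The constant effect of 2 is an assumption that makes both targets equal.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4_000
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-x))          # propensity score, assumed known here
d = rng.random(n) < ps
y = x + 2.0 * d + rng.normal(size=n)   # constant effect of 2, so ATT = ATE = 2

naive = y[d].mean() - y[~d].mean()

# 1:1 nearest-neighbor matching on the propensity score, with replacement.
controls = np.flatnonzero(~d)
controls = controls[np.argsort(ps[controls])]   # controls sorted by score
treated = np.flatnonzero(d)
pos = np.searchsorted(ps[controls], ps[treated])
left = controls[np.clip(pos - 1, 0, len(controls) - 1)]
right = controls[np.clip(pos, 0, len(controls) - 1)]
use_right = np.abs(ps[right] - ps[treated]) < np.abs(ps[left] - ps[treated])
match = np.where(use_right, right, left)
att_nn = np.mean(y[treated] - y[match])

# Stratification into five propensity score strata (quintiles).
edges = np.quantile(ps, [0.2, 0.4, 0.6, 0.8])
stratum = np.digitize(ps, edges)                # labels 0..4
ate_strat = sum(
    np.mean(stratum == s)
    * (y[(stratum == s) & d].mean() - y[(stratum == s) & ~d].mean())
    for s in range(5)
)
print(f"naive: {naive:.2f}, NN matching: {att_nn:.2f}, 5 strata: {ate_strat:.2f}")
```

With only five strata some residual bias remains inside each bin, consistent with the "up to 90%" figure rather than complete removal.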

Angrist, J. D. & Hahn, J. (2004). When to control for covariates? Panel-Asymptotic Results for Estimates of Treatment Effects. Review of Economics and Statistics, 86, 58–72.

Angrist, J. D. & Pischke, J. (2008). Mostly harmless econometrics: An empiricist's companion. Princeton University Press.

Hirano, K., Imbens, G. W. & Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71(4), 1161–1189.

Imbens, G. W. & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47(1), 5–86.

Klaiber, H. A. & Smith, V. K. (2009). Evaluating Rubin's causal model for measuring the capitalization of environmental amenities. NBER Working Paper No. 14957. National Bureau of Economic Research.

Millimet, D. L. & Tchernis, R. (2009). On the specification of propensity scores, with applications to the analysis of trade policies. Journal of Business & Economic Statistics, 27(3).

Rosenbaum, P. R. & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.

Rosenbaum, P. R. & Rubin, D. B. (1984). Reducing Bias in Observational Studies Using Subclassification on the Propensity Score. Journal of the American Statistical Association, 79(387), 516–524.

Rubin, D. B. (1973). Matching to remove bias in observational studies. Biometrics, 29, 159–183.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701.

Thoemmes, F. J. & Kim, E. S. (2011). A systematic review of propensity score methods in the social sciences. Multivariate Behavioral Research, 46(1), 90–118.

Westreich, D., Lessler, J. & Funk, M. J. (2010). Propensity score estimation: machine learning and classification methods as alternatives to logistic regression. Journal of Clinical Epidemiology, 63(8), 826–833.

Tuesday, September 3, 2013

GMM, Endogeneity, SNA, Viral Marketing, and Causal Inference

In the article “Impact of social network structure on content propagation: A study using YouTube data,” the author investigates the relationship between sociometric measures, such as degree centrality, and the diffusion of videos across a network. In other words, the question is whether there is a causal relationship between the network properties of those who share videos and the likelihood that a video goes viral. What first interested me about this article was that it is a very good example of an application of social network analysis and viral seeding. However, it also provides very good examples of applications related to generalized method of moments (GMM), instrumental variables, unobserved heterogeneity and endogeneity, and causal inference. I was previously not aware of GMM-style dynamic panel data models that use lags as instruments, which are apparently quite popular in many econometric applications (see references below).

As the author points out, any model that relates network properties to the outcome of video dissemination requires a careful estimation strategy if we are interested in making causal inferences. The paper identifies several sources of endogeneity and unobserved heterogeneity. If we are trying to explain dissemination by one’s position in the network, we have to consider that other unobserved factors related to network position and video type could also affect dissemination. If all we are trying to do is predict video shares based on network position, that may be fine, as long as these correlations hold over time.

In contrast, if we want to make causal inferences, these sources of endogeneity must be accounted for, and they make econometric estimation difficult. In this case, what we really want to estimate is the independent causal effect of network position on video shares, so we are interested only in the ‘quasi-experimental’ variation in network position.

A natural solution involves an instrumental variables approach, but the challenge of finding an ‘external’ instrument that is correlated with the network and video properties of interest, but uncorrelated with the unobserved effects, is rather daunting. Ultimately, the author proposes a generalized method of moments dynamic panel estimator that uses lagged variables as instruments.
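The full Arellano and Bond (1991) GMM estimator uses many lagged instruments at once, but its logic can be seen in the simpler, just-identified Anderson and Hsiao (1981) estimator cited below. This hypothetical Python/NumPy sketch simulates a dynamic panel with unobserved unit effects, shows that pooled OLS is biased, then first-differences out the unit effects and instruments the lagged difference with the twice-lagged level. It illustrates the idea only; it is not the estimator used in the YouTube study.

```python
import numpy as np

rng = np.random.default_rng(9)
N, T, burn = 5_000, 10, 50
rho = 0.5

# Hypothetical dynamic panel: y_it = rho * y_i,t-1 + alpha_i + eps_it,
# where alpha_i is unobserved, time-invariant heterogeneity.
alpha = rng.normal(size=N)
y = np.zeros((N, burn + T))
for t in range(1, burn + T):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=N)
y = y[:, burn:]   # keep the last T periods, after a burn-in

# Pooled OLS of y_it on y_i,t-1 ignores alpha_i, which is correlated
# with the lag, so the estimate is biased upward.
y_t, y_lag = y[:, 1:].ravel(), y[:, :-1].ravel()
Xo = np.column_stack([np.ones_like(y_lag), y_lag])
rho_ols = np.linalg.lstsq(Xo, y_t, rcond=None)[0][1]

# Anderson-Hsiao: first-difference to remove alpha_i, then instrument
# dy_i,t-1 with the level y_i,t-2, which is uncorrelated with d(eps)_it.
dy = np.diff(y, axis=1)       # dy[:, k] = y[:, k+1] - y[:, k]
dep = dy[:, 1:].ravel()       # dy_it
endog = dy[:, :-1].ravel()    # dy_i,t-1
instr = y[:, :-2].ravel()     # y_i,t-2
rho_ah = (instr @ dep) / (instr @ endog)

print(f"true rho: {rho}, pooled OLS: {rho_ols:.3f}, Anderson-Hsiao IV: {rho_ah:.3f}")
```

Differencing alone is not enough: dy_i,t-1 and the differenced error both contain eps_i,t-1, which is why a lag dated t-2 or earlier serves as the instrument.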


Anderson, T. W., & Hsiao, C. (1981). Estimation of dynamic models with error components. Journal of the American Statistical Association, 76(375), 598–606.

Arellano, M., & Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies, 58, 277–297.

Bond, S. (2002). cemmap working paper CWP09/02.

Yoganarasimhan, H. (2012). Impact of social network structure on content propagation: A study using YouTube data. Quantitative Marketing and Economics, 10, 111–150.