A while back I stumbled across a paper by Liu & Lynch: Do Agricultural Land Preservation Programs Reduce Farmland Loss?
They apply propensity score matching to a really long panel and highlight some important considerations for matching applications:
1) They compare two approaches. Unrestricted matching basically ignores the time component, so a county can actually be matched to itself if its propensity scores in different time periods line up. Restricted matching requires matches to be made only between treated and control observations within the same census period.
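To make the distinction concrete, here is a minimal sketch of nearest neighbor matching with and without the within-period restriction. The function name, the toy propensity scores, and the period labels are all made up for illustration; this is not the authors' code.

```python
import numpy as np

def nearest_control(ps, treated, period, restricted):
    """For each treated row, return the index of the closest control row.

    ps: estimated propensity scores, one per county-period observation
    treated: boolean array, True for treated observations
    period: period label for each observation
    restricted: if True, only controls from the same period are eligible
    """
    ps, treated, period = map(np.asarray, (ps, treated, period))
    matches = {}
    for i in np.flatnonzero(treated):
        eligible = ~treated
        if restricted:
            eligible = eligible & (period == period[i])
        candidates = np.flatnonzero(eligible)
        if candidates.size == 0:
            continue  # no eligible control for this treated observation
        gaps = np.abs(ps[candidates] - ps[i])
        matches[int(i)] = int(candidates[np.argmin(gaps)])
    return matches

# Hypothetical toy data: rows 0-2 come from one period, rows 3-5 from another.
ps = np.array([0.60, 0.30, 0.55, 0.62, 0.61, 0.20])
treated = np.array([True, False, False, True, False, False])
period = np.array([1978, 1978, 1978, 1982, 1982, 1982])

unrestricted = nearest_control(ps, treated, period, restricted=False)
restricted = nearest_control(ps, treated, period, restricted=True)
print(unrestricted)  # treated row 0 borrows a control from the other period
print(restricted)    # treated row 0 is held to controls from its own period
```

In the toy data the closest control to treated row 0 sits in the other period, so the two rules pick different matches for it.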
2) They provide an interesting discussion of the bias/variance tradeoff associated with bandwidth and kernel selection:
“Bandwidth and kernel type selection is an important issue in choosing matching method. Generally speaking, a large bandwidth leads to a larger bias but smaller variance of the estimated average treatment effect of the PDR programs; a small bandwidth leads to a smaller bias but a larger variance. The differences among kernel types are embedded in the weights they assign to non-PDR county observations whose estimated propensity score are farther away from that of their matched PDR county observations.”
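A small sketch of what the quote describes, assuming two standard kernels (Epanechnikov and Gaussian) and made-up propensity scores. The paper's own set of kernels and bandwidths is not reproduced here. A narrow bandwidth leans on only the closest controls (less bias, more variance), while a wide one spreads weight over more distant controls.

```python
import numpy as np

def epanechnikov(u):
    # Compact support: zero weight once |u| >= 1
    return np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)

def gaussian(u):
    # Unbounded support: distant controls keep a small positive weight
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kernel_weights(p_treated, p_controls, bandwidth, kernel):
    """Normalized weights on controls for one treated observation."""
    u = (np.asarray(p_controls, float) - p_treated) / bandwidth
    k = kernel(u)
    total = k.sum()
    return k / total if total > 0 else k

# Hypothetical control propensity scores around a treated score of 0.50
p_controls = np.array([0.48, 0.52, 0.60, 0.80])

narrow = kernel_weights(0.50, p_controls, 0.05, epanechnikov)
wide = kernel_weights(0.50, p_controls, 0.40, epanechnikov)
wide_gauss = kernel_weights(0.50, p_controls, 0.40, gaussian)

print(narrow)      # only the two closest controls receive weight
print(wide)        # every control receives weight, closer ones more
print(wide_gauss)  # same bandwidth, different decay with distance
```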
3) They discuss using a leave-one-out cross-validation mechanism to choose the ‘best matching’ method. The candidates combine the matching method (nearest neighbor, kernel, or local linear) with 5 possible kernel types and 6 bandwidths, and the choice is optimized based on an MSE criterion. They cite these references for this:
Racine, J. S. and Q. Li. 2004. “Nonparametric estimation of regression functions with both categorical and continuous data.” Journal of Econometrics 119 (1): 99-130.
Black, D., and J. Smith. 2004. “How Robust is the Evidence on the Effects of College Quality? Evidence from Matching.” Journal of Econometrics, 121(1-2): 99-124.
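Here is a minimal sketch of leave-one-out cross-validation for bandwidth choice. It assumes one common form of the idea: predict each control observation's outcome from the other controls with a kernel regression on the propensity score, then keep the bandwidth with the lowest MSE. The simulated data, the Gaussian kernel, and the six candidate bandwidths are my own assumptions, not values from the paper.

```python
import numpy as np

def loo_mse(ps, y, bandwidth):
    """Leave-one-out MSE of a Gaussian-kernel regression of y on ps."""
    ps, y = np.asarray(ps, float), np.asarray(y, float)
    u = (ps[:, None] - ps[None, :]) / bandwidth
    k = np.exp(-0.5 * u**2)
    np.fill_diagonal(k, 0.0)  # leave each observation out of its own prediction
    denom = k.sum(axis=1)
    ok = denom > 0            # skip points with no neighbors carrying weight
    pred = (k[ok] @ y) / denom[ok]
    return float(np.mean((y[ok] - pred) ** 2))

# Simulated control group (hypothetical): outcome is a smooth function of the score
rng = np.random.default_rng(0)
ps_controls = rng.uniform(0.05, 0.95, size=200)
y_controls = np.sin(3 * ps_controls) + rng.normal(0, 0.1, size=200)

bandwidths = [0.01, 0.02, 0.05, 0.1, 0.2, 0.4]  # hypothetical candidates
scores = {h: loo_mse(ps_controls, y_controls, h) for h in bandwidths}
best = min(scores, key=scores.get)
print(scores)
print("bandwidth with lowest leave-one-out MSE:", best)
```

The same loop could run over kernel types and matching methods as well, scoring every combination on the same held-out predictions.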
4) They state: “Matching with replacement performs as well or better than matching without replacement”, based on:
Dehejia, R., and S. Wahba. 2002. “Propensity score matching methods for non-experimental causal studies.” The Review of Economics and Statistics 84: 151-161.
Rosenbaum, P. 2002. Observational Studies (2nd edition). New York: Springer Verlag.
5) They also write that the “selection of matching methods depends on the distribution of the estimated propensity score”- i.e.
Kernel matching works well with asymmetric distributions and excludes bad matches.
The local linear estimator may be more efficient than standard kernel matching when there is a large concentration of observations with propensity scores near 1 or 0.
Also from: McMillen, D. P., and J.F. McDonald. 2002. “Land values in a newly zoned city.” Review of Economics and Statistics 84(1): 62–72.
They also discuss the fact that nearest neighbor matching is more biased if propensity score distributions are not very compatible.
6) Balancing Tests – A lot of practitioners implement more subjective evaluations of balance based on data visualization, but in this paper formal tests are discussed.
“After matching, we check again whether the two matched groups are the same on their observed characteristics. If unbalanced, the estimated ATT may not be solely the impact of PDR programs. Instead, it may be a combination of the impacts of PDR programs and the unbalanced variables. We rely on two of the balancing tests that exist in the empirical literature: the standardized difference test and a regression-based test. The first method is a t-test for the equality of the means for each covariate in the matched PDR and non-PDR counties. The regression test estimates coefficients for each covariate on polynomials of the estimated propensity scores…. and the interaction of these polynomials with the treatment binary variable, ….If the estimated coefficients on the interacted terms are jointly equal to zero according to an F-test, the balancing condition is satisfied.”
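For the first of these checks, a rough sketch for a single covariate might look like the following. The covariate values are invented, and the formulas are the usual textbook ones (standardized difference as a percentage of the pooled standard deviation, plus a two-sample t statistic with unequal variances), which may differ in detail from what the authors computed.

```python
import numpy as np

def standardized_difference(x_treated, x_control):
    """Difference in means as a percentage of the pooled standard deviation."""
    x_t = np.asarray(x_treated, float)
    x_c = np.asarray(x_control, float)
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return float(100 * (x_t.mean() - x_c.mean()) / pooled_sd)

def welch_t(x_treated, x_control):
    """Two-sample t statistic for equal means, allowing unequal variances."""
    x_t = np.asarray(x_treated, float)
    x_c = np.asarray(x_control, float)
    se = np.sqrt(x_t.var(ddof=1) / x_t.size + x_c.var(ddof=1) / x_c.size)
    return float((x_t.mean() - x_c.mean()) / se)

# Hypothetical covariate values for treated counties and two control groups
treated_x = [10, 12, 14, 16]
all_controls_x = [4, 6, 8, 10]        # before matching
matched_controls_x = [11, 12, 13, 16]  # after matching

before = standardized_difference(treated_x, all_controls_x)
after = standardized_difference(treated_x, matched_controls_x)
t_after = welch_t(treated_x, matched_controls_x)
print(before, after, t_after)
```

In this toy case the matched controls have the same covariate mean as the treated group, so both the standardized difference and the t statistic fall to zero after matching.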
7) They also offer some really nice explanations of how kernel matching works:
“Kernel matching and local linear matching techniques match each PDR county with all non-PDR counties whose estimated propensity scores fall within a specified bandwidth (Heckman, Ichimura and Todd, 1997). The bandwidth is centered on the estimated propensity score for the PDR county. The matched non-PDR counties are weighted according to the density function of the kernel types. The closer a non-PDR county’s estimated propensity score is to the matched PDR county’s propensity score, the more similar the non-PDR county is to the matched PDR county and therefore it is assigned a larger weight calculated from a kernel functions defined in each method. More non-PDR counties are utilized under the kernel and local linear matching as compared to nearest neighbor matching.”
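Putting that description into code, a bare-bones kernel matching estimate of the ATT could be sketched as below. I assume an Epanechnikov kernel and a tiny invented data set; the function name and arguments are hypothetical, and no standard errors are computed.

```python
import numpy as np

def kernel_matching_att(ps, y, treated, bandwidth):
    """Average over treated units of (own outcome - kernel-weighted control outcome)."""
    ps = np.asarray(ps, float)
    y = np.asarray(y, float)
    treated = np.asarray(treated, bool)
    ps_c, y_c = ps[~treated], y[~treated]
    effects = []
    for p_i, y_i in zip(ps[treated], y[treated]):
        u = (ps_c - p_i) / bandwidth  # bandwidth centered on the treated score
        w = np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)  # Epanechnikov weights
        if w.sum() == 0:
            continue  # no control inside the bandwidth: this treated unit is dropped
        effects.append(y_i - np.average(y_c, weights=w))
    return float(np.mean(effects)) if effects else float("nan")

# Hypothetical data: three controls and one treated observation
ps = [0.40, 0.50, 0.60, 0.50]
y = [1.0, 2.0, 3.0, 5.0]
treated = [False, False, False, True]

att = kernel_matching_att(ps, y, treated, bandwidth=0.2)
print(att)
```

All three controls fall inside the bandwidth here, with the one at the same propensity score weighted most heavily, whereas nearest neighbor matching would have used only that single control.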