I recently ran across:
The State of Applied Econometrics - Causality and Policy Evaluation
Susan Athey, Guido Imbens
A nice read, although I skipped directly to the section on machine learning, which has a few interesting comments on causality and machine learning.
They discuss some known issues with estimating propensity scores via machine learning algorithms, particularly the sensitivity of results when estimated propensity scores are close to 0 or 1. One approach they discuss is trimming weights, which I have seen before in Angrist and Pischke and other work (see below). In fact, in a working paper where I employed gradient boosting to estimate propensity scores for IPTW regression, I trimmed the weights. However, I did not trim them for the stratified matching estimator that I also used. I wish I still had the data, because I would like to see the impact on my previous results.
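A minimal sketch of that workflow: estimate propensity scores with gradient boosting, form IPTW weights, and trim the extreme weights. The data, covariates, and trimming percentiles below are hypothetical illustrations, not the original study's choices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))                      # hypothetical covariates
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # hypothetical treatment indicator

# Estimate propensity scores e(x) = P(T = 1 | X = x) via gradient boosting.
gbm = GradientBoostingClassifier().fit(X, t)
ps = gbm.predict_proba(X)[:, 1]

# Inverse-probability-of-treatment weights: 1/e(x) for treated, 1/(1 - e(x)) for controls.
w = np.where(t == 1, 1 / ps, 1 / (1 - ps))

# Trim (winsorize) weights at the 1st and 99th percentiles -- one common rule of
# thumb; the percentiles are an assumption, and the paper also discusses
# trimming on the scores themselves rather than the weights.
lo, hi = np.percentile(w, [1, 99])
w_trimmed = np.clip(w, lo, hi)
```

The trimmed weights would then be passed to a weighted regression of the outcome on treatment.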
Another interesting application discussed in the paper is a two- (or three-?) stage LASSO estimation. (They also have a great overall discussion of penalized regression and regularization in machine learning.) The idea: first, run LASSO to select variables related to the outcome of interest; second, run LASSO to select variables related to selection; finally, run OLS to estimate a causal model that includes the variables selected in the previous LASSO steps.
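The stages above can be sketched roughly as follows; this is a hedged toy illustration with simulated data, where the variable names and the use of scikit-learn's cross-validated LASSO are my assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(1)
n, p = 500, 20
X = rng.normal(size=(n, p))                 # candidate control variables
d = X[:, 0] + rng.normal(size=n)            # treatment, driven by X[:, 0]
y = 2.0 * d + X[:, 1] + rng.normal(size=n)  # outcome; true effect is 2

# Stage 1: LASSO of the outcome on the controls, keeping nonzero coefficients.
sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)
# Stage 2: LASSO of the treatment on the controls (the "selection" equation).
sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)

# Final stage: OLS of the outcome on treatment plus the union of selected controls.
controls = sorted(set(sel_y) | set(sel_d))
Z = np.column_stack([d, X[:, controls]])
ols = LinearRegression().fit(Z, y)
effect = ols.coef_[0]  # estimated treatment effect, near 2 here
```

Taking the union of the two selected sets is what guards against omitting a confounder that matters for selection but is only weakly predictive of the outcome.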
The paper covers a range of other topics, including decision trees, random forests, distinctions between traditional econometrics and machine learning, instrumental variables, etc.
Some Additional Notes and References:
Multiple Algorithms (CART/Logistic Regression/Boosting/Random Forests) with PS weights and trimming:
Following Angrist and Pischke, I present results for regressions on data that have been 'screened' by eliminating observations where the estimated propensity score is greater than .90 or less than .10, using the R 'MatchIt' package.
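That screening rule is easy to state directly; the sketch below mirrors the rule itself (drop observations with estimated scores above .90 or below .10) rather than the MatchIt implementation, and the propensity scores are stand-in values.

```python
import numpy as np

rng = np.random.default_rng(2)
ps = rng.uniform(size=1000)          # stand-in estimated propensity scores

# Keep only observations whose score lies inside [.10, .90].
keep = (ps >= 0.10) & (ps <= 0.90)
ps_screened = ps[keep]
```

The same boolean mask would be applied to the outcome, treatment, and covariate arrays before running the screened regression.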