Sunday, February 24, 2019

The Multiplicity of Data Science

There was a really good article on LinkedIn some time ago regarding how Airbnb classifieds its data science roles:

"The Analytics track is ideal for those who are skilled at asking a great question, exploring cuts of the data in a revealing way, automating analysis through dashboards and visualizations, and driving changes in the business as a result of recommendations. The Algorithms track would be the home for those with expertise in machine learning, passionate about creating business value by infusing data in our product and processes. And the Inference track would be perfect for our statisticians, economists, and social scientists using statistics to improve our decision making and measure the impact of our work."

I think this helps tremendously to clarify thinking in this space.

Sunday, February 17, 2019

Was It Meant to Be? OR Sometimes Playing Match Maker Can Be a Bad Idea: Matching with Difference-in-Differences

Previously I discussed the unique aspects of modeling claims and addressing those with generalized linear models. I followed that with a discussion of the challenges of using difference-in-differences in the context of GLM models and some ways to deal with this. In this post I want to dig into into what some folks are debating in terms of issues related to combining matching with DID. Laura Hatfield covers it well on twitter:


Also, they picked up on this it at the incidental economist and gave a good summary of the key papers here.

You can find citations for the relevant papers below. I won't plagerize what both Laura and the folks at the Incidental Economist have already explained very well. But, at a risk of oversimplifying the big picture I'll try to summarize a bit. Matching in a few special cases can improve the precision of the estimate in a DID framework,  and occasionally reduces bias. Remember, that  matching on pre-period observables is not necessary for the validity of difference in difference models. There are  cases when the treatment group is in fact determined by pre-period outcome levels. In these cases matching is necessary. At other times, if not careful, matching in DID introduces risks for regression to the mean…what Laura Hatfield describes as a ‘bounce back’ effect in the post period that can generate or inflate treatment effects when they do not really exist.

Both the previous discussion on DID in a GLM context and combining matching with DID indicate the risks involved in just plug and play causal inference and the challenges of bridging the gap between theory and application.


Daw, J. R. and Hatfield, L. A. (2018), Matching and Regression to the Mean in Difference‐in‐Differences Analysis. Health Serv Res, 53: 4138-4156. doi:10.1111/1475-6773.12993

Daw, J. R. and Hatfield, L. A. (2018), Matching in Difference‐in‐Differences: between a Rock and a Hard Place. Health Serv Res, 53: 4111-4117. doi:10.1111/1475-6773.13017