Econometric Sense: November 2021

In a previous post I noted:

" ...correlations or 'flags' from big data might not 'identify' causal effects, but they are useful for prediction and might point us in directions where we can more rigorously investigate causal relationships"

Recently, on LinkedIn, I discussed situations where we have to be careful about taking action on specific features in a correlational model: for instance, changing product attributes or designing an intervention based on interpretations of SHAP values from non-causal predictive models. I quoted Scott Lundberg:

"regularized machine learning models like XGBoost will tend to build the most parsimonious models that predict the best with the fewest features necessary (which is often something we strive for). This property often leads them to select features that are surrogates for multiple causal drivers, which is very useful for generating robust predictions...but not good for understanding which features we should manipulate to increase retention."

So sometimes we may go into a project with the intention of only needing predictions. We might just want to target offers or nudges to customers or product users without thinking about this in causal terms at first. But, as I have discussed before, the conversation often inevitably turns to causality, even if stakeholders and business users don't use causal language to describe their problems.

"Once armed with predictions, businesses will start to ask questions about 'why'... they will want to know what decisions or factors are moving the needle on revenue or customer satisfaction and engagement or improved efficiencies...There is a significant difference between understanding what drivers correlate with or 'predict' the outcome of interest and what is actually driving the outcome."

This would seem to call for causal models. However, in their recent paper, Carlos Fernández-Loría and Foster Provost make an exciting claim:

“what might traditionally be considered ‘good’ estimates of causal effects are not necessary to make good causal decisions…implications above are quite important in practice, because acquiring data to estimate causal effects accurately is often complicated and expensive. Empirically, we see that results can be considerably better when modeling intervention decisions rather than causal effects.”

Now, in this case they are not talking about causal models aimed at identifying the key drivers of an outcome, so this does not contradict anything mentioned above or in previous posts. Specifically, they are talking about building models for causal decision making (CDM) that are focused simply on deciding whom to 'treat' or target. In this scenario, businesses leverage predictive models to target offers, provide incentives, or make recommendations. As discussed in the paper, there are two broad ways of approaching this problem. Let's say the problem is related to churn.

1) We could predict the risk of churn and target the members most likely to churn. We could do this with a purely correlational machine learning model. The output, or estimand, of this model is a predicted probability p() or risk score. The authors refer to these kinds of models as 'outcome' models.

2) We could build a causal model that predicts the causal impact of an outreach. This would allow us to target the customers we are most likely to 'save' as a result of our intervention. The authors refer to this estimand as a causal effect estimate (CEE). Building machine learning models that are causal can be more challenging and resource intensive.
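To make the difference between the two estimands concrete, here is a toy Python sketch. The customers and their churn probabilities with and without outreach are hypothetical numbers chosen for illustration; in real data we never observe both outcomes for the same person, which is a big part of why approach 2 is harder.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical customer whose churn probability is "known" with and
    # without outreach (only possible in a toy example like this one).
    name: str
    p_churn_untreated: float  # churn probability if we do nothing
    p_churn_treated: float    # churn probability if we reach out

    @property
    def outcome_score(self) -> float:
        # Approach 1: the 'outcome' estimand, i.e. risk of churn.
        return self.p_churn_untreated

    @property
    def cee(self) -> float:
        # Approach 2: causal effect of outreach = reduction in churn probability.
        return self.p_churn_untreated - self.p_churn_treated


def top_k(customers, key, k):
    """Names of the k customers with the highest value of `key`."""
    return [c.name for c in sorted(customers, key=key, reverse=True)[:k]]


customers = [
    Customer("A", 0.90, 0.88),  # very high risk, but outreach barely helps
    Customer("B", 0.60, 0.30),  # moderate risk, very responsive
    Customer("C", 0.50, 0.25),  # moderate risk, responsive
    Customer("D", 0.10, 0.09),  # low risk, little to gain
]

by_outcome = top_k(customers, key=lambda c: c.outcome_score, k=2)
by_effect = top_k(customers, key=lambda c: c.cee, k=2)
print("target by churn risk:", by_outcome)
print("target by causal effect:", by_effect)
```

In this made-up example the two approaches pick different people, because the riskiest customer is not the most persuadable one. The rest of the post is about when that gap does, and does not, matter.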

It is true that at the end of the day we want to maximize our impact. But the causal decision ultimately comes down to whom we target in order to maximize that impact. The authors point out that this decision does not necessarily hinge on how accurate our point estimate of the causal impact is, as long as errors in prediction still lead to the same decisions about whom to target.
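A minimal sketch of this point, using hypothetical effect sizes: one estimator has small errors that happen to swap two customers near the targeting cutoff, while another is wildly off in magnitude (every effect doubled) but keeps the ordering intact. The numbers and the budget K are assumptions for illustration only.

```python
def mse(estimates, truth):
    """Mean squared error of the effect estimates."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)


def top_k(scores, k):
    """Indices of the k highest scores, i.e. whom we would target."""
    return set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])


def impact(chosen, truth):
    """Total TRUE effect delivered by targeting the chosen customers."""
    return sum(truth[i] for i in chosen)


true_effect = [0.30, 0.25, 0.20, 0.15, 0.05]      # hypothetical true lifts
est_small_error = [0.30, 0.19, 0.21, 0.15, 0.05]  # close, but swaps customers 1 and 2
est_big_error = [2 * t for t in true_effect]      # every effect overstated 2x
K = 2  # hypothetical budget: we can only contact two customers

for name, est in [("small error", est_small_error), ("big error", est_big_error)]:
    chosen = top_k(est, K)
    print(f"{name}: MSE={mse(est, true_effect):.4f} "
          f"targets={sorted(chosen)} impact={impact(chosen, true_effect):.2f}")
```

The estimator with the much lower MSE makes the worse targeting decision, while the badly biased one targets exactly the right people.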

What they find is that in order to make good causal decisions about whom to 'treat', we don't need highly accurate estimates of the causal impact of treatment (or models focused on CEE). In fact, they walk through scenarios and conditions where non-causal outcome models like #1 above can perform just as well as, or sometimes better than, more accurate causal models focused on CEE.

In other words, correlational outcome models (like #1) can essentially serve as proxies for the more complicated causal models (like #2), even if the data used to estimate these 'proxy' models is confounded.

Scenarios where this is most likely include:

1) Outcomes used as proxies and (causal) effects are correlated

2) Outcomes used as proxies are easier to estimate than causal effects

3) Predictions are used to rank individuals
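Scenarios 1 and 3 can be illustrated with a quick simulation. It assumes a hypothetical data-generating process in which the outreach effect grows with baseline churn risk, and an outcome model that predicts risk with some error but never sees the effects. The process, the noise level, and the budget K are illustrative choices, not values from the paper.

```python
import random

random.seed(42)
N, K = 10_000, 1_000  # customers, and how many we can afford to contact

# Hypothetical data-generating process (an assumption for illustration):
# baseline churn risk, and an outreach effect that scales with that risk.
risk = [random.random() for _ in range(N)]
effect = [r * random.uniform(0.2, 0.5) for r in risk]
# A non-causal outcome model: predicts risk with error, knows nothing about effects.
risk_hat = [r + random.gauss(0, 0.05) for r in risk]


def avg_effect_of_targeting(scores):
    """Mean TRUE effect among the K customers with the highest scores."""
    chosen = sorted(range(N), key=lambda i: scores[i], reverse=True)[:K]
    return sum(effect[i] for i in chosen) / K


proxy = avg_effect_of_targeting(risk_hat)  # rank by predicted churn risk
oracle = avg_effect_of_targeting(effect)   # rank by true effect (unknowable in practice)
rand_pick = sum(effect[i] for i in random.sample(range(N), K)) / K
print(f"random: {rand_pick:.3f}  outcome proxy: {proxy:.3f}  oracle: {oracle:.3f}")
```

Because the proxy is only used to rank, its scale is irrelevant; what matters is how well its ordering tracks the ordering of the effects. If the assumed link between risk and effect were removed, the proxy would lose most of its advantage over random targeting.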

They also give some reasons why this may be true. Biased, non-causal models built on confounded data may not be able to identify true causal effects, but they may still be useful for identifying the optimal decision.

"This could occur when confounding is stronger for individuals with large effects - for example if confounding bias is stronger for 'likely' buyers, but the effect of ads is also stronger for them...the key insight here is that optimizing to make the correct decision generally involves understanding whether a causal effect is above or below a given threshold, which is different from optimizing to reduce the magnitude of bias in a causal effect estimate."
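The threshold idea can be sketched in a few lines. Here a hypothetical cost of outreach sets the threshold, one estimate carries large bias concentrated on the customers with large effects (mimicking the 'likely buyers' example), and another has much smaller errors that happen to cross the threshold. All numbers are assumptions for illustration.

```python
COST = 0.10  # hypothetical: outreach pays off only if the lift exceeds this


def decisions(estimates, threshold=COST):
    """Treat everyone whose estimated effect is above the threshold."""
    return [e > threshold for e in estimates]


def mean_abs_error(estimates, truth):
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)


def policy_value(treat, truth, cost=COST):
    """Net TRUE value of a treat / don't-treat policy."""
    return sum(t - cost for d, t in zip(treat, truth) if d)


true_effect = [0.02, 0.05, 0.15, 0.40]
# Large bias, concentrated where effects are large, but no threshold crossings.
biased = [0.03, 0.07, 0.30, 0.80]
# Much smaller errors overall, but two of them land on the wrong side of COST.
noisy = [0.02, 0.11, 0.09, 0.40]

for name, est in [("biased", biased), ("noisy", noisy)]:
    d = decisions(est)
    print(f"{name}: MAE={mean_abs_error(est, true_effect):.3f} "
          f"decisions={d} value={policy_value(d, true_effect):.2f}")
```

The biased estimate is far worse as an estimate but produces the optimal policy; the less biased one loses value because its errors cross the decision boundary.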

"Models trained with confounded data may lead to decisions that are as good (or better) than the decisions made with models trained with costly experimental data, in particular when larger causal effects are more likely to be overestimated or when variance reduction benefits of more and cheaper data outweigh the detrimental effect of confounding....issues that make it impossible to estimate causal effects accurately do not necessarily keep us from using the data to make accurate intervention decisions."
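The trade-off between variance and confounding in the quote above can be sketched as a small simulation. Everything here is an assumption for illustration: ten hypothetical segments, an 'experimental' estimator that is unbiased but noisy (a small, costly test), and an 'observational' estimator whose bias grows with the effect but whose noise is tiny because the data are plentiful.

```python
import random
import statistics

random.seed(7)
true_tau = [0.01 * i for i in range(10)]  # hypothetical segment-level effects
K = 3  # number of segments we can afford to target
best_value = sum(sorted(true_tau, reverse=True)[:K])


def realized_value(estimates):
    """Sum of TRUE effects in the K segments ranked highest by `estimates`."""
    chosen = sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)[:K]
    return sum(true_tau[i] for i in chosen)


def experimental_estimate():
    # Unbiased but noisy (the noise level is an assumption).
    return [t + random.gauss(0, 0.05) for t in true_tau]


def observational_estimate():
    # Confounded: bias grows with the effect, but low noise from lots of cheap data.
    return [2 * t + 0.05 + random.gauss(0, 0.005) for t in true_tau]


exp_vals, obs_vals, exp_err, obs_err = [], [], [], []
for _ in range(2_000):
    e, o = experimental_estimate(), observational_estimate()
    exp_vals.append(realized_value(e))
    obs_vals.append(realized_value(o))
    exp_err.append(statistics.fmean(abs(a - t) for a, t in zip(e, true_tau)))
    obs_err.append(statistics.fmean(abs(a - t) for a, t in zip(o, true_tau)))

print(f"best possible value: {best_value:.3f}")
print(f"experimental: value={statistics.fmean(exp_vals):.3f} MAE={statistics.fmean(exp_err):.3f}")
print(f"observational: value={statistics.fmean(obs_vals):.3f} MAE={statistics.fmean(obs_err):.3f}")
```

Under these assumptions the confounded estimator has the larger estimation error yet delivers more value, because its bias preserves the ordering of segments while the experimental noise scrambles it. With a bigger experiment (smaller noise) or bias unrelated to the effect, the conclusion could flip.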

Their arguments hinge on the idea that what we are really solving for in these decisions is based on ranking:

"Assuming...the selection mechanism producing the confounding is a function of the causal effect - so that the larger the causal effect the stronger the selection - then (intuitively) the ranking of the preferred treatment alternatives should be preserved in the confounded setting, allowing for optimal treatment assignment policies from data."
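A simplified way to see why this assumption matters: if the bias is a strictly increasing function of the effect, the biased scores sort customers in exactly the same order as the true effects; if the bias is driven by something unrelated to the effect, the ranking can break. Modeling confounding as a direct bias term on hypothetical effects is a stand-in for the paper's selection mechanism, not their formal setup.

```python
def ranking(scores):
    """Indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


true_tau = [0.02, 0.10, 0.05, 0.30, 0.18]  # hypothetical true effects

# Case 1 (assumption holds): bias grows with the effect, so the confounded
# score is a strictly increasing function of the true effect.
monotone_bias = [1.5 * t + 0.1 for t in true_tau]

# Case 2 (assumption violated): bias comes from a factor unrelated to the effect.
unrelated_bias = [t + b for t, b in zip(true_tau, [0.25, 0.0, 0.3, 0.0, 0.05])]

print(ranking(true_tau))        # [3, 4, 1, 2, 0]
print(ranking(monotone_bias))   # [3, 4, 1, 2, 0]  same order, same targets
print(ranking(unrelated_bias))  # [2, 3, 0, 4, 1]  order scrambled
```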

A lot of this really comes down to proper problem framing, and to the popular paraphrase of George E. P. Box: all models are wrong, but some are useful. It turns out that in this particular use case, non-causal models can be as useful as causal ones, or more so.

And we do need to be careful about the nuance of the problem framing. As the authors point out, this solves one particular business problem and use case, but does not answer some of the most important causal questions businesses may be interested in:

"This does not imply that firms should stop investing in randomized experiments or that causal effect estimation is not relevant for decision making. The argument here is that causal effect estimation is not necessary for doing effective treatment assignment."

They go on to argue that randomized tests and other causal methods are still core to understanding the effectiveness of interventions and strategies for improving effectiveness. Their use case begins and ends with what is just one step in the entire lifecycle of product development, deployment, and optimization. In their discussion of further work they suggest that:

"Decision makers could focus on running randomized experiments in parts of the feature space where confounding is particularly hurtful for decision making, resulting in higher returns on their experimentation budget."

This essentially parallels my previous discussion of SHAP values. For a great reference on making practical business decisions about when this is worth the effort, see the HBR article in the references on when to act on a correlation.

So some big takeaways are:

1) When building a model for the purpose of causal decision making (CDM), even a biased (non-causal) model can perform as well as or better than a causal model focused on CEE.

2) In many cases, even a predictive model that provides predicted probabilities or risk scores (as proxies for causal impact or CEE) can perform as well as or better than causal models when the goal is CDM.

3) If the goal is to take action based on important features (e.g., SHAP values, as discussed before), however, we still need to apply a causal framework, and understanding the actual effectiveness of interventions may still require randomized tests or other methods of causal inference.

HT: This paper was previously discussed at Andrew Gelman's blog here: https://statmodeling.stat.columbia.edu/2021/11/01/how-different-are-causal-estimation-and-decision-making/

References:

Causal Decision Making and Causal Effect Estimation Are Not the Same... and Why It Matters. Carlos Fernández-Loría and Foster Provost. 2021. https://arxiv.org/abs/2104.04103

When to Act on a Correlation, and When Not To. David Ritter. Harvard Business Review. March 19, 2014.

Be Careful When Interpreting Predictive Models in Search of Causal Insights. Scott Lundberg. https://towardsdatascience.com/be-careful-when-interpreting-predictive-models-in-search-of-causal-insights-e68626e664b6

Additional Reading:

Laura B Balzer, Maya L Petersen, Invited Commentary: Machine Learning in Causal Inference—How Do I Love Thee? Let Me Count the Ways, American Journal of Epidemiology, Volume 190, Issue 8, August 2021, Pages 1483–1487, https://doi.org/10.1093/aje/kwab048

Petersen, M. L., & van der Laan, M. J. (2014). Causal models and learning from data: integrating causal modeling and statistical estimation. Epidemiology (Cambridge, Mass.), 25(3), 418–426. https://doi.org/10.1097/EDE.0000000000000078

Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning. Numair Sani, Daniel Malinsky, Ilya Shpitser. arXiv:2006.02482v3

Statistics is a Way of Thinking, Not a Toolbox

Big Data: Don't Throw the Baby Out with the Bathwater

Big Data: Causality and Local Expertise Are Key in Agronomic Applications

The Use of Knowledge in a Big Data Society