Thursday, November 4, 2021

Causal Decision Making with non-Causal Models

In a previous post I noted: 

" ...correlations or 'flags' from big data might not 'identify' causal effects, but they are useful for prediction and might point us in directions where we can more rigorously investigate causal relationships"

Recently on LinkedIn I discussed situations where we have to be careful about taking action on specific features in a correlational model, for instance changing product attributes or designing an intervention based on interpretations of SHAP values from non-causal predictive models. I quoted Scott Lundberg:

"regularized machine learning models like XGBoost will tend to build the most parsimonious models that predict the best with the fewest features necessary (which is often something we strive for). This property often leads them to select features that are surrogates for multiple causal drivers, which is very useful for generating robust predictions...but not good for understanding which features we should manipulate to increase retention."

So sometimes, we may go into a project with the intention of only needing predictions. We might just want to target offers or nudges to customers or product users without thinking about the problem in causal terms at first. But, as I have discussed before, the conversation often inevitably turns to causality, even if stakeholders and business users don't use causal language to describe their problems. 

"Once armed with predictions, businesses will start to ask questions about 'why'... they will want to know what decisions or factors are moving the needle on revenue or customer satisfaction and engagement or improved efficiencies...There is a significant difference between understanding what drivers correlate with or 'predict' the outcome of interest and what is actually driving the outcome."

This would seem to call for causal models. However, in their recent paper Carlos Fernández-Loría and Foster Provost make an exciting claim:

"what might traditionally be considered 'good' estimates of causal effects are not necessary to make good causal decisions…implications above are quite important in practice, because acquiring data to estimate causal effects accurately is often complicated and expensive. Empirically, we see that results can be considerably better when modeling intervention decisions rather than causal effects."

Now in this case they are not talking about causal models for identifying the key drivers of an outcome, so this does not contradict anything mentioned above or in previous posts. In particular, they are talking about building models for causal decision making (CDM) that are focused simply on deciding who to 'treat' or target. In this scenario, businesses are leveraging predictive models to target offers, provide incentives, or make recommendations. As discussed in the paper, there are two broad ways of approaching this problem. Let's say the problem is related to churn.

1) We could predict risk of churn and target the members most likely to churn. We could do this with a purely correlational machine learning model. The output or estimand from this model is a predicted probability or risk score. They also refer to these kinds of models as 'outcome' models.

2) We could build a causal model that predicts the causal impact of an outreach. This would allow us to target the customers we are most likely to 'save' as a result of our intervention. They refer to this estimand as a causal effect estimate (CEE). Building machine learning models that are causal can be more challenging and resource intensive.
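To make the contrast concrete, here is a minimal sketch with hypothetical churn probabilities (the customers and numbers are made up for illustration): an 'outcome' model ranks customers by predicted churn risk, while a causal (uplift) model ranks them by the estimated effect of outreach, and the two rankings can disagree.

```python
# Hypothetical churn probabilities for three customers (illustrative only).
# p0: predicted churn probability with no outreach (the 'outcome' model's score)
# p1: churn probability if treated; p0 - p1 is the causal effect of outreach (CEE)
customers = {
    "A": {"p0": 0.90, "p1": 0.88},  # high risk, but barely saveable
    "B": {"p0": 0.60, "p1": 0.35},  # moderate risk, very saveable
    "C": {"p0": 0.20, "p1": 0.19},  # low risk, little to gain
}

# Approach 1: outcome model -- target whoever is most likely to churn
by_risk = sorted(customers, key=lambda c: customers[c]["p0"], reverse=True)

# Approach 2: causal model -- target whoever we are most likely to 'save'
by_uplift = sorted(customers, key=lambda c: customers[c]["p0"] - customers[c]["p1"],
                   reverse=True)

print(by_risk)    # ['A', 'B', 'C']
print(by_uplift)  # ['B', 'A', 'C']
```

Here customer A tops the risk ranking but B tops the uplift ranking, which is exactly the gap between the two approaches the paper examines.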

It is true that at the end of the day we want to maximize our impact. But the causal decision is ultimately who to target in order to maximize that impact. They point out that this causal decision does not necessarily hinge on how accurate our point estimate of causal impact is, as long as errors in prediction still lead to the same decisions about who to target.

What they find is that in order to make good causal decisions about who to 'treat', we don't have to have highly accurate estimates of the causal impact of treatment (or models focused on CEE). In fact, they talk through scenarios and conditions where non-causal outcome models like #1 above can perform just as well as, or sometimes better than, more accurate causal models focused on CEE. 

In other words, correlational outcome models (like #1) can essentially serve as proxies for the more complicated causal models (like #2), even if the data used to estimate these 'proxy' models is confounded.

Scenarios where this is most likely include:

1) Outcomes used as proxies and causal effects are correlated

2) Outcomes used as proxies are easier to estimate than causal effects

3) Predictions are used to rank individuals

They also give some reasons why this may be true. Biased non-causal models built on confounded data may not be able to identify true causal effects, but they may still be useful for identifying the optimal decision. 

"This could occur when confounding is stronger for individuals with large effects - for example if confounding bias is stronger for 'likely' buyers, but the effect of ads is also stronger for them...the key insight here is that optimizing to make the correct decision generally involves understanding whether a causal effect is above or below a given threshold, which is different from optimizing to reduce the magnitude of bias in a causal effect estimate."

"Models trained with confounded data may lead to decisions that are as good (or better) than the decisions made with models trained with costly experimental data, in particular when larger causal effects are more likely to be overestimated or when variance reduction benefits of more and cheaper data outweigh the detrimental effect of confounding....issues that make it impossible to estimate causal effects accurately do not necessarily keep us from using the data to make accurate intervention decisions."

Their arguments hinge on the idea that what we are really solving for in these decisions is based on ranking:

"Assuming...the selection mechanism producing the confounding is a function of the causal effect - so that the larger the causal effect the stronger the selection - then (intuitively) the ranking of the preferred treatment alternatives should be preserved in the confounded setting, allowing for optimal treatment assignment policies from data."
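A toy simulation (entirely synthetic numbers, my own illustration rather than the paper's) can make this ranking argument tangible: if confounding distorts the estimated effects monotonically - bias growing with the true effect - the estimates can be badly wrong in magnitude while still ranking individuals exactly as the true effects would, so a budget-constrained targeting policy is unchanged.

```python
import random

random.seed(42)

n = 1000
true_effect = [random.uniform(0.0, 0.3) for _ in range(n)]

# Confounding that is stronger for individuals with larger effects:
# a monotone (order-preserving) distortion of the true effects.
biased_estimate = [2.0 * e + 0.05 for e in true_effect]

# The estimates are far off in magnitude...
mae = sum(abs(b - e) for b, e in zip(biased_estimate, true_effect)) / n

# ...but a budget-constrained policy (treat the top k) is identical,
# because a monotone distortion preserves the ranking.
k = 100
top_true = sorted(range(n), key=lambda i: true_effect[i], reverse=True)[:k]
top_biased = sorted(range(n), key=lambda i: biased_estimate[i], reverse=True)[:k]

print(round(mae, 2))           # large average error in the effect estimates
print(top_true == top_biased)  # the targeting decisions agree exactly
```

The point of the sketch is only directional: bad causal effect estimates, identical causal decisions, because the decision depends on order (or on which side of a threshold an effect falls), not on the estimate's magnitude.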

A lot of this really comes down to proper problem framing and appealing to the popular paraphrasing of George E. P. Box - all models are wrong, but some are useful. It turns out in this particular use case non-causal models can be as useful or more useful than causal ones.

And we do need to be careful about the nuance of the problem framing. As the authors point out, this solves one particular business problem and use case, but does not answer some of the most important causal questions businesses may be interested in:

"This does not imply that firms should stop investing in randomized experiments or that causal effect estimation is not relevant for decision making. The argument here is that causal effect estimation is not necessary for doing effective treatment assignment."

They go on to argue that randomized tests and other causal methods are still core to understanding the effectiveness of interventions and strategies for improving effectiveness. Their use case begins and ends with what is just one step in the entire lifecycle of product development, deployment, and optimization. In their discussion of further work they suggest that:

"Decision makers could focus on running randomized experiments in parts of the feature space where confounding is particularly hurtful for decision making, resulting in higher returns on their experimentation budget."

This essentially parallels my previous discussion related to SHAP values. For a great reference for making practical business decisions about when this is worth the effort see the HBR article in the references discussing when to act on a correlation.

So some big takeaways are:

1) When building a model for purposes of causal decision making (CDM), even a biased (non-causal) model can perform as well as or better than a causal model focused on CEE.

2) In many cases, even a predictive model that provides predicted probabilities or risk scores (as proxies for causal impact or CEE) can perform as well as or better than causal models when the goal is CDM.

3) If the goal is to take action based on important features (i.e. SHAP values as discussed before), however, we still need to apply a causal framework, and understanding the actual effectiveness of interventions may still require randomized tests or other methods of causal inference.


Causal Decision Making and Causal Effect Estimation Are Not the Same... and Why It Matters. Carlos Fernández-Loría and Foster Provost. 2021.

When to Act on a Correlation, and When Not To. David Ritter. Harvard Business Review. March 19, 2014. 

Be Careful When Interpreting Predictive Models in Search of Causal Insights. Scott Lundberg.  

Additional Reading:

Laura B Balzer, Maya L Petersen, Invited Commentary: Machine Learning in Causal Inference—How Do I Love Thee? Let Me Count the Ways, American Journal of Epidemiology, Volume 190, Issue 8, August 2021, Pages 1483–1487,

Petersen, M. L., & van der Laan, M. J. (2014). Causal models and learning from data: integrating causal modeling and statistical estimation. Epidemiology (Cambridge, Mass.), 25(3), 418–426.

Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning. Numair Sani, Daniel Malinsky, Ilya Shpitser arXiv:2006.02482v3  

Related Posts:

Will there be a credibility revolution in data science and AI? 

Statistics is a Way of Thinking, Not a Toolbox

Big Data: Don't Throw the Baby Out with the Bathwater

Big Data: Causality and Local Expertise Are Key in Agronomic Applications 

The Use of Knowledge in a Big Data Society

Wednesday, July 7, 2021

R.A Fisher, Big Data, and Pretended Knowledge

In Thinking Fast and Slow, Kahneman points out that what matters more than the quality of evidence is the coherence of the story. In business and medicine, he notes that this kind of 'pretended' knowledge based on coherence is often sought and preferred. We all know that no matter how great the analysis, if we can't explain and communicate the results with influence, our findings may go unappreciated. But as we have learned from misinformation and disinformation about everything from vaccines to GMOs, Kahneman's insight is a double-edged sword. Coherent stories often win out over solid evidence and lead to wrong decisions. We see this not only in science and politics, but also in business. 

In the book The Lady Tasting Tea by David Salsburg, we learn that R.A. Fisher was all too familiar with the pitfalls of attempting to innovate based on pretended knowledge and big data (excerpts):

"The Rothamsted Agricultural Experiment Station, where Fisher worked during the early years of the 20th century, had been experimenting with different fertilizer components for almost 90 years before he arrived...for 90 years the station ran experiments testing different combinations of mineral salts and different strains of wheat, rye, barley, and potatoes. This had created a huge storehouse of data, exact daily records of rainfall and temperature, weekly records of fertilizer dressings and measures of soil, and annual records of harvests - all of it preserved in leather bound notebooks. Most of the 'experiments' had not produced consistent results, but the notebooks had been carefully stored away in the station's archives....the result of these 90 years of 'experimentation' was a mess of confusion and vast troves of unpublished and useless data...the most that could be said of these [experiments] was that some of them worked sometimes, perhaps, or maybe."

Fisher introduced the world to experimental design and challenged the idea that scientists could make progress by tinkering alone. Instead, he motivated them to think through inferential questions: Is the difference in yield for variety A vs. variety B (signal) due to superior genetics, or is this difference what we would expect to see anyway due to natural variation in crop yields (noise)? In other words, is the difference in yield statistically significant? This is the original intention of the concept of statistical significance that has gotten lost in the many abuses and misinterpretations we often hear about. He also taught us to ask questions about causality: does variety A actually yield better than variety B because it is genetically superior, or could differences in yield be explained by differences in soil characteristics, weather/rainfall, planting date, or numerous other factors? His methods taught us how to separate the impact of a product or innovation from the impact and influences of other factors.

Fisher did more than provide a set of tools for problem solving. He introduced a structured way of thinking about real world problems and the data we have to solve them. This way of thinking moved the agronomists at Rothamsted from mere observation to useful information. This applies not only to agriculture, but to all of the applied and social sciences as well as business.

In his book Uncontrolled, Jim Manzi stressed the importance of thinking like Fisher's plant breeders and agronomists (Fisher himself was a geneticist), especially in business settings. Manzi describes the concept of 'high causal density': the idea that the number of causes of variation in outcomes can be enormous, with each having the potential to wash out the cause we are most interested in (whatever treatment or intervention we are studying). In business, which is a social science, this becomes more challenging than in the physical and life sciences. In physics and biology we can assume relatively uniform physical and biological laws that hold across space and time. But in business the 'long chain of causation between action and outcome' is 'highly dependent for its effects on the social context in which it is executed.' It's all another way of saying that what happens in the outside world can often have a much larger impact on our outcome than a specific business decision, product, or intervention. As a result, this calls for the same approach Fisher advocated in agriculture to be applied in business settings. 

List and Gneezy address this in The Why Axis:

"Many businesses experiment and often...businesses always tinker...and try new things...the problem is that businesses rarely conduct experiments that allow a comparison between a treatment and control group...Business experiments are research investigations that give companies the opportunity to get fast and accurate data regarding important decisions."

Fisher's approach soon caught on and revolutionized science and medicine, but adoption still lags in many business settings, even in the wake of big data, AI, and advances in machine learning. As Jim Manzi and Stefan Thomke state in Harvard Business Review, in the absence of formal randomized testing and good experimental design:

"executives end up misinterpreting statistical noise as causation—and end up making bad decisions"

In The Book of Why, Judea Pearl laments the reluctance to embrace causality: 

"statistics, including many disciplines that looked to it for guidance remained in the prohibition era, falsely believing that the answers to all scientific questions reside in the data, to be unveiled through clever data mining tricks...much of this data centric history still haunts us today. We live in an era that presumes Big Data to be the solution to all of our problems. Courses in data science are proliferating in our universities, and jobs for data scientists are lucrative in companies that participate in the data economy. But I hope with this book to convince you that data are profoundly dumb...over and over again, in science and business we see situations where more data aren't enough. Most big data enthusiasts, while somewhat aware of those limitations, continue to chase after data centric intelligence."

These big data enthusiasts strike a strong resemblance to the researchers at Rothamsted before Fisher. List has a similar take:

"Big data is important, but it also suffers from big problems. The underlying approach relies heavily on correlations, not causality. As David Brooks has noted, 'A zillion things can correlate with each other depending on how you structure the data and what you compare.'...because our work focuses on field experiments to infer causal relationships, and because we think hard about these causal relationships of interest before generating the data, we go well beyond what big data could ever deliver."

We often want fast iterations and actionable insights from data. While it is true that a great analysis with no story, delivered too late, is as good as no analysis, it is just as true that quick insights with a coherent story based on pretended knowledge from big data can leave you running in circles getting nowhere - no matter how fast you might feel like you are running. In the case of Rothamsted, scientists ran in circles for 90 years before real insights could be uncovered using Fisher's more careful and thoughtful analysis. Even if they had today's modern tools of AI, ML, and data visualization to cut the data 1000 different ways, they still would not have been able to get any value for all of their effort. Wow, 90 years! How is that for time to insight? In many ways, despite drowning in data and the advances in AI and machine learning, many areas of business across a number of industries will find themselves in the same place Fisher found himself at Rothamsted almost 100 years ago. We will need a credibility revolution in AI to bring about the kind of culture change that will make the causal and inferential thinking that comes naturally to today's agronomists (thanks to Fisher) - or, more recently, the way Pearl's disciples think about causal graphs - more commonplace in business strategy. 


1) Randomized tests are not always the only way to make causal inferences. In fact in the Book of Why Pearl notes in relation to smoking and lung cancer, outside of the context of randomized controlled trials "millions of lives were lost or shortened because scientists did not have adequate language or methodology for answering causal questions." The credibility revolution in epidemiology and economics, along with Pearl's work has provided us with this language. As Pearl notes: "Nowadays, thanks to carefully crafted causal models, contemporary scientists can address problems that would have once been considered unsolvable or beyond the pale of scientific inquiry." See also: The Credibility Revolution(s) in Econometrics and Epidemiology.

2) Deaton and Cartwright make strong arguments challenging the supremacy of randomized tests as the gold standard for causality (similar to Pearl), but this only furthers the importance of considering careful causal questions in business and science by broadening the toolset along the same lines as Pearl. Deaton and Cartwright also emphasize the importance of interpreting causal evidence in the context of sound theory. See: Angus Deaton, Nancy Cartwright, Understanding and misunderstanding randomized controlled trials, Social Science & Medicine, Volume 210, 2018.

3) None of this is to say that predictive modeling and machine learning cannot answer questions and solve problems that create great value to business. The explosion of the field of data science is an obvious testament to this fact. Probably the most important thing in this regard is for data scientists and data science managers to become familiar with the important distinctions between models and approaches that explain or predict. See also: To Explain or Predict and Big Data: Don't Throw the Baby Out with the Bathwater

Additional Reading

Will there be a credibility revolution in data science and AI? 

Statistics is a Way of Thinking, Not a Toolbox 

The Value of Business Experiments and the Knowledge Problem

The Value of Business Experiments Part 2: A Behavioral Economic Perspective 

The Value of Business Experiments Part 3: Innovation, Strategy, and Alignment 

Big Data: Don't Throw the Baby Out with the Bathwater 

Big Data: Causality and Local Expertise Are Key in Agronomic Applications

The Use of Knowledge in a Big Data Society 

Wednesday, June 2, 2021

Science Communication for Business and Non-Technical Audiences: Stigmas, Strategies, and Tactics

If you are a reader of this blog, you are familiar with the number of posts I have shared about machine learning, causal inference, and the benefits of education in economics. I have also discussed how there are sometimes important gaps between theory and application. 

In this post I am going to talk about another important gap related to communication. How do we communicate the value of our work to a non-technical audience? 

We can learn a lot from formal coursework, especially in good applied programs with great professors. But if we are not careful, we can also pick up mental models and habits of thinking that turn out to weigh us down, particularly those of us who end up working in very applied business or policy settings. How we deal with these issues becomes important to career professionals and critical to those involved in science communication in general, whether we are trying to influence business decision makers, policy makers, or consumers and voters.

In this post I want to discuss communicating with intent, paradigm gaps, social harassment costs, and mental accounting.

As stated in The Analytics Lifecycle Toolkit: "no longer is it sufficient to give the technical answer, we must be able to communicate for both influence and change."

Communicating to Business and Non-Technical Audiences - or - The Laffer Curve for Science Communication

For those who plan to translate their science backgrounds to business audiences (like many data scientists coming from scientific backgrounds) what are some strategies for becoming better science communicators?  In their book Championing Science: Communicating your Ideas to Decision Makers Roger and Amy Aines offer lots of advice. You can listen to a discussion of some of this at the BioReport podcast here. 

Two important themes they discuss are the ideas of paradigm gaps and intent. Scientists can be extremely efficient communicators through the lens of the paradigms they work in. 

As discussed in the podcast, a paradigm is all the knowledge a scientist or economist may have in their head specific to their field of study and research. Unfortunately, there is a huge gap between this paradigm, with its vocabulary, and what non-technical stakeholders can relate to. Scientists have to meet stakeholders where they are, vs. the audience they may find at conferences or research seminars. From experience, different stakeholders and audiences across different industries have different gaps. If you work for a consultancy with external pharma clients, they might have a different expectation about statistical rigor than, say, a product manager in a retail setting. Even within the same business or organization, the tactics used in closing the gap for one set of stakeholders might not work at all for a new set of stakeholders if you change departments. In other words, know your audience. What do they want or need or expect? What are their biases? What is their level of analytic or scientific literacy? How risk averse are they? Answers to these questions are a great place to start in terms of filling the paradigm gaps and addressing the second point made in the podcast - speaking with intent.

As discussed in the podcast: "many scientists don't approach conversations or presentations with a real strategic intent in terms of what they are communicating...they don't think in terms of having a message....they need to elevate and think about the point they are trying to make when speaking to decision makers." 

As Bryan Caplan states in his book The Myth of the Rational Voter, when it comes to speaking to non-economists and the general public, they should apply the Laffer curve of learning, "they will retain less if you try to teach them more."

He goes on to discuss that it's not just what we say, but how we position it, especially when dealing with resistance related to misinformation, disinformation, and systemic biases:

"irrationality is not a barrier to persuasion, but an invitation to alternative rhetorical techniques...if beliefs are in part consumed for their direct psychological benefits then to compete in the marketplace of ideas, you need to bundle them with the right emotional content."

In the Science Facts and Fallacies podcast (May 19, 2021) Kevin Folta and Cameron English discuss:

"We spend so much time trying to convince people with scientific's so important for us to remember that what we learn from psychology and sociology (and economics) matters. These are turning out to be the most important sciences in terms of forming a conduit through which good science communication can flow."

Torsten Slok offers great advice in his discussion with Barry Ritholtz about working in the private sector as a PhD economist in the Masters in Business Podcast back in 2018: 

"there is a different sense of urgency and an emphasis on brevity....we offer a service of having a view on what the economy will do, what the markets will do - lots of competition for attention...if you write long-winded explanations that say there is a 50/50 chance that something will happen, many customers will not find that very helpful."

So there are a lot of great data science and science communicators out there with great advice. A big problem is that this advice is often not part of the training that many of those with scientific or technical backgrounds receive; an even bigger problem is that it is often looked down upon and even punished! I'll explain more below.

The Negative Stigma of Science Communication in the Data Science and Scientific Community

One of the most egregious things I see on social media is someone trying their best to mentor those new to the analytical space (and improve their own communication skills) by sharing a post that attempts to describe some complicated statistical concept in layman's terms - only to be rewarded with harassing and trolling comments. Usually this is about how they didn't capture every nuance of the theory, failed to include a statement about certain critical assumptions, or oversimplified the complex thing they were trying to explain in simple terms to begin with. This kind of negative social harassment seems to be par for the course when attempting to communicate statistics and data science on social media like LinkedIn and Twitter.

Similarly in science communication, academics can be shunned by their peers when attempting to do popular writing or communication for the general public. 

In 'The Stoic Challenge', author William Irvine discusses Daniel Kahneman's challenges with writing a popular book: 

"Kahneman was warned that writing a popular book would cause harm to his professional reputation...professors aren't supposed to write books that normal people can understand."

He describes how, when Kahneman's book Thinking Fast and Slow made the New York Times best-seller list, Kahneman "sheepishly explained to his colleagues that the book's appearance there was a mistake."

In an EconTalk interview with economist Steven Levitt, Russ Roberts asks Levitt about writing his popular book Freakonomics:

"What was the reaction from your colleagues in the profession...You know, I have a similar route. I'm not as successful as you are, but I've popularized a lot of was considered somewhat untoward to waste your time speaking to a popular audience."

Levitt responded by saying the reaction was not so bad, but the fact that Russ had to broach the topic at all is evidence of the toxic culture academics face when doing science communication. The negative stigma associated with good science communication is not limited to economics or the social and behavioral sciences. 

In his Talking Biotech podcast episode Debunking the Disinformation Dozen, scientist and science communicator Kevin Folta discusses his strident efforts to face off against these toxic elements:

"I have always said that communication is such an important part of what we do as scientists but I have colleagues who say you are wasting your time doing this...Folta why are you wasting your time doing a podcast or writing scientific stuff for the public."

Some of this is just bad behavior, some of it is gatekeeping done in the name of upholding the scientific integrity of the field, some of it is the attempt of others to prove their competence to themselves or others, and maybe some of it is the result of people genuinely trying to provide peer review to colleagues they think have gone astray. But most of it is unhelpful when it comes to influencing decision makers or improving general scientific literacy. It doesn't matter how great the discovery or how impactful the findings; we have all seen from the pandemic that effective science communication is critical for overcoming the effects of misinformation and disinformation. A culture that is toxic toward effective science communication becomes an impediment to science itself and leaves a void waiting to be filled by science deniers, activists, policy makers, decision makers, and special interests.

This can be challenging when you add the Dunning-Kruger effect to the equation. Those that know the least may be the most vocal while scientists and those with expertise sit on the sidelines. As Bryan Caplan states in his book The Myth of the Rational Voter:

"There are two kinds of errors to avoid. Hubris is one, self abasement is the other. The first leads experts to over reach themselves; the second leads experts to stand idly by while error reigns."

How Does Culture and Mental Accounting Impact Science Communication?

So as I've written above, in the scientific community there is a sort of toxic culture that inhibits good science communication. In the Two Psychologists Four Beers podcast, behavioral scientist Nick Hobson makes an interesting comparison between MBAs and scientists. 

"as scientists we need to be humble with regards to our thing we are learning from our current woes of replication (the replication crisis) is we know a lot less than we think. This has conditioned us to be more humble....vs. business school people that are trained to be more assertive and confident."

I'd like to propose an analogy relating to mental accounting. It seems like when scientists get their degree, it comes with a mental account called scientific credibility. Speaking and writing to a general audience risks taking a charge against that account, and they are trained to be extremely frugal about managing it. Communication becomes an exercise in risk management. If they say or communicate something with the slightest error, missing the slightest nuance, a colleague may call them out. Gotcha! Psychologically, this would call for a huge charge against their 'account' and reputation. It's not quite a career-ending mistake like making a fraudulent claim or faking data, but it's bad enough to be avoided at great cost. MBAs don't have a mental account called scientific credibility. They aren't long on academic credibility, so they don't require putting on the communication hedges the way scientists often do. They come off as better communicators and more confident, while scientists risk becoming stereotyped as ineffective communicators. 

To protect their balance at all costs and avoid social harassment from their peers, economists and scientists may tend to speak with caveats, hedges, and qualifications. This may also mean a delayed response. In many cases, before even thinking about communicating results, they need in-depth rigorous analysis, sensitivity checks, etc. It requires doing science, which is by nature slow, while the public wants answers fast. Faster answers might mean less time for analysis, which calls for more caveats. This can all be detrimental to effective communication to non-technical audiences. Answers become either too slow or too vague to support decision making (recall Torsten Slok's comments above). It gives the impression of a lack of confidence and relevance, and feeds a stereotype that technical people (economists, scientists, data scientists, etc.) fail to offer definitive or practical conclusions. As Bryan Caplan notes, discussing the role of economists in The Myth of the Rational Voter:

"when the media spotlight gives other experts a few seconds to speak their mind, they usually strive to forcefully communicate one or two simplified conclusions....but economists are reluctant to use this strategy. Though the forum demands it they think it unseemly to express a definitive judgement. This is a recipe for being utterly ignored."

Students graduating from economics and science based graduate programs may inherit these mental accounts and learn these 'hedging strategies' from their professors, from the program, and the seminar culture that comes with it.

Again, Nick Hobson offers great insight about how to deal with this kind of mental accounting in his own work:

"what I've wrestled with as I've grown the business is maintaining scientific integrity and the rigor but knowing you have to sacrifice some of [it]...[you] have to find and strike a balance between being data driven and humble while also being confident and strategic and cautious about the shortcuts you take."

In Thinking Fast and Slow, Kahneman argues that sometimes new leaders can produce better results because fresh thinkers can view problems without the same mental accounts holding back incumbents. The solution isn't to abandon scientific training and the value it brings to the table in terms of rigor and statistical and causal reasoning. The solution is to learn how to view problems in a way that avoids the kind of mental accounting I have been discussing. This also calls for a cultural change in the educational system. As Kevin Folta stated in the previous Talking Biotech Podcast:

"Until we have a change in how the universities and how the scientific establishment sees these efforts as positive and helpful and counts toward tenure and promotion I don't think you are going to see people jump in on this." 

Given that graduate and PhD training may come with such baggage, one alternative may be to develop programs with more balance, like Professional Science Master's degrees, or at least to create courses or certificates that focus on translational knowledge and communication skills. Or seek out graduate study under folks like Dr. Folta, great scientists and researchers who can also help you overcome the barriers to communicating science effectively. If that is the case, we are going to need more Dr. Foltas.


The Myth of the Rational Voter: Why Democracies Choose Bad Policies. Bryan Caplan. Princeton University Press. 2007.

The Stoic Challenge: A Philosopher's Guide to Becoming Tougher, Calmer, and More Resilient. William Braxton Irvine. Norton & Co. NY. 2019.

The Analytics Lifecycle Toolkit: A Practical Guide for an Effective Analytics Capability. Gregory S. Nelson. 2018.

Thursday, April 1, 2021

The Value of Business Experiments Part 3: Innovation, Strategy, and Alignment

In previous posts I have discussed the value proposition of business experiments from both a classical and behavioral economic perspective. This series of posts has been greatly influenced by Jim Manzi's book 'Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society.' Midway through the book Manzi highlights three important things that experiments in business can do:

1) They provide precision around the tactical implementation of strategy
2) They provide feedback on the performance of a strategy which allows for refinements to be driven by evidence
3) They help achieve organizational and strategic alignment

Manzi explains that within any corporation there are always silos and subcultures advocating competing strategies, with perverse incentives and agendas in pursuit of power and control. How do we know who is right, and which programs or ideas are successful, considering the many factors that could be influencing any outcome of interest? Manzi describes any environment where the number of causes of variation is enormous as an environment of 'high causal density.' We can claim to address this with a data-driven culture, but what does that mean? Modern companies in a digital age of AI and big data are drowning in data. This makes it easy to dress up rhetoric in advanced analytical frameworks. Because data seldom speaks for itself, anyone can speak for the data through wily data storytelling.

As Jim Manzi and Stefan Thomke discuss in Harvard Business Review:

"business experiments can allow companies to look beyond correlation and investigate causality....Without it, executives have only a fragmentary understanding of their businesses, and the decisions they make can easily backfire."

In complex environments with high causal density, we don't know enough about the nature and causes of human behavior, decisions, and causal paths from actions to outcomes to list them all and measure and account for them even if we could agree how to measure them. This is the nature of decision making under uncertainty. But, as R.A. Fisher taught us with his agricultural experiments, randomized tests allow us to account for all of these hidden factors (Manzi calls them hidden conditionals). Only then does our data stand a chance to speak truth.

In Dual Transformation: How to Reposition Today's Business While Creating the Future, the authors discuss the importance of experimentation as a way to navigate uncertainty in causally dense environments, in what they refer to as transformation B:

“Whenever you innovate, you can never be sure about the assumptions on which your business rests. So, like a good scientist, you start with a hypothesis, then design an experiment. Make sure the experiment has clear objectives (why are you running it and what do you hope to learn). Even if you have no idea what the right answer is, make a prediction. Finally, execute in such a way that you can measure the prediction, such as running a so-called A/B test in which you vary a single factor."

Experiments aren't just tinkering and trying new things. While those are helpful to innovation, just tinkering, measuring, and observing still leaves you speculating about what really works, and is subject to all of the same behavioral biases and pitfalls of big data previously discussed.

List and Gneezy address this in The Why Axis:

"Many businesses experiment and often...businesses always tinker...and try new things...the problem is that businesses rarely conduct experiments that allow a comparison between a treatment and control group...Business experiments are research investigations that give companies the opportunity to get fast and accurate data regarding important decisions."

Three things distinguish a successful business experiment from just tinkering:

1) Separation of signal from noise through well designed and sufficiently powered tests
2) Connecting cause and effect through randomization 
3) Clear signals on business value that follows from 1 & 2 above
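A minimal sketch of points 1 and 2 in Python, with made-up numbers: size the test using the standard two-sample sample-size formula, then randomize and estimate the treatment effect with a simple difference in means. The effect size, standard deviation, and outcome values are all hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# --- 1) Power: sample size per arm for a two-sided, two-sample test ---
# n per arm = 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2
def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    z_a = stats.norm.ppf(1 - alpha / 2)   # critical value for the test
    z_b = stats.norm.ppf(power)           # quantile for desired power
    return int(np.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2))

n = n_per_arm(delta=2.0, sigma=10.0)  # detect a hypothetical $2 lift, sd = $10

# --- 2) Randomize and estimate the treatment effect ---
control = rng.normal(100, 10, n)   # simulated outcomes, control arm
treated = rng.normal(102, 10, n)   # simulated outcomes, true lift = 2
t_stat, p_value = stats.ttest_ind(treated, control)
effect = treated.mean() - control.mean()
```

With these inputs the formula calls for 393 units per arm; the point is that the required sample size, not convenience, drives the design.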

Having causal knowledge helps identify more informed and calculated risks, as opposed to risks taken on the basis of gut instinct, political motivation, or overly optimistic data-driven correlational pattern finding.

Experiments add incremental knowledge and value to business. No single experiment is going to be a 'killer app' that by itself will generate millions in profits. But in aggregate the knowledge created by experiments probably offers the greatest strategic value across an enterprise compared to any other analytic method.

As discussed earlier, business experiments create value by helping manage the knowledge problem within firms. It's worth quoting List and Gneezy again:

"We think that businesses that don't experiment and fail to show, through hard data, that their ideas can actually work before the company takes action - are wasting their money....every day they set suboptimal prices, place ads that do not work, or use ineffective incentive schemes for their work force, they effectively leave millions of dollars on the table."

As Luke Froeb writes in Managerial Economics, A Problem Solving Approach (3rd Edition):

"With the benefit of hindsight, it is easy to identify successful strategies (and the reasons for their success) or failed strategies (and the reason for their failures). It's much more difficult to identify successful or failed strategies before they succeed or fail."

Again from Dual Transformation:

"Explorers recognize they can't know the right answer, so they want to invest as little as possible in learning which of their hypotheses are right and which ones are wrong"

Business experiments offer the opportunity to test strategies early, on a smaller scale, to get causal feedback about potential success or failure before fully committing large amounts of irrecoverable resources. This takes the concept of failing fast to a whole new level. As discussed in The Why Axis and Uncontrolled, business experiments play a central role in product development and innovation across a range of industries and companies, from Harrah's casinos, Capital One, and Humana, which have been leading in this area for decades, to newer ventures like Amazon and Uber.

"At Uber Labs, we apply behavioral science insights and methodologies to help product teams improve the Uber customer experience. One of the most exciting areas we’ve been working on is causal inference, a category of statistical methods that is commonly used in behavioral science research to understand the causes behind the results we see from experiments or observations...Teams across Uber apply causal inference methods that enable us to bring richer insights to operations analysis, product development, and other areas critical to improving the user experience on our platform." - From: Using Causal Inference to Improve the Uber User Experience (link)

Achieving the greatest value from business experiments requires leadership commitment.  It also demands a culture that is genuinely open to learning through a blend of trial and error, data driven decision making informed by theory, and the infrastructure necessary for implementing enough tests and iterations to generate the knowledge necessary for rapid learning and innovation. The result is a corporate culture that allows an organization to formulate, implement, and modify strategy faster and more tactfully than others.

See also:
The Value of Business Experiments: The Knowledge Problem
The Value of Business Experiments Part 2: A Behavioral Economics Perspective
Statistics is a Way of Thinking, Not a Box of Tools


Saturday, March 13, 2021

Why Study Economics/Applied Economics?

Applied Economics is a broad field with many applications.

Applied Economics is a broad field of study covering many topics. Recognizing the wide range of applications has led departments of Agricultural Economics across numerous universities to change their degree program names to Applied Economics.  In 2008, the American Agricultural Economics Association changed its name to the Agricultural and Applied Economics Association (AAEA).

This trend is noted in research published in the journal Applied Economic Perspectives and Policy:

"Increased work in areas such as agribusiness, rural development, and environmental economics is making it more difficult to maintain one umbrella organization or to use the title 'agricultural economist' ... the number of departments named 'Agricultural Economics' has fallen from 36 in 1956 to 9 in 2007."

This brief podcast from the University of Minnesota's Department of Applied Economics is an example of this trend: 

It discusses the breadth of questions and problems applied economists address in their work, including obesity and food systems; environmental and water resource economics; development, growth, trade, and technological change; public sector economics; health policy and management; and human resources and industrial relations. Applied research in this area is often interdisciplinary, drawing on fields such as biology, engineering, health and animal sciences, and nutrition.

Why study applied economics?  A few inspiring quotes from Southern Illinois University introduction to their programs in Agribusiness Economics:

If you want to prove sustainable resource use saves money and protects the land…
If you understand that the wheat crop here can make a difference for a hungry child across the ocean…  

Applied Economics emphasizes quantitative and analytics skills ideal for careers in data science

Many applied economics master's degrees are designed to serve as a very attractive terminal degree for professionals. 

To quote, from Johns Hopkins University’s Applied Economics program home page:

“Economic analysis is no longer relegated to academicians and a small number of PhD-trained specialists. Instead, economics has become an increasingly ubiquitous as well as rapidly changing line of inquiry that requires people who are skilled in analyzing and interpreting economic data, and then using it to effect decisions...Advances in computing and the greater availability of timely data through the Internet have created an arena which demands skilled statistical analysis, guided by economic reasoning and modeling.”

Many applied economics programs are STEM designated programs, reflecting the emphasis that applied economics places on quantitative and analytics skills. The University of Pittsburgh has designed their STEM designated M.S. in Quantitative Economics specifically with data science roles in mind. Virginia Tech offers an online Master of Ag and Applied Economics, the first I have seen in an Agricultural and Applied Economics department specifically designed to incorporate economics with data science and programming.

The focus on causality differentiates economics from other fields.

Once armed with predictions from machine learning and AI, businesses will start to ask questions about what decisions or factors are moving the needle on revenue, customer satisfaction and engagement, or improved efficiencies. Essentially, they will want to ask questions related to causality, which requires a completely different paradigm for data analysis.

In a KDnuggets interview, Economist Scott Nicholson (Chief Data Scientist at Accretive Health and formerly at LinkedIn) comments on the differences between economists and data scientists: 

 "In terms of applied work, economists are primarily concerned with establishing causation. This is key to understanding what influences individual decision-making, how certain economic and public policies impact the world, and tells a much clearer story of the effects of incentives. With this in mind, economists care much less about the accuracy of the predictions from their econometric models than they do about properly estimating the coefficients, which gets them closer to understanding causal effects. At Strata NYC 2011, I summed this up by saying: If you care about prediction, think like a computer scientist, if you care about causality, think like an economist."

As data science thought leader Eugene Dubossarsky puts it in a SuperDataScience podcast:

“the most elite skills…the things that I find in the most elite data scientists are the sorts of things econometricians these days have…bayesian statistics…inferring causality” 

Nobel Prize Laureate Joshua Angrist has discussed the new opportunities available at firms like Amazon for students graduating with economics and quantitative skills, because of these firms' interest in causal questions and running experiments.


In another interview Angrist emphasizes opportunities for Economics bachelor's degree holders:

"There's a very strong private sector market for economics undergrad especially economics undergrads who have good training in Amazon and Google and Facebook and Trip Adviser they are looking for people that can do some statistics but a lot of the questions that they are interested in are causal questions. What will be the consequences of changing prices for example or changing marketing strategies and these companies have discovered that the best training for that is undergrad work in economics or econometrics. We really specialize in causality in a way regular data science does not.....someone who trains in data science might learn a lot about machine learning but won't necessarily learn about for example instrumental variables or regression discontinuity methods and those turn out to be very useful for the tech sector."

A post at the Uber Engineering blog explains how they find these skills to be valuable in a business setting: 

"One of the most exciting areas we’ve been working on is causal inference, a category of statistical methods that is commonly used in behavioral science research to understand the causes behind the results we see from experiments or observations...causal inference helps us provide a better user experience for customers on the Uber platform. The insights from causal inference can help identify customer pain points, inform product development, and provide a more personalized experience...At a higher level, causal inference provides information that is critical to both improving the user experience and making business decisions through better understanding the impact of key initiatives."

Economics provides a foundation with long lasting value and offers a bright future.

Economics combines mathematically precise theories (like microeconomics) and empirically sound methods (like econometrics) to study people's choices and how they are made compatible. As a social and behavioral science and a quantitative and technical field, learning to think like an economist and applying those skills will never go out of fashion. There are a number of both undergraduate and graduate degree programs in economics and applied economics across the country and I would encourage you to check them out. I've listed a few more examples of applied economics programs below.

***This post is an update to an original post made in September 2010 found here.
Related Posts: 

Economists as Data Scientists   


'What is the Future of Agricultural Economics Departments and the Agricultural and Applied Economics Association?' By Gregory M. Perry. Applied Economic Perspectives and Policy (2010) volume 32, number 1, pp. 117–134.

Additional Graduate Programs in Applied Economics and Related Fields

Western Kentucky University - M.A. in Applied Economics (also UG and GR options in Agriculture and Food Science)
Murray State University - M.S. Agriculture/Agribusiness Economics 
Virginia Tech - M.S. Ag and Applied Economics
University of Cincinnati - M.S. Applied Economics
Clemson University - M.S. Applied Economics and Statistics
Montana State University - M.S. Applied Economics
Cornell University - M.S. & M.P.S. in Applied Economics and Management 
Oklahoma State University - M.S. Agricultural Economics  and MAg in Agribusiness
Texas A&M - M.S. in Agricultural Economics
North Dakota State University - M.S. Agribusiness and Applied Economics
University of Illinois - M.S. Agricultural and Applied Economics
University of Missouri - Agricultural and Applied Economics
Auburn University - M.S. Agricultural Economics and Rural Sociology (various programs)
AAEA  - Directory of additional programs at the graduate and undergraduate levels

Wednesday, September 30, 2020

Calibration, Discrimination, and Ethics

Classification models with binary and categorical outcomes are often assessed based on the c-statistic or area under the ROC curve.

This metric ranges between 0 and 1 and provides a summary of model performance in terms of its ability to rank observations. For example, if a model is developed to predict the probability of default, the area under the ROC curve can be interpreted as the probability that a randomly chosen observation from the observed default class will be ranked higher (based on the model's predicted probability) than a randomly chosen observation from the observed non-default class (Provost and Fawcett, 2013). This metric is not without criticism and should not be used as the exclusive criterion for model assessment in all cases. As argued by Cook (2007):

'When the goal of a predictive model is to categorize individuals into risk strata, the assessment of such models should be based on how well they achieve this aim...The use of a single, somewhat insensitive, measure of model fit such as the c statistic can erroneously eliminate important clinical risk predictors for consideration in scoring algorithms'

Calibration is an alternative metric for model assessment. Calibration measures the agreement between observed and predicted risk, or the closeness of the model's predicted probability to the underlying probability in the population under study. Both discrimination and calibration are included in the National Quality Forum's Measure Evaluation Criteria. However, many have noted that calibration is largely underutilized by practitioners in the data science and predictive modeling communities (Walsh et al., 2017; Van Calster et al., 2019). Models that perform well on the basis of discrimination (area under the ROC curve) may not perform well based on calibration (Cook, 2007). In fact, a model with a lower ROC score could actually calibrate better than a model with a higher ROC score (Van Calster et al., 2019). This can raise ethical concerns, as lack of calibration in predictive models can, in application, result in decisions that lead to over- or under-utilization of resources (Van Calster et al., 2019).
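To illustrate the distinction, here is a toy sketch in plain NumPy with fabricated scores. It computes the c-statistic via its ranking interpretation (the Mann-Whitney form) and a simple binned calibration table. The hypothetical scorer ranks the classes perfectly, yet its predicted probabilities for the low-risk group are far from the observed event rate:

```python
import numpy as np

def auc_rank(y_true, y_score):
    """AUC as the probability that a randomly chosen positive
    is scored above a randomly chosen negative (ties count 1/2)."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def calibration_table(y_true, y_prob, bins=10):
    """Mean predicted probability vs. observed event rate per bin."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    edges = np.linspace(0, 1, bins + 1)
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, bins - 1)
    return [(y_prob[idx == b].mean(), y_true[idx == b].mean())
            for b in range(bins) if (idx == b).any()]

# a perfectly discriminating but badly calibrated scorer (made-up data)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p = np.array([0.50, 0.51, 0.52, 0.53, 0.96, 0.97, 0.98, 0.99])
auc = auc_rank(y, p)              # 1.0: perfect ranking
table = calibration_table(y, p)   # but predicts ~0.52 where observed rate is 0
```

Acting on the raw probabilities here (say, allocating resources to everyone above 0.5) would misallocate badly, even though the model's discrimination is perfect.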

Others have argued there are ethical considerations as well:

“Rigorous calibration of prediction is important for model optimization, but also ultimately crucial for medical ethics. Finally, the amelioration and evolution of ML methodology is about more than just technical issues: it will require vigilance for our own human biases that makes us see only what we want to see, and keep us from thinking critically and acting consistently.” (Levy, 2020)

Van Calster et al. (2019), Walsh et al. (2017), and Steyerberg et al. (2010) provide guidance on ways of assessing model calibration.

Frank Harrell provides a great discussion about choosing the correct metrics for model assessment, along with a wealth of resources, here.


Levy, D.G. Matrix of Confusion. GoodScience, Inc. Accessed 9/22/2020.

Cook, N.R. Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction. Circulation. 2007;115:928-935.

Provost, F. and Fawcett, T. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. O'Reilly. CA. 2013.

Steyerberg, E.W., Vickers, A.J., Cook, N.R., et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128-138. doi:10.1097/EDE.0b013e3181c30fb2

Walsh, C.G., Sharman, K., and Hripcsak, G. Beyond discrimination: A comparison of calibration methods and clinical usefulness of predictive models of readmission risk. Journal of Biomedical Informatics. 2017;76:9-18. ISSN 1532-0464.

Van Calster, B., McLernon, D.J., van Smeden, M., et al. Calibration: the Achilles heel of predictive analytics. BMC Med 17, 230 (2019).

Wednesday, September 2, 2020

Blocking and Causality

In a previous post I discussed block randomized designs. 

Duflo et al (2008) describe this in more detail:

"Since the covariates to be used must be chosen in advance in order to avoid specification searching and data mining, they can be used to stratify (or block) the sample in order to improve the precision of estimates. This technique (first proposed by Fisher (1926)) involves dividing the sample into groups sharing the same or similar values of certain observable characteristics. The randomization ensures that treatment and control groups will be similar in expectation. But stratification is used to ensure that along important observable dimensions this is also true in practice in the sample....blocking is more efficient than controlling ex post for these variables, since it ensures an equal proportion of treated and untreated units within each block and therefore minimizes variance."

They also elaborate on blocking when you are interested in subgroup analysis:

"Apart from reducing variance, an important reason to adopt a stratified design is when the researchers are interested in the effect of the program on specific subgroups. If one is interested in the effect of the program on a sub-group, the experiment must have enough power for this subgroup (each sub-group constitutes in some sense a distinct experiment). Stratification according to those subgroups then ensure that the ratio between treatment and control units is determined by the experimenter in each sub-group, and can therefore be chosen optimally. It is also an assurance for the reader that the sub-group analysis was planned in advance."
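The stratified assignment Duflo et al. describe can be sketched in a few lines of Python. This is a simplified illustration with hypothetical units; a real design would also handle odd-sized blocks and treatment ratios other than 1:1:

```python
import random

def block_randomize(units, block_key, seed=0):
    """Assign exactly half of each block to treatment:
    complete randomization within blocks (strata)."""
    rng = random.Random(seed)
    blocks = {}
    for u in units:
        blocks.setdefault(block_key(u), []).append(u)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)          # random order within the block
        half = len(members) // 2
        for u in members[:half]:
            assignment[u["id"]] = 1   # treated
        for u in members[half:]:
            assignment[u["id"]] = 0   # control
    return assignment

# hypothetical schools stratified by type
schools = [{"id": i, "type": "public" if i < 20 else "private"}
           for i in range(40)]
assign = block_randomize(schools, block_key=lambda u: u["type"])
```

By construction, each block ends up with an equal number of treated and control units, which is exactly the variance-reducing balance the quote describes.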

Dijkman et al (2009) discuss subgroup analysis in blocked or stratified designs in more detail:

"When stratification of randomization is based on subgroup variables, it is more likely that treatment assignments within subgroups are balanced, making each subgroup a small trial. Because randomization makes it likely for the subgroups to be similar in all aspects except treatment, valid inferences about treatment efficacy within subgroups are likely to be drawn. In post hoc subgroup analyses, the subgroups are often incomparable because no stratified randomization is performed. Additionally, stratified randomization is desirable since it forces researchers to define subgroups before the start of the study."

Both of these accounts seem very much consistent with each other in terms of thinking about randomization within subgroups creating a mini trial where causal inferences can be drawn. But I think the key thing to consider is they are referring to comparisons made WITHIN sub groups and not necessarily BETWEEN subgroups. 

Gerber and Green discuss this in one of their chapters on analysis of block randomized experiments :

"Regardless of whether one controls for blocks using weighted regression or regression with indicators for blocks, the key principle is to compare treatment and control subjects within blocks, not between blocks."

When we start to compare treatment and control units BETWEEN blocks or subgroups we are essentially interpreting covariates and this cannot be done with a causal interpretation. Green and Gerber discuss an example related to differences in the performance of Hindu vs. Muslim schools. 

"it could just be that religion is a marker for a host of unmeasured attributes that are correlated with educational outcomes. The set of covariates included in an experimental analysis need not be a complete list of factors that affect outcomes: the fact that some factors are left out or poorly measured is not a source of bias when the aim is to measure the average treatment effect of the random intervention. Omitted variables and mismeasurement, however, can lead to severe bias if the aim is to draw causal inferences about the effects of covariates. Causal interpretation of the covariates encounters all of the threats to inference associated with analysis of observational data."

In other words, these kinds of comparisons face the same challenges as interpreting control variables in a regression in an observational setting (see Keele et al., 2020).

But why doesn't randomization within religion allow us to make causal statements about these comparisons? Let's think about a different example. Suppose we wanted to measure treatment effects for some kind of educational intervention and we were interested in subgroup differences in the outcome between public and private high schools. We could randomly assign treatments and controls within the public school population and do the same within the private school population. We know overall treatment effects would be unbiased because the school type would be perfectly balanced (instead of balanced just on average in a completely random design) and we would expect all other important confounders to be balanced between treatments and controls on average. 

We also know that within the group of private schools the treatment and controls should at least on average be balanced for certain confounders (median household income, teacher's education/training/experience, and perhaps an unobservable confounder related to student motivation). 

We could say the same thing about comparisons WITHIN the subgroup of public schools. But there is no reason to believe that the treated students in private schools would be comparable to the treated students in public schools because there is no reason to expect that important confounders would be balanced when making the comparisons. 

Assume we are looking at differences in first-semester college GPA. Maybe within the private school subgroup we find that treated students on average have a first-semester college GPA that is .25 points higher than the comparable control group, while within the public school subgroup this difference is only .10. We can say that there is a difference in outcomes of .15 points between the subgroups, but can we say this is causal? Is the difference really related to school type, or is school type really a proxy for income, teacher quality, or motivation? If we increased motivation or income in the public schools, would that make up the difference? We might do better if our design originally stratified on all of these important confounders, like income and teacher education. Then we could compare students in both public and private schools with similar family incomes and teachers with similar credentials. But there is no reason to believe that student motivation would be balanced; we can't block or stratify on an unobservable confounder. Again, as Gerber and Green state, we find ourselves in a world that borders between experimental and non-experimental methods. Simply put, the subgroups defined by any particular covariate that itself is not or cannot be randomly assigned may have different potential outcomes. What we can say from these results is that school type predicts the outcome but does not necessarily cause it.
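The within-block analysis Gerber and Green recommend can be sketched as follows, with hypothetical GPA numbers matching the example above. The overall treatment effect is a weighted average of within-block differences; the gap between the two block-level effects remains descriptive, not causal:

```python
def block_ate(blocks):
    """Overall ATE as the weighted average of within-block
    treated-minus-control differences (weights = block share of units)."""
    total = sum(b["n"] for b in blocks.values())
    return sum((b["n"] / total) * (b["y_treat"] - b["y_control"])
               for b in blocks.values())

# hypothetical first-semester GPA results (block sizes made up)
blocks = {
    "private": {"n": 200, "y_treat": 3.45, "y_control": 3.20},  # diff = .25
    "public":  {"n": 600, "y_treat": 3.10, "y_control": 3.00},  # diff = .10
}
ate = block_ate(blocks)  # ~0.1375 = 0.25 * (200/800) + 0.10 * (600/800)
# The .15-point gap BETWEEN the within-block effects is only descriptive:
# school type was not randomized, so it predicts, rather than causes, the gap.
```

Each within-block difference is a valid causal estimate; only the comparison between the two is vulnerable to the confounding described above.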

Gerber and Green expound on this idea:

"Subgroup analysis should be thought of as exploratory or descriptive analysis....if the aim is simply to predict when treatment effects will be large, the researcher need not have a correctly specified causal model that explains treatment effects (see to explain or predict)....noticing that treatment effects tend to be large in some groups and absent from others can provide important clues about why treatments work. But resist the temptation to think subgroup differences establish the causal effect of randomly varying one's subgroup attributes."


Dijkman B, Kooistra B, Bhandari M; Evidence-Based Surgery Working Group. How to work with a subgroup analysis. Can J Surg. 2009;52(6):515-522. 

Duflo, Esther, Rachel Glennerster, and Michael Kremer. 2008. “Using Randomization in Development Economics Research: A Toolkit.” T. Schultz and John Strauss, eds., Handbook of Development Economics. Vol. 4. Amsterdam and New York: North Holland.

Gerber, Alan S., and Donald P. Green. 2012. Field Experiments: Design, Analysis, and Interpretation. New York: W.W. Norton

Keele, L., Stevenson, R., & Elwert, F. (2020). The causal interpretation of estimated associations in regression models. Political Science Research and Methods, 8(1), 1-13. doi:10.1017/psrm.2019.31