
Saturday, July 29, 2023

If Applied Econometrics Were Easy, LLMs Could Do It!

Summary

Can AI do applied econometrics and causal inference? Can LLMs pick up on the nuances and social norms that dictate so many of the decisions made in applied work and reflect them in response to a prompt? LLMs bring to the table incredible capabilities, efficiencies, and opportunities to create value. But there are risks when these tools are used like Dunning-Kruger-as-a-Service (DKaaS), where the critical thinking and actual learning begin and end with prompt engineering and a response. We have to be very careful to recognize, as Philip Tetlock describes in his book "Superforecasting," that there is a difference between mimicking and reflecting meaning vs. originating meaning. To recognize that it’s not just what you know that matters, but how you know what you know. The second-handed tendency to believe that we can or should be outsourcing, nay, sacrificing our thinking to AI in exchange for misleading if not false promises about value is philosophically and epistemically disturbing.

AI vs. Causal Thinking

This is a good article from causaLens: Enterprise Decision Making Needs More Than Chatbots

"while LLMs are good at learning and extracting information from a corpus, they’re blind to something that humans do really well – which is to measure the impact of one’s decisions." 

In a recent talk, Cassie Kozyrkov puts it well: "AI does not automate thinking!"

   

Channeling Judea Pearl, understanding what makes a difference (causality) requires more than data; it also requires something not in the data to begin with. So much of the hype around AI is based on a tools and technology mindset. As Captain Jack Sparrow says about ships in Pirates of the Caribbean, a ship is more than sails and rudders; those are things a ship needs. What a ship really is, is freedom. Causal inference is more than methods and theorems; those are things causal inference needs, but what it really is, is a way of thinking. And in business, what is required is an alignment of thinking. For instance, in his article The Importance of Being Causal, Ivor Bojinov describes the Causal Data Analysis Review Committee at LinkedIn. Committees like this are a common best practice in learning organizations that leverage experimentation and causal inference.

If you attend very many of those reviews you begin to appreciate the amount of careful thinking required to understand the business problem, frame the hypothesis, and translate it into an analytical solution...then interpret the results and make a recommendation about what action to take next. Similarly, a typical machine learning workflow requires up-front thinking and problem framing. But unlike training an ML model, as Scott Lundberg describes (see my LinkedIn post: Beyond SHAP Values and Crystal Balls), understanding what makes a difference is not just a matter of letting an algorithm figure out the best predictors and calling it a day. There is an entire garden of forking paths to navigate, each turn requires more thinking, and there can be vast differences in opinion among 'experts' about which direction to go.

As I discussed in a past post about forking paths in analysis:

"even if all I am after is a single estimate of a given regression coefficient, multiple testing and researcher degrees of freedom may actually become quite a relevant concern...and this reveals the fragility in a lot of empirical work that prudence would require us to view with a critical eye"

Sure, you could probably pair an LLM with statistical software and a database connection and ask it to run a regression. But getting back to Jack Sparrow's ship analogy, a regression is more than just fitting a line to data and testing for heteroskedasticity and multicollinearity (let's hope that if LLMs train on econometrics textbooks they don't weight the value of information by the amount of material dedicated to multicollinearity!) and the rest of the laundry list of textbook assumptions. AI could probably even describe in words a mechanical interpretation of the results. All of that is really cool, and something like that could save a lot of time and augment our workflows (which is valuable), but we also have to be careful about that tools mindset creeping back on us. All the things AI may be able to do are only the things regression needs; to get where we need to go, to understand why, we need more than what AI can currently provide. We need thinking. So even for a basic regression, depending on our goals, the thinking required is currently (and may always be) beyond the capabilities of AI.
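To be concrete about the mechanical part, here is a minimal sketch (hypothetical data and variable names, with statsmodels as the assumed tooling) of the kind of regression-plus-diagnostics workflow an LLM could plausibly automate. Deciding what belongs in the model, and why, is the part it can't do for us.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical data: weekly sales as a function of price and a promo flag
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({"price": rng.normal(10, 2, n), "promo": rng.binomial(1, 0.3, n)})
df["sales"] = 100 - 3 * df["price"] + 8 * df["promo"] + rng.normal(0, 5, n)

# Fit the line with heteroskedasticity-robust standard errors
X = sm.add_constant(df[["price", "promo"]])
ols = sm.OLS(df["sales"], X).fit(cov_type="HC1")
print(ols.summary())

# The textbook checklist: Breusch-Pagan test for heteroskedasticity...
bp_stat, bp_pval, _, _ = het_breuschpagan(ols.resid, X)
print(f"Breusch-Pagan p-value: {bp_pval:.3f}")

# ...and variance inflation factors for the much-belabored multicollinearity check
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))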

When we think about the forking paths encountered in applied work, each path can end with a different measure of impact that comes with its own caveats and tradeoffs. There are seldom standard problems with standard solutions. The course of action taken requires conscious decisions and a meeting of minds among different expert judgments (if not explicitly then implicitly) that considers all the tradeoffs involved in moving from what may be theoretically correct to what is practically feasible.

In his book, "A Guide to Econometrics" Peter Kennedy states that "Applied econometricians are continually faced with awkward compromises" and offers a great story about what it's like to do applied work: 

"Econometric theory is like an exquisitely balanced French recipe, spelling out precisely with how many turns to mix the sauce, how many carats of spice to add, and for how many milliseconds to bake the mixture at exactly 474 degrees of temperature. But when the statistical cook turns to raw materials, he finds that hearts of cactus fruit are unavailable, so he substitutes chunks of cantaloupe; where the recipe calls for vermicelli he used shredded wheat; and he substitutes green garment die for curry, ping-pong balls for turtles eggs, and for Chalifougnac vintage 1883, a can of turpentine."

What choice would AI-driven causal inference make when it has to make the awkward compromise between Chalifougnac vintage 1883 and turpentine, and how would it explain the choice it made and the thinking that went into it? How would that choice stack up against the opinions of four other applied econometricians who would have chosen differently?

As Richard McElreath discusses in his great book Statistical Rethinking:

"Statisticians do not in general exactly agree on how to analyze anything but the simplest of problems. The fact that statistical inference uses mathematics does not imply that there is only one reasonable or useful way to conduct an analysis. Engineers use math as well, but there are many ways to build a bridge." 

This is why in applied economics so much of what we consider 'best practices' is as much the result of social norms and practices as it is of textbook theory. These norms are often established and evolve informally over time, sometimes adapted to the particulars of circumstance and place unique to a business, decision making environment, or research discipline (this explains the language barriers between, say, economists and epidemiologists, and why different language can be used to describe the same thing while the same language can mean different things to different practitioners). A kind of result of human action but not of human design, many best practices may seldom be formally codified or published in a way accessible enough to train a chatbot to read and understand. Would an algorithm be able to understand and relay back this nuance? I gave this a try by asking ChatGPT about linear probability models (LPMs), and while I was impressed with some of the detail, I'm not fully convinced at this point based on the answers I got. While it did a great job articulating the pros and cons of LPMs vs. logistic regression or other models, I think it would leave the casual reader with the impression that they should be wary of relying on LPMs to estimate treatment effects in most situations. So they miss out on the practical benefits (the 'pros' that come from using LPMs) while avoiding 'cons' that, as Angrist and Pischke might say, are mostly harmless. I would be more concerned about challenging econometric problems with more nuance and more appeal to social norms, practices, and thinking that an LLM may not be privy to.
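For what it's worth, here is a sketch of the comparison I was probing ChatGPT about, using simulated data (my own made-up setup, not anything ChatGPT produced): a linear probability model with robust standard errors vs. average marginal effects from a logit for a randomized binary treatment. In settings like this the two typically tell the same story, which is part of the 'mostly harmless' point.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: randomized binary treatment d, covariate x, binary outcome y
rng = np.random.default_rng(0)
n = 5000
d = rng.binomial(1, 0.5, n)
x = rng.normal(0, 1, n)
p = 1 / (1 + np.exp(-(-0.5 + 0.4 * d + 0.6 * x)))
y = rng.binomial(1, p)
df = pd.DataFrame({"y": y, "d": d, "x": x})

# Linear probability model: the coefficient on d is directly an ATE estimate;
# robust standard errors because LPM errors are heteroskedastic by construction
lpm = smf.ols("y ~ d + x", data=df).fit(cov_type="HC1")
print("LPM estimate of the treatment effect:", lpm.params["d"])

# Logit: the raw coefficient is a log-odds ratio, so to compare apples to
# apples we look at the average marginal effect of d
logit = smf.logit("y ~ d + x", data=df).fit(disp=0)
ame = logit.get_margeff(at="overall")
print("Logit average marginal effects (compare the d row to the LPM coefficient):")
print(ame.summary())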

ChatGPT as a Research Assistant

Outside of actually doing applied econometrics and causal inference, I have additional concerns with LLMs and AI when it comes to using them as tools for research and learning. At first it might seem really great if, instead of reading five journal articles, you could just have a tool like ChatGPT do the hard work for you and summarize them in a fraction of the time! And I agree this kind of summary knowledge is useful, but probably not in the way many users might think.

I have been thinking a lot about how much you get out of putting your hands on a paper or book and wrestling with the ideas, following the paths leading from hypotheses to conclusions, and using the cited references to retrace the steps of the authors and understand why, either slowly nudging your priors in new directions or reinforcing your existing perspective, and synthesizing these ideas with your own. Then summarizing, applying, and communicating this synthesis with others.

ChatGPT might give the impression that this is what it is doing, in a fraction of the time you could do it (literally seconds vs. hours or days). However, even if it gave verbatim the same summary you could write, the difference in the value created would still be night and day. There is a big difference between the learning that takes place when you go through this process of integrative, complex thinking vs. just reading a summary delivered on a silver platter from ChatGPT. I’m skeptical that what I’m describing can be outsourced to AI without losing something important. I also think there are real risks and costs involved when these tools are used like Dunning-Kruger-as-a-Service (DKaaS), where the critical thinking and actual learning begin and end with prompt engineering and a response.

When it comes to the practical application of this knowledge, and to thinking through and solving new problems, it’s not just what you know that matters, but how you know what you know. If all you have is a summary, will you know how to navigate the tradeoffs between what is theoretically correct and what is practically feasible in order to make the best decision about which forking path to take in an analysis? Given the importance of social norms and practices in doing applied work, and if the discussion above about LPMs is any indication, I'm not sure. And with just the summary, will you be able to quickly assimilate new developments in the field...or will you have to go back to ChatGPT? How much knowledge and important nuance is lost with every update? What is missed? Thinking!

As Cassie says in her talk, thinking is about:

"knowing what is worth saying...knowing what is worth doing, we are thinking when we are coming up with ideas, when we are solving problems, when we are being creative"

AI is not capable of doing these things, and believing, or even attempting or pretending, that we can get them second-hand from an AI tool will ultimately erode the real human skills and capabilities essential to real productivity and growth over the long run. If we fail to accept this, the giant sucking sound we hear will be the ROI we thought we were going to get from AI in the short run by attempting to automate what can't be automated. That is the false promise of a tools and technology mindset.

It worries me that this same tools-and-technology, data-science-alchemy mindset means that many managers who were once sold the snake oil that data scientists could simply spin data into gold with deep learning will now buy the snake oil that LLMs can spin data into gold even cheaper, and send the thinkers packing!

Similarly Cassie says: "that may be the biggest problem, that management has not learned how to manage thinking...vs. what you can measure easily....thinking is something you can't force, you can only get in the way of it."

She elaborates a bit more about this in her LinkedIn post: "A misguided view of productivity could mean lost jobs for workers without whom organizations won't be able to thrive in the long run - what a painful mistake for everyone."

Thunking vs. Thinking

I did say that this kind of summary info can be useful. And I agree that the kinds of things AI and LLMs will be useful for are what Cassie refers to in her talk as 'thunking' - the things that consume our time and resources but don't require thinking. Having done your homework, the kind of summary information you get from an LLM can help reinforce your thinking and learning and save time in terms of manually googling or looking up a lot of things you once knew but have forgotten. If there is an area you haven't thought about in a while, it can be a great way to get back up to speed. And when trying to learn new things, it can be leveraged to speed up some aspects of your discovery process or make it more efficient, or even to help challenge or vet your thinking (virtually bouncing ideas back and forth). But to be useful, this still requires some background knowledge, and it should never be a substitute for putting your hands on a paper and doing the required careful and critical thinking.

One area of applied econometrics I have not mentioned is the often less glamorous work it takes to implement a solution. In addition to all the thinking involved in translating the solution and navigating the forking paths, there is a lot of time spent accessing and transforming the data and implementing the estimation, which involves coding (note that even in the midst of all that thunking there is still thinking involved - sometimes we learn the most about our business and our problem while attempting to wrangle the data - so this is also a place where we need to be careful about what we automate). Lots of data science folks are also using these tools to speed up some of their programming tasks. I'm a habitual user of Stack Exchange and GitHub and constantly recycle my own code or others' code. But I burn a lot of time some days in search of what I need. That's the kind of thunking it makes sense to enlist new AI tools for!

Conclusion: Thinking is Our Responsibility

I've observed two extremes when it comes to opinions about tools like ChatGPT. One is that LLMs have the knowledge and wisdom of Yoda and will solve all of our problems. The other extreme is that because LLMs don't have the knowledge and wisdom of Yoda they are largely irrelevant. Obviously there is middle ground and I am trying to find it in this post. And I think Cassie has found it:

"AI does not automate thinking. It doesn't! There is a lot of strange rumblings about this that sound very odd to me who has been in this space for 2 decades"

I have sensed those same rumblings and it should make us all feel a bit uneasy. She goes on to say:

"when you are not the one making the decision and it looks like the machine is doing it, there is someone who is actually making that decision for you...and I think that we have been complacent and we have allowed our technology to be faceless....how will we hold them accountable....for wisdom...thinking is our responsibility"

Thinking is a moral responsibility. Outsourcing our thinking, and fooling ourselves into believing that the knowledge, wisdom, and judgment we get second-hand from a summary written by an AI tool are the same thing and provide the same value as what we could produce as thinking humans, is a dangerous illusion. Thinking is the means by which the human race and civil society ultimately thrive and survive. In 2020, former President Barack Obama emphasized the importance of thinking in a democracy:

"if we do not have the capacity to distinguish what's true from what's false, then by definition the marketplace of ideas doesn't work. And by definition our democracy doesn't work. We are entering into an epistemological crisis." 

The wrong kind of tools and technology mindset, obsequiousness toward the technology, and a second-handed tendency to believe that we can or should be outsourcing, nay, sacrificing our thinking to AI in exchange for misleading if not false promises about value is philosophically and epistemically disturbing.

LLMs bring to the table incredible capabilities and efficiencies and opportunities to create value. But we have to be very careful to recognize, as Philip Tetlock describes in his book Superforecasting, that there is a difference between mimicking and reflecting meaning vs. originating meaning. To recognize that it’s not just what you know that matters, but how you know what you know. To repurpose the closing statement from the book Mostly Harmless Econometrics: if applied econometrics were easy, LLMs could do it.

Additional Resources:

Thunking vs Thinking: Whose Job Does AI Automate? Which tasks are on AI’s chopping block? Cassie Kozyrkov. https://kozyrkov.medium.com/thunking-vs-thinking-whose-job-does-ai-automate-959e3585877b

Statistics is a Way of Thinking, Not Just a Box of Tools. https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html

Will There Be a Credibility Revolution in Data Science and AI? https://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html 

Note on updates: An original version of this post was written on July 29 in conjunction with the post On LLMs and LPMs: Does the LL in LLM Stand for Linear Literalism? Shortly after posting I ran across Cassie's talk and updated to incorporate many of the points she made, with the best of intentions. Any  misrepresentation/misappropriation of her views is unintentional. 

Thursday, November 4, 2021

Causal Decision Making with non-Causal Models

In a previous post I noted: 

" ...correlations or 'flags' from big data might not 'identify' causal effects, but they are useful for prediction and might point us in directions where we can more rigorously investigate causal relationships"

Recently on LinkedIn I discussed situations where we have to be careful about taking action on specific features in a correlational model, for instance changing product attributes or designing an intervention based on interpretations of SHAP values from non-causal predictive models. I quoted Scott Lundberg:

"regularized machine learning models like XGBoost will tend to build the most parsimonious models that predict the best with the fewest features necessary (which is often something we strive for). This property often leads them to select features that are surrogates for multiple causal drivers which is "very useful for generating robust predictions...but not good for understanding which features we should manipulate to increase retention."

So sometimes we may go into a project with the intention of only needing predictions. We might just want to target offers or nudges to customers or product users, and not think about this in causal terms at first. But, as I have discussed before, the conversation often inevitably turns to causality, even if stakeholders and business users don't use causal language to describe their problems.

"Once armed with predictions, businesses will start to ask questions about 'why'... they will want to know what decisions or factors are moving the needle on revenue or customer satisfaction and engagement or improved efficiencies...There is a significant difference between understanding what drivers correlate with or 'predict' the outcome of interest and what is actually driving the outcome."

This would seem to call for causal models. However, in their recent paper Carlos Fernández-Loría and Foster Provost make an exciting claim:

“what might traditionally be considered “good” estimates of causal effects are not necessary to make good causal decisions…implications above are quite important in practice, because acquiring data to estimate causal effects accurately is often complicated and expensive. Empirically, we see that results can be considerably better when modeling intervention decisions rather than causal effects.”

Now in this case they are not talking about causal models for identifying the key drivers of an outcome, so it is not contradicting anything mentioned above or in previous posts. In particular, they are talking about building models for causal decision making (CDM) that are simply focused on deciding whom to 'treat' or target. In this scenario businesses are leveraging predictive models to target offers, provide incentives, or make recommendations. As discussed in the paper, there are two broad ways of approaching this problem. Let's say the problem is related to churn.

1) We could predict risk of churn and target members most likely to churn. We could do this with a purely correlational machine learning model. The output or estimand from this model is a predicted probability p() or risk score. They refer to these kinds of models as 'outcome' models.

2) We could build a causal model that predicts the causal impact of an outreach. This would allow us to target customers we are most likely to 'save' as a result of our intervention. They refer to this estimand as a causal effect estimate (CEE). Building machine learning models that are causal can be more challenging and resource intensive.

It is true that at the end of the day we want to maximize our impact. But the causal decision is ultimately whom to target in order to maximize impact. They point out that this causal decision does not necessarily hinge on how accurate our point estimate of causal impact is, as long as errors in prediction still lead to the same decisions about who to target.

What they find is that in order to make good causal decisions about who to 'treat' we don't have to have super accurate estimates of the causal impact of treatment (or models focused on CEE). In fact, they talk through scenarios and conditions where non-causal outcome models like #1 above can perform just as well as, or sometimes better than, more accurate causal models focused on CEE.

In other words, correlational outcome models (like #1) can essentially serve as proxies for the more complicated causal models (like #2), even if the data used to estimate these 'proxy' models is confounded.
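Here is a toy simulation of that idea (the setup and numbers are mine, for illustration only, not from the paper): when the causal effect of outreach is correlated with churn risk, ranking customers by a noisy, even biased, risk score can capture nearly as much (or more) of the achievable impact as ranking by a noisier direct estimate of the causal effect.

import numpy as np

rng = np.random.default_rng(7)
n = 10000

# True baseline churn risk, and a treatment effect correlated with that risk
churn_risk = rng.beta(2, 5, n)
true_effect = np.clip(0.3 * churn_risk + rng.normal(0, 0.02, n), 0, None)

# 1) Outcome model: a noisy (possibly confounded) estimate of churn risk
predicted_risk = churn_risk + rng.normal(0, 0.05, n)

# 2) CEE model: a noisier direct estimate of the causal effect of outreach
predicted_effect = true_effect + rng.normal(0, 0.05, n)

budget = 1000  # we can only treat 1,000 customers

def realized_impact(scores):
    # Total true causal effect captured by treating the top-ranked customers
    treated = np.argsort(scores)[-budget:]
    return true_effect[treated].sum()

oracle = realized_impact(true_effect)
print("share of oracle impact, outcome-model ranking:",
      realized_impact(predicted_risk) / oracle)
print("share of oracle impact, CEE-model ranking:   ",
      realized_impact(predicted_effect) / oracle)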

 Scenarios where this is most likely include:

1) Outcomes used as proxies and (causal) effects are correlated

2) Outcomes used as proxies are easier to estimate than causal effects

3) Predictions are used to rank individuals

They also give some reasons why this may be true. Biased non-causal models built on confounded data may not be able to identify true causal effects, but they can still be useful for identifying the optimal decision.

"This could occur when confounding is stronger for individuals with large effects - for example if confounding bias is stronger for 'likely' buyers, but the effect of adds is also stronger for them...the key insight here is that optimizing to make the correct decision generally involves understanding whether a causal effect is above or below a given threshold, which is different from optimizing to reduce the magnitude of bias in a causal effect estimate."

"Models trained with confounded data may lead to decisions that are as good (or better) than the decisions made with models trained with costly experimental data, in particular when larger causal effects are more likely to be overestimated or when variance reduction benefits of more and cheaper data outweigh the detrimental effect of confounding....issues that make it impossible to estimate causal effects accurately do not necessarily keep us from using the data to make accurate intervention decisions."

Their arguments hinge on the idea that what we are really solving for in these decisions is a ranking:

"Assuming...the selection mechanism producing the confounding is a function of the causal effect - so that the larger the causal effect the stronger the selection-then (intuitively) the ranking of the preferred treatment alternatives should be preserved in the confounded setting, allowing for optimal treatment assignment policies from data."

A lot of this really comes down to proper problem framing and appealing to the popular paraphrasing of George E. P. Box - all models are wrong, but some are useful. It turns out in this particular use case non-causal models can be as useful or more useful than causal ones.

And we do need to be careful about the nuance of the problem framing. As the authors point out, this solves one particular business problem and use case, but does not answer some of the most important causal questions businesses may be interested in:

"This does not imply that firms should stop investing in randomized experiments or that causal effect estimation is not relevant for decision making. The argument here is that causal effect estimation is not necessary for doing effective treatment assignment."

They go on to argue that randomized tests and other causal methods are still core to understanding the effectiveness of interventions and strategies for improving effectiveness. Their use case begins and ends with what is just one step in the entire lifecycle of product development, deployment, and optimization. In their discussion of further work they suggest that:

"Decision makers could focus on running randomized experiments in parts of the feature space where confounding is particularly hurtful for decision making, resulting in higher returns on their experimentation budget."

This essentially parallels my previous discussion related to SHAP values. For a great reference for making practical business decisions about when this is worth the effort see the HBR article in the references discussing when to act on a correlation.

So some big takeaways are:

1) When building a model for purposes of causal decision making (CDM), even a biased (non-causal) model can perform as well as or better than a causal model focused on CEE.

2) In many cases, even a predictive model that provides predicted probabilities or risk scores (as proxies for causal impact or CEE) can perform as well as or better than causal models when the goal is CDM.

3) If the goal is to take action based on important features (i.e. SHAP values as discussed before), however, we still need to apply a causal framework, and understanding the actual effectiveness of interventions may still require randomized tests or other methods of causal inference.


References: 

Causal Decision Making and Causal Effect Estimation Are Not the Same... and Why It Matters. Carlos Fernández-Loría and Foster Provost. 2021. https://arxiv.org/abs/2104.04103

When to Act on a Correlation, and When Not To. David Ritter. Harvard Business Review. March 19, 2014. 

Be Careful When Interpreting Predictive Models in Search of Causal Insights. Scott Lundberg. https://towardsdatascience.com/be-careful-when-interpreting-predictive-models-in-search-of-causal-insights-e68626e664b6  

Additional Reading:

Laura B Balzer, Maya L Petersen, Invited Commentary: Machine Learning in Causal Inference—How Do I Love Thee? Let Me Count the Ways, American Journal of Epidemiology, Volume 190, Issue 8, August 2021, Pages 1483–1487, https://doi.org/10.1093/aje/kwab048

Petersen, M. L., & van der Laan, M. J. (2014). Causal models and learning from data: integrating causal modeling and statistical estimation. Epidemiology (Cambridge, Mass.), 25(3), 418–426. https://doi.org/10.1097/EDE.0000000000000078

Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning. Numair Sani, Daniel Malinsky, Ilya Shpitser arXiv:2006.02482v3  

Related Posts:

Will there be a credibility revolution in data science and AI? 

Statistics is a Way of Thinking, Not a Toolbox

Big Data: Don't Throw the Baby Out with the Bathwater

Big Data: Causality and Local Expertise Are Key in Agronomic Applications 

The Use of Knowledge in a Big Data Society

Wednesday, May 6, 2020

Experimentation and Causal Inference: Strategy and Innovation

Knowledge is the most important resource in a firm and the essence of organizational capability, innovation, value creation, strategy, and competitive advantage. Causal knowledge is no exception. In previous posts I have discussed the value proposition of experimentation and causal inference from both mainline and behavioral economic perspectives. This series of posts has been greatly influenced by Jim Manzi's book 'Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society.' Midway through the book Manzi highlights three important things that experimentation and causal inference in business settings can do:

1) Precision around the tactical implementation of strategy
2) Feedback on the performance of a strategy and refinements driven by evidence
3) Achievement of organizational and strategic alignment

Manzi explains that within any corporation there are always silos and subcultures advocating competing strategies, with perverse incentives and agendas in pursuit of power and control. How do we know who is right and which programs or ideas are successful, considering the many factors that could be influencing any outcome of interest? Manzi describes any environment where the number of causes of variation is enormous as one of 'high causal density.' We can claim to address this with a data driven culture, but what does that mean? How do we know what is, and isn't, supported by data? Modern companies in a digital age with AI and big data are drowning in data. This makes it easy to adorn rhetoric in advanced analytical frameworks. Because data seldom speaks, anyone can speak for the data through wily data storytelling. Decision makers fail to make the distinction between just having data and having evidence to support good decisions.

As Jim Manzi and Stefan Thomke discuss in Harvard Business Review:

"business experiments can allow companies to look beyond correlation and investigate causality....Without it, executives have only a fragmentary understanding of their businesses, and the decisions they make can easily backfire."

Without experimentation and causal inference, there is no way to connect the things we do with the value created. In complex environments with high causal density, we don't know enough about the nature and causes of human behavior, decisions, and the causal paths from actions to outcomes to list them all and measure and account for them, even if we could agree on how to measure them. This is the nature of decision making under uncertainty. But, as R.A. Fisher taught us with his agricultural experiments, randomized tests allow us to account for all of these hidden factors (Manzi calls them hidden conditionals). Only then does our data stand a chance to speak truth. Experimentation and causal inference don't provide perfect information, but they are the only means by which we can begin to say that we have data and evidence to inform the tactical implementation of our strategy, as opposed to pretending that we do based on correlations alone. As economist F.A. Hayek once said:

"I prefer true but imperfect knowledge, even if it leaves much undetermined and unpredictable, to a pretense of exact knowledge that is likely to be false"

In Dual Transformation: How to Reposition Today's Business While Creating the Future, the authors discuss the importance of experimentation and causal inference as a way to navigate uncertainty in causally dense environments in what they refer to as transformation B:

“Whenever you innovate, you can never be sure about the assumptions on which your business rests. So, like a good scientist, you start with a hypothesis, then design an experiment. Make sure the experiment has clear objectives (why are you running it and what do you hope to learn). Even if you have no idea what the right answer is, make a prediction. Finally, execute in such a way that you can measure the prediction, such as running a so-called A/B test in which you vary a single factor."

Experiments aren't just tinkering and trying new things. While these are helpful to innovation, just tinkering and observing still leaves you speculating about what really works and is subject to all the same behavioral biases and pitfalls of big data previously discussed.

List and Gneezy address this in The Why Axis:

"Many businesses experiment and often...businesses always tinker...and try new things...the problem is that businesses rarely conduct experiments that allow a comparison between a treatment and control group...Business experiments are research investigations that give companies the opportunity to get fast and accurate data regarding important decisions."

Three things distinguish experimentation and causal inference from just tinkering:

1) Separation of signal from noise (statistical inference)
2) Connecting cause and effect  (causal inference)
3) Clear signals on business value that follow from 1 & 2 above
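A minimal sketch (simulated data and hypothetical conversion rates) of points 1 and 2: randomize a single factor, use statistical inference to separate signal from noise, and let randomization carry the causal interpretation.

import numpy as np
from scipy import stats

rng = np.random.default_rng(123)
n = 4000                                    # users per arm

control = rng.binomial(1, 0.10, n)          # baseline conversion ~10%
treatment = rng.binomial(1, 0.115, n)       # hypothesized lift of 1.5 points

# Signal vs. noise: estimated lift, t-test, and a 95% confidence interval
lift = treatment.mean() - control.mean()
t_stat, p_value = stats.ttest_ind(treatment, control)
se = np.sqrt(treatment.var(ddof=1) / n + control.var(ddof=1) / n)
ci = (lift - 1.96 * se, lift + 1.96 * se)

print(f"estimated lift: {lift:.3f}, p-value: {p_value:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")

# Because assignment was randomized, the hidden conditionals Manzi describes
# are balanced in expectation, so the lift can be read as a causal effect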

Having causal knowledge helps identify more informed and calculated risks vs. risks taken on the basis of gut instinct, political motivation, or overly optimistic and behaviorally biased data-driven correlational pattern finding analytics. 

Experimentation and causal inference add incremental knowledge and value to business. No single experiment is going to be a 'killer app' that by itself will generate millions in profits. But in aggregate the knowledge created by experimentation and causal inference probably offers the greatest strategic value across an enterprise compared to any other analytic method.

As discussed earlier, experimentation and causal inference create value by helping manage the knowledge problem within firms. It's worth quoting List and Gneezy again:

"We think that businesses that don't experiment and fail to show, through hard data, that their ideas can actually work before the company takes action - are wasting their money....every day they set suboptimal prices, place adds that do not work, or use ineffective incentive schemes for their work force, they effectively leave millions of dollars on the table."

As Luke Froeb writes in Managerial Economics, A Problem Solving Approach (3rd Edition):

"With the benefit of hindsight, it is easy to identify successful strategies (and the reasons for their success) or failed strategies (and the reason for their failures). It's much more difficult to identify successful or failed strategies before they succeed or fail."

Again from Dual Transformation:

"Explorers recognize they can't know the right answer, so they want to invest as little as possible in learning which of their hypotheses are right and which ones are wrong"

Experimentation and causal inference offer the opportunity to test strategies early, on a smaller scale, to get causal feedback about potential success or failure before fully committing large amounts of irrecoverable resources. They allow us to fail smarter and learn faster. Experimentation and causal inference play a central role in product development, strategy, and innovation across a range of industries and companies like Harrah's casinos, Capital One, Petco, Publix, State Farm, Kohl's, Wal-Mart, and Humana, which have been leading in this area for decades, in addition to newer ventures like Amazon and Uber.

"At Uber Labs, we apply behavioral science insights and methodologies to help product teams improve the Uber customer experience. One of the most exciting areas we’ve been working on is causal inference, a category of statistical methods that is commonly used in behavioral science research to understand the causes behind the results we see from experiments or observations...Teams across Uber apply causal inference methods that enable us to bring richer insights to operations analysis, product development, and other areas critical to improving the user experience on our platform." - From: Using Causal Inference to Improve the Uber User Experience (link)

Economist Joshua Angrist says of his students who have gone on to work for companies like Amazon: "when I ask them what are they up to they say...we're running experiments."

Achieving the greatest value from experimentation and causal inference requires leadership commitment. It also demands a culture that is genuinely open to learning through a blend of trial and error, data driven decision making informed by theory and experiments, and the infrastructure necessary for implementing enough tests and iterations to generate the knowledge required for rapid learning and innovation. It requires business leaders, strategists, and product managers to think about what they are trying to achieve and to ask causal questions to get there (vs. data scientists sitting in an ivory tower dreaming up models or experiments of their own). The result is a corporate culture that allows an organization to formulate, implement, and modify strategy faster and more tactically than others.

See also:
Experimentation and Causal Inference: The Knowledge Problem
Experimentation and Causal Inference: A Behavioral Economics Perspective
Statistics is a Way of Thinking, Not a Box of Tools

Monday, December 16, 2019

Some Recommended Podcasts and Episodes on AI and Machine Learning

Something I have been interested in for some time now is the convergence of big data and genomics, and the convergence of causal inference and machine learning.

I am a big fan of the Talking Biotech Podcast which allows me to keep up with some of the latest issues and research in biotechnology and medicine. A recent episode related to AI and machine learning covered a lot of topics that resonated with me. 

There was excellent discussion of the human element involved in this work, the importance of data prep/feature engineering (the 80% of the work that has to happen before ML/AI can do its job), the challenges of non-standard 'omics' data, and the potential biases that researchers and developers can inadvertently introduce in the process. There was much more, including applications of machine learning and AI in this space and the best ways to stay up to speed on fast-changing technologies without having to be a heads-down programmer.

I've been in a data science role since 2008 and have transitioned from SAS to R to Python. I've been able to keep up within the domain of causal inference to the extent possible, but I keep up with broader trends I am interested in via podcasts like Talking Biotech. Below is a curated list of my favorites related to data science, with a few of my favorite episodes highlighted.


1) Casual Inference - This is my new favorite podcast by two biostatisticians covering epidemiology/biostatistics/causal inference - and keeping it casual.

Fairness in Machine Learning with Sherri Rose | Episode 03 - http://casualinfer.libsyn.com/fairness-in-machine-learning-with-sherri-rose-episode-03

This episode was the inspiration for my post: When Wicked Problems Meet Biased Data.

#093 Evolutionary Programming

#266 - Can we trust scientific discoveries made using machine learning

How social science research can inform the design of AI systems - https://www.oreilly.com/radar/podcast/how-social-science-research-can-inform-the-design-of-ai-systems/

#37 Causality and potential outcomes with Irineo Cabreros - https://bioinformatics.chat/potential-outcomes

Andrew Gelman - Social Science, Small Samples, and the Garden of Forking Paths - https://www.econtalk.org/andrew-gelman-on-social-science-small-samples-and-the-garden-of-the-forking-paths/

James Heckman - Facts, Evidence, and the State of Econometrics - https://www.econtalk.org/james-heckman-on-facts-evidence-and-the-state-of-econometrics/


Wednesday, December 11, 2019

When Wicked Problems Meet Biased Data

In "Dissecting racial bias in an algorithm used to manage the health of populations" (Science, Vol 366 25 Oct. 2019) the authors discuss inherent racial bias in widely adopted algorithms in healthcare. In a nutshell these algorithms use predicted cost as a proxy for health status. Unfortunately, in healthcare, costs can proxy for other things as well:

"Black patients generate lesser medical expenses, conditional on health, even when we account for specific comorbidities. As a result, accurate prediction of costs necessarily means being racially biased on health."

So what happened? How can it be mitigated? What can be done going forward?

 In data science, there are some popular frameworks for solving problems. One widely known approach is the CRISP-DM framework. Alternatively, in The Analytics Lifecycle Toolkit a similar process is proposed:

(1) - Problem Framing
(2) - Data Sense Making
(3) - Analytics Product Development
(4) - Results Activation

The wrong turn in Albuquerque here may have been at the corner of problem framing and data understanding or data sense making.

The authors state:

"Identifying patients who will derive the greatest benefit from these programs is a challenging causal inference problem that requires estimation of individual treatment effects. To solve this problem health systems make a key assumption: Those with the greatest care needs will benefit the most from the program. Under this assumption, the targeting problem becomes a pure prediction public policy problem."

The distinctions between 'predicting' and 'explaining' have been made in the literature by multiple authors in the last two decades. The problem with this substitution has important implications. To quote Galit Shmueli:

"My thesis is that statistical modeling, from the early stages of study design and data collection to data usage and reporting, takes a different path and leads to different results, depending on whether the goal is predictive or explanatory."

Almost a decade before, Leo Breiman encouraged us to think outside the box when solving problems by considering multiple approaches:

"Approaching problems by looking for a data model imposes an a priori straight jacket that restricts the ability of statisticians to deal with a wide range of statistical problems. The best available solution to a data problem might be a data model; then again it might be an algorithmic model. The data and the problem guide the solution. To solve a wider range of data problems, a larger set of tools is needed."

A number of data analysts today may not be cognizant of the differences between predictive vs. explanatory modeling and statistical inference, and it may not be clear to them how that impacts their work. This could be related to background, training, or the kinds of problems they have worked on in their experience. It is also important that we don't compartmentalize so much that we miss opportunities to approach our problem from a number of different angles (Leo Breiman's 'straight jacket'). This is perhaps what happened in the Science article: once the problem was framed as a predictive modeling problem, other modes of thinking may have shut down, even if developers were aware of all of these distinctions.

The takeaway is that we think differently when doing statistical inference/explaining vs. predicting or doing machine learning. Substituting one for the other impacts the way we approach the problem (things we care about, things we consider vs. discount, etc.), and this impacts the data preparation, modeling, and interpretation.

For instance, in the Science article, after framing the problem as a predictive modeling problem, a pivotal focus became the 'labels' or target for prediction.

"The dilemma of which label to choose relates to a growing literature on 'problem formulation' in data science: the task of turning an often amorphous concept we wish to predict into a concrete variable that can be predicted in a given dataset."

As noted in the paper, 'labels are often measured with errors that reflect structural inequalities.'
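A toy simulation (my own illustrative numbers, not the paper's data) of how that plays out: if one group generates lower costs at the same level of health need, then even a 'perfect' cost prediction will under-select that group for the program and require its members to be sicker to get in.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 20000
group = rng.binomial(1, 0.5, n)             # 1 = group with less access to care
health_need = rng.gamma(2.0, 2.0, n)        # true (unobserved) health need

# Same distribution of need in both groups, but the disadvantaged group
# generates systematically lower cost conditional on need
cost = health_need * np.where(group == 1, 0.7, 1.0) + rng.normal(0, 0.5, n)

df = pd.DataFrame({"group": group, "need": health_need, "cost": cost})

# Target the top 5% by predicted cost (here: cost itself, i.e. a "perfect"
# cost model) and compare who gets selected
threshold = df["cost"].quantile(0.95)
selected = df[df["cost"] >= threshold]
print(selected.groupby("group")["need"].mean())     # selected group-1 patients are sicker
print("share of program slots going to group 1:",
      (selected["group"] == 1).mean())              # well below its 50% population share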

Addressing the issue with label choice can come with a number of challenges briefly alluded to in the article:

1) deep understanding of the domain - i.e., subject matter expertise
2) identification and extraction of relevant data - i.e. data engineering and data governance
3) capacity to iterate and experiment - i.e. understanding causality, testing and measurement strategy

Data science problems in healthcare are wicked problems defined by interacting complexities with social, economic, and biological dimensions that transcend simply fitting a model to data. Expertise in a number of disciplines is required.

Bias in Risk Adjustment

In the Science article, the specific example was in relation to predictive models targeting patients for disease management programs. However, there are a number of other predictive modeling applications where these same issues can be prevalent in the healthcare space.

In Fair Regression for Health Care Spending, Sherri Rose and Anna Zink discuss these challenges in relation to popular regression-based risk adjustment applications. Aligning with the analytics lifecycle discussed above, they point out that there are several places where issues of bias can be addressed, including the pre-processing, model fitting, and post-processing stages of analysis. In this article they focus largely on the modeling stage, leveraging a number of constrained and penalized regression algorithms designed to optimize fairness. This work looks really promising, but the authors point out a number of challenges related to scalability and to optimizing fairness across a number of metrics or groups.
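To give a flavor of the penalized idea (a sketch in the spirit of that work, not Zink and Rose's actual estimators), one can add a penalty on the gap in average residual spending between groups to an ordinary least squares objective and trade off fit against that gap:

import numpy as np

rng = np.random.default_rng(3)
n, p = 5000, 5
group = rng.binomial(1, 0.3, n)
X = rng.normal(size=(n, p))
X[:, 0] += 0.8 * group                      # one risk factor is more common in group 1
beta = np.array([2.0, 1.0, 0.5, 0.0, -1.0])
# Spending depends on the risk factors plus an unmodeled group-related component
y = X @ beta + 1.5 * group + rng.normal(0, 1, n)

Xd = np.column_stack([np.ones(n), X])       # design matrix with intercept
c = group / group.sum() - (1 - group) / (1 - group).sum()  # mean-residual contrast

def fit(lam):
    # Minimize (1/n)*||y - Xb||^2 + lam * (mean residual gap between groups)^2,
    # which is quadratic in b, so solve the penalized normal equations directly
    v = Xd.T @ c
    A = Xd.T @ Xd / n + lam * np.outer(v, v)
    rhs = Xd.T @ y / n + lam * v * (c @ y)
    return np.linalg.solve(A, rhs)

for lam in [0.0, 100.0]:
    b = fit(lam)
    resid = y - Xd @ b
    gap = resid[group == 1].mean() - resid[group == 0].mean()
    print(f"lambda={lam:6.1f}  mean residual gap (undercompensation of group 1): {gap:.3f}")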

Toward Causal AI and ML

Previously I referenced Galit Shmueli's work that discussed how differently we approach and think about predictive vs explanatory modeling. In the Book of Why, Judea Pearl discusses causal inferential thinking:

"Causal Analysis is emphatically not just about data; in causal analysis we must incorporate some understanding of the process that produces the data and then we get something that was not in the data to begin with." 

There is currently a lot of work fusing machine learning and causal inference that could create more robust learning algorithms. For example, Susan Athey's work with causal forests, Leon Bottou's work related to causal invariance, and Elias Bareinboim's work on the data fusion problem. This work, along with the kind of work mentioned before related to fair regression, will help inform the next generation of predictive modeling, machine learning, and causal inference models in the healthcare space that hopefully will represent a marked improvement over what is possible today.

However, we can't wait half a decade or more while the theory is developed and adopted by practitioners. In the Science article, the authors found alternative metrics for targeting disease management programs besides total costs that calibrate much more fairly across groups. Bridging the gap in other areas will require a combination of awareness of these issues and creativity throughout the analytics product lifecycle. As the authors conclude:

"careful choice can allow us to enjoy the benefits of algorithmic predictions while minimizing the risks."

References and Additional Reading:

This paper was recently discussed on the Casual Inference podcast.

Annual Review of Public Health 2020 41:1

Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1, 206–215 (2019) doi:10.1038/s42256-019-0048-x

Breiman, Leo. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statist. Sci. 16 (2001), no. 3, 199--231. doi:10.1214/ss/1009213726. https://projecteuclid.org/euclid.ss/1009213726

Shmueli, G., "To Explain or To Predict?", Statistical Science, vol. 25, issue 3, pp. 289-310, 2010.

Fair Regression for Health Care Spending. Anna Zink, Sherri Rose. arXiv:1901.10566v2 [stat.AP]



Monday, September 30, 2019

Wicked Problems and The Role of Expertise and AI in Data Science

In 2018, an article in Science characterized the challenge of pesticide resistance as a wicked problem:

“If we are to address this recalcitrant issue of pesticide resistance, we must treat it as a “wicked problem,” in the sense that there are social, economic, and biological uncertainties and complexities interacting in ways that decrease incentives for actions aimed at mitigation.”

In graduate school, I worked on this same problem, attempting to model the social and economic systems with game theory and behavioral economics and capturing biological complexities leveraging population genetics. 

Wicked vs. Kind Environments

In data science, we also have 'wicked' learning environments in which we try to train our models. In the EconTalk podcast with Russ Roberts, Mastery, Specialization, and Range, David Epstein discusses wicked and kind learning environments:

"The way that chess works makes it what's called a kind learning environment. So, these are terms used by psychologist Robin Hogarth. And what a kind learning environment is, is one where patterns recur; ideally a situation is constrained--so, a chessboard with very rigid rules and a literal board is very constrained; and, importantly, every time you do something you get feedback that is totally obvious...you see the consequences. The consequences are completely immediate and accurate. And you adjust accordingly. And in these kinds of kind learning environments, if you are cognitively engaged you get better just by doing the activity."

"On the opposite end of the spectrum are wicked learning environments. And this is a spectrum, from kind to wicked. Wicked learning environments: often some information is hidden. Even when it isn't, feedback may be delayed. It may be infrequent. It may be nonexistent. And it maybe be partly accurate, or inaccurate in many of the cases. So, the most wicked learning environments will reinforce the wrong types of behavior."

As discussed in the podcast, many problems fall along a spectrum ranging from very kind environments like chess to more complex environments like self-driving cars or medical diagnosis. What do experts have to offer where AI/ML falls short? The type of environment determines to a great extent the scope of disruption we might expect from AI applications.

The Role of Human Expertise

In Thinking Fast and Slow, Kahneman discusses two conditions for acquiring skill:

1) an environment that is sufficiently regular to be predictable
2) an opportunity to learn these regularities through prolonged practice

This sounds a lot like the 'kind' environments discussed above. Based on research by Robin Hogarth, Kahneman makes similar distinctions, describing 'wicked' environments as those in which those with expertise are likely to learn the wrong lessons from experience. The problem is that in wicked environments, experts often default to heuristics, which can lead to wrong conclusions. Even when aware of these biases, social norms often nudge experts in the wrong direction. Kahneman gives an example involving physicians:

"Generally it is considered a weakness and a sign of vulnerability for clinicians to appear unsure. Confidence is valued over uncertainty and there is a prevailing censure against disclosing uncertainty to patients...acting on pretended knowledge is often the preferred solution."

This likely explains many of the mistakes and much of the low value care that are problematic in healthcare delivery, as well as dissatisfaction with both the quality and cost of healthcare. How many of us want our physicians to pretend to know what they are talking about? On the other hand, how many people are willing to accept an answer from their physician that rhymes with "let me look this up and get back to you later"?

One advantage AI may have over experts in kind environments is, as Kahneman puts it, the opportunity to learn through prolonged practice. Machine learning can handle many more training examples than a human, so to speak.

Even in kind environments, an expert may swing and miss on cases where the correct decision is like a pitch straight over the plate. One reason Kahneman discusses in Thinking Fast and Slow is 'ego depletion,' the idea that mental energy can become exhausted after significant exertion. As self-control breaks down, it's easy to default to heuristics and biases that can lead to decisions that look like careless mistakes. This would certainly apply to physicians, given the number of stories we hear about burnout in the profession.

The solution seems to be what polymath economist Tyler Cowen suggested several years ago in the EconTalk discussion with Russ Roberts about his book Average is Over:

"I would stress much more that humans can always complement robots. I'm not saying every human will be good at this. That's a big part of the problem. But a large number of humans will work very effectively with robots and become far more productive, and this will be one of the driving forces behind that inequality."

Imagine a clinical situation where a physician's 'ego' is substantially depleted by a difficult case. They could then lean on AI to prevent mistakes in the more routine decisions that follow. Or perhaps, leveraging AI tools, a clinician could conserve additional mental energy throughout the day so that they are less likely to default to heuristics when they encounter more complex issues. The way this synergy materializes is uncertain, but it will certainly continue to involve substantial expertise on the part of many professionals going forward. Together, human expertise and AI might have the greatest chance of tackling the most wicked problems.

References:

Wicked evolution: Can we address the sociobiological dilemma of pesticide resistance? | Science  https://science.sciencemag.org/content/360/6390/728.full

Thinking Fast and Slow. Daniel Kahneman. 2011

EconTalk:David Epstein on Mastery, Specialization, and Range
https://www.econtalk.org/david-epstein-on-mastery-specialization-and-range/

EconTalk: Tyler Cowen on Inequality, the Future, and Average is Over
https://www.econtalk.org/tyler-cowen-on-inequality-the-future-and-average-is-over/

Tuesday, April 17, 2018

He who must not be named....or can we say 'causal'?

Recall that in the Harry Potter series, the wizard community refused to say the name 'Voldemort,' and it got to the point where they almost stopped teaching and practicing magic (at least officially, as mandated by the Ministry of Magic). In the research community, by refusing to use the term 'causal' when and where appropriate, are we discouraging researchers from asking interesting questions and putting forth the effort required to implement the kind of rigorous causal inferential methods necessary to push forward the frontiers of science? Could we somehow be putting a damper on teaching and practicing economagic...I mean econometrics...you know, the mostly harmless kind? Will the credibility revolution be lost?

In a May 2018 article in the American Journal of Public Health, Miguel Hernán (Departments of Epidemiology and Biostatistics, Harvard School of Public Health) offers an important discussion of the somewhat tiring mantra 'correlation is not causation' and the disservice to scientific advancement it can do in the absence of critical thinking about research objectives and designs. Some people might think this is ironic, since the phrase is often invoked to point out fallacious conclusions that have been uncritically based on mere correlations found in the data. However, the pendulum can swing too far in the other direction, causing as much harm.

I highly recommend reading this article! It is available ungated and will be one of those you hold onto for a while. See the reference section below.

Key to the discussion are important distinctions between questions of association, prediction, and causality. Below are some spoilers:

While it is wrong to assume causality based on association or correlation alone, refusing to recognize a causal approach in the analysis because of growing cultural 'norms' is not good either...and should stop:

"The resulting ambiguity impedes a frank discussion about methodology because the methods used to estimate causal effects are not the same as those used to estimate associations...We need to stop treating “causal” as a dirty word that respectable investigators do not say in public or put in print. It is true that observational studies cannot definitely prove causation, but this statement misses the point"

All that glitters isn't gold, as the author notes on randomized controlled trials:

"Interestingly, the same is true of randomized trials. All we can estimate from randomized trials data are associations; we just feel more confident giving a causal interpretation to the association between treatment assignment and outcome because of the expected lack of confounding that physical randomization entails. However, the association measures from randomized trials cannot be given a free pass. Although randomization eliminates systematic confounding, even a perfect randomized trial only provides probabilistic bounds on “random confounding”—as reflected in the confidence interval of the association measure—and many randomized trials are far from perfect."

There are important distinctions in analysis and methodological approach when asking questions about prediction and association versus causality. To say a bit more, this is not just about model interpretation. We are familiar with discussions of the challenges of interpreting predictive models derived from complicated black-box algorithms, but causality hinges on much more than the ability to interpret the impact of features on an outcome. Also note that while we are seeing applications of AI to automated feature engineering and algorithm selection, models optimized to predict well may not explain well at all. In fact, a causal model may perform worse in out-of-sample prediction of the 'target' while giving the most rigorous estimate of causal effects:

"In associational or predictive models, we do not try to endow the parameter estimates with a causal interpretation because we are not trying to adjust for confounding of the effect of every variable in the model. Confounding is a causal concept that does not apply to associations...By contrast, in a causal analysis, we need to think carefully about what variables can be confounders so that the parameter estimates for treatment or exposure can be causally interpreted. Automatic variable selection procedures may work for prediction, but not necessarily for causal inference. Selection algorithms that do not incorporate sufficient subject matter knowledge may select variables that introduce bias in the effect estimate, and ignoring the causal structure of the problem may lead to apparent paradoxes."

It all comes down to a question of identification....or why AI has a long way to go in the causal space...or, as Angrist and Pischke would put it, if applied econometrics were easy, theorists would do it:

"Associational inference (prediction)or causal inference (counterfactual prediction)? The answer to this question has deep implications for (1) how we design the observational analysis to emulate a particular target trial and (2) how we choose confounding adjustment variables. Each causal question corresponds to a different target trial, may require adjustment for a different set of confounders, and is amenable to different types of sensitivity analyses. It then makes sense to publish separate articles for various causal questions based on the same data."

I really liked how the article frames 'prediction' as being distinctly either associational (prospective) or counterfactual. Also, what a nice way to think about 'identification': it is about how we emulate a particular target trial and handle confounding/selection bias/endogeneity.
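To illustrate the 'different causal question, different adjustment set' idea, here is a toy simulation in Python (made-up structure and numbers, not from the article): with a simple structure where l affects both a and y, and a affects y, estimating the effect of a on y requires adjusting for the confounder l, while estimating the total effect of l on y requires leaving the mediator a out.

```python
# Toy sketch: the same data, two causal questions, two adjustment sets.
# Structure: l -> a -> y and l -> y.
#   Q1: effect of a on y        -> adjust for the confounder l.
#   Q2: total effect of l on y  -> do NOT adjust for the mediator a.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 20000
l = rng.normal(size=n)
a = 0.8 * l + rng.normal(size=n)
y = 1.0 * a + 0.5 * l + rng.normal(size=n)   # effect of a = 1.0; total effect of l = 0.8*1.0 + 0.5 = 1.3

q1 = sm.OLS(y, sm.add_constant(np.column_stack([a, l]))).fit()   # adjust for l
q2 = sm.OLS(y, sm.add_constant(l)).fit()                         # leave a out

print(f"Q1: effect of a on y (adjusted for l): {q1.params[1]:.2f}")        # ~1.0
print(f"Q2: total effect of l on y (a not adjusted): {q2.params[1]:.2f}")  # ~1.3
```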

Reference:

Miguel A. Hernán, “The C-Word: Scientific Euphemisms Do Not Improve Causal Inference From Observational Data”, American Journal of Public Health 108, no. 5 (May 1, 2018): pp. 616-619.

See also:

Will there be a credibility revolution in data science and AI?

To Explain or Predict?

Saturday, March 5, 2016

Machine Learning and Econometrics

Not long ago Tyler Cowen blogged at Marginal Revolution about a Quora post by Susan Athey discussing the impact of machine learning on econometrics, flavors of machine learning, and differences in the emphasis each field traditionally places on tools and methodologies. The differences often hinge on whether one's intention is to explain or to predict, or whether one is interested in causal inference versus analytics. I really liked the point about instrumental variables made in the snippet below:

"Yet, a cornerstone of introductory econometrics is that prediction is not causal inference, and indeed a classic economic example is that in many economic datasets, price and quantity are positively correlated.  Firms set prices higher in high-income cities where consumers buy more; they raise prices in anticipation of times of peak demand. A large body of econometric research seeks to REDUCE the goodness of fit of a model in order to estimate the causal effect of, say, changing prices. If prices and quantities are positively correlated in the data, any model that estimates the true causal effect (quantity goes down if you change price) will not do as good a job fitting the data….Techniques like instrumental variables seek to use only some of the information that is in the data – the “clean” or “exogenous” or “experiment-like” variation in price—sacrificing predictive accuracy in the current environment to learn about a more fundamental relationship that will help make decisions about changing price. This type of model has not received almost any attention in ML."

Tyler also points to a wealth of resources by Susan Athey here. And check out the mini-course she taught with Guido Imbens via NBER.

The differences and synergies between the tools used in econometrics and machine learning are something I have been interested in for a long time and have blogged about several times in the past. Kenneth Sanford and Hal Varian have also been writing about this. See related content below.

Related Content and Further Reading

Economists as Data Scientists http://econometricsense.blogspot.com/2012/10/economists-as-data-scientists.html

Econometrics, Math, and Machine Learning….what? http://econometricsense.blogspot.com/2015/09/econometrics-math-and-machine.html 

"Mathematical Themes in Economics, Machine Learning, and Bioinformatics" (2010)
Available at: http://works.bepress.com/matt_bogard/7/ 

Notes to 'Support' an Understanding of Support Vector Machines  http://econometricsense.blogspot.com/2012/05/notes-to-support-understanding-of.html

Culture War: Classical Statistics vs. Machine Learning http://econometricsense.blogspot.com/2011/01/classical-statistics-vs-machine.html

Analytics vs Causal Inference http://econometricsense.blogspot.com/2014/01/analytics-vs-causal-inference.html

Big Data: Don’t throw the baby out with the bath water http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html

To Explain or Predict http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html 

Big Data: Causality and Local Expertise Are Key in Agronomic Applications http://econometricsense.blogspot.com/2014/05/big-data-think-global-act-local-when-it.html


Big Data: New Tricks for Econometrics. Hal R. Varian. June 2013 (revised April 14, 2014). http://people.ischool.berkeley.edu/~hal/Papers/2013/ml.pdf

Is machine learning trending with economists? (Kenneth Sanford)  http://blogs.sas.com/content/subconsciousmusings/2015/06/05/is-machine-learning-trending-with-economists/