Sunday, May 31, 2026

Who is John Galt? Second-Handedness, AI and the Production and Use of Knowledge in Society

Almost 3 years ago I wrote a post entitled "If Applied Econometrics Were Easy LLMs Could Do it." Since then the technical capabilities of AI have progressed, but the advancements only reinforce some of the main takeaways from that post:

"There are risks when these tools are used like Dunning-Kruger-as-a-Service (DKaaS), where the critical thinking and actual learning begins and ends with prompt engineering and a response. We have to be very careful to recognize as Philip Tetlock describes in his book "Superforecasters" that there is a difference between mimicking and reflecting meaning vs. originating meaning.  To recognize that it’s not just what you know that matters, but how you know what you know. The second-handed tendency to believe that we can or should be outsourcing, nay, sacrificing our thinking to AI in exchange for misleading if not false promises about value, is philosophically and epistemically disturbing."

The Main Takeaway or BLUF: 

The overall implication of this post is that how we use AI can impact what we learn and how we learn. At a certain point the how starts to matter more than the what, and getting it wrong can undermine our long-term growth and capabilities as individuals, the profitability of businesses, and eventually society overall.

In this post I want to expand on this epistemically disturbing theme from my prior post given how rapidly AI capabilities are advancing. This is a long post - some may want to skip to the summary and conclusions at the end of the post and then come back to sections of interest. Or use AI to summarize the main points :) 

Disclaimer: AI was not used in any direct way to write this post. Any related Google searches were appended with '-ai' to avoid inadvertent influence of default AI summaries generated by a search.

Background:

My prior post also gets into lots of other things, like AI and causality and working with AI, mostly in the context of doing applied econometrics. If you want to get a flavor of just how much AI may be influencing the way econometrics gets done, check out some of Scott Cunningham's work or Claude Blattman by Chris Blattman.

Tyler Cowen at Marginal Revolution has had several posts discussing how AI is impacting economic research like this post - Will AI Kill the Research Paper?  In her post AI, Price Theory, and the Future of Economics Research, Lynn Kiesling offers a perspective focusing on the impact of AI on workflows and what skills will become differentiators for economists of the future, with a Hayekian take of course.  Brian Albrecht chimes in on this too. Both Kiesling and Albrecht discuss how AI can change workflows and reduce the costs of execution, but this will actually make economic reasoning more important. 

Albrecht states: "The question I would focus on is...whether the world still needs people who can hear a claim about the economy and ask whether it makes sense." 

His post makes the answer an obvious yes: "Those are not questions that more data answers. They require economic reasoning about what’s generating the patterns in the first place...automating the technique doesn’t automate the reasoning about whether the technique’s output makes sense. It increases the volume of output that needs reasoning applied to it."

I think the question behind the question Albrecht asks above, and a key theme of this post, is whether the use of AI will eventually erode our ability to provide that kind of mainline economic reasoning, or human reasoning in general for that matter.

So What's New Besides Even More Advanced AI?

Since my last post, recent publications in this space have expanded on the consequences of the use of more advanced AI in society. Specifically, I will be drawing from a recent NBER working paper, AI, Human Cognition and Knowledge Collapse, as well as other related work. In this paper the authors consider how generative AI, and in particular agentic AI, shapes human learning incentives and the long-run evolution of society’s information ecosystem. They build a dynamic model of learning and decision-making and discuss the implications. There is a dynamic tension: AI can improve decision quality today but erode the learning incentives that sustain long-term collective knowledge, potentially even leading to a total knowledge collapse where "in the long-run equilibrium all human knowledge is destroyed."

I will be drawing a lot from the paper and other work they cited along with a few other resources.

In this post I am not setting out to prove anything or empirically defend any specific hypotheses (I'll leave that to the AI researchers and academic economists). My goal is only to draw parallels between this recent work and my prior thoughts on the implications of the use of AI and knowledge in society, and to build on those thoughts.

First I will give a brief overview of my understanding of the paper and their model. Then I want to discuss both micro-level and macro-level implications. At the micro level I want to discuss the impacts on the individuals and businesses. At the macro level I want to discuss implications for society as a whole. 

AI, Human Cognition, and Knowledge Collapse - Summary

The paper discusses the role of substitution effects, complements, economies of scope, and externalities in the production and use of knowledge and decision making as it relates to AI. When people put forth the effort to learn without AI, there is a private benefit in that what is learned helps make better decisions. This private knowledge is also complemented by the existing stock of public knowledge. AI can leverage public knowledge and produce context-specific (local) knowledge and recommendations for individuals. This also supports better decision making, but at a lower cost because AI substitutes for individual learning effort.

It is important to note that without AI, individual learning often contributes a marginal amount of new knowledge to society's stock of general knowledge. This joint production of individually useful specific knowledge and public general knowledge represents economies of scope in the production of knowledge. We know that new knowledge plays an important role in human progress and sustainable economic growth over time, as pointed out by Arrow in his work on learning by doing and the role of knowledge (1962) and more recently in Romer's growth models with endogenous technological change. At the same time, individuals don't necessarily benefit directly from their contribution of new knowledge to the public stock of knowledge (also discussed in Arrow). So private production of public knowledge comes at an uncompensated cost, resulting in a positive externality to society. Because that externality is uncompensated, the private benefit and lower cost of learning that AI delivers reduce individual effort in knowledge production, even though society loses the new knowledge that effort would have produced.

I'll stop there and return to the paper's treatment of macro level impacts later. First I want to discuss micro level implications of the model.

A Microeconomic Perspective of the Use of Knowledge in Society

Individual Level Impacts

In my last post I called out a few examples of how we might use AI at the individual level and where things can go wrong. One example is attempting to use AI as a research assistant:

[What this leaves out is] how much you get out of putting your hands on a paper or book and going through it and wrestling with the ideas, the paths leading from hypotheses to the conclusions, and how the cited references let you retrace the steps of the authors to understand why, either slowly nudging your priors in new directions or reinforcing your existing perspective, and synthesizing these ideas with your own. Then summarizing and applying and communicating this synthesis with others. ChatGPT might give the impression that is what it is doing in a fraction of the time you could do it (literally seconds vs. hours or days)....There is a big difference between the learning that takes place when you go through this process of integrative complex thinking vs. just reading a summary delivered on a silver platter from chatGPT.  I’m skeptical what I’m describing can be outsourced to AI without losing something important....How much knowledge and important nuance is lost with every [updated query to AI]? What is missed? Thinking! [and learning]

This parallels much of what is discussed in the NBER working paper. They state these sorts of issues more formally in their modeling assumptions as they relate to the substitution and crowding out effects of AI and knowledge generation. 

There is also other evidence related to negative individual effects of using AI called out within the NBER paper. 

In Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task, Kosmyna et al. discuss the impact of using AI for writing tasks:

While LLMs offer immediate convenience, our findings highlight potential cognitive costs. Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels. These results raise concerns about the long-term educational implications of LLM reliance and underscore the need for deeper inquiry into AI's role in learning.

In AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking, Gerlich finds a significant negative correlation between frequent AI tool usage and critical thinking abilities, with a worse effect among younger learners compared to older subjects. Other researchers (Budiyono et al., 2025) have reported similar findings:

"Reliance on AI writing tools significantly reduced cognitive effort and creativity, overshadowed personal writing styles, and led to a decline in confidence and skill retention. These results suggest that, while AI tools enhance efficiency and technical accuracy, over-reliance on them may hinder the development of critical thinking, creativity, and independent writing skills" 

This surfaces the importance of critical thinking in the face of increased reliance on AI tools and the need to mitigate the negative effects of AI on those thinking skills. As Kiesling notes in her blog post, "the profession will have to rethink how it cultivates judgment when many traditional apprenticeship tasks have been automated." That likely goes for all professions, and it is something businesses need to think about when it comes to developing talent in our future workforce.

Impact at the Business Level

In my prior post I noted:

AI is not capable of doing these things [actual thinking tasks], and believing and even attempting or pretending that we can get these things on a second-handed basis from an AI tool will ultimately erode the real human skills and capabilities essential to real productivity and growth over the long run.  If we fail to accept this we will hear a giant sucking sound that is the ROI we thought we were going to get from AI in the short run by attempting to automate what can't be automated. That is the false promise of a tools and technology mindset.

These points seem to be related to the implications of substitution effects and crowding out in the paper, but at the firm level.

I also discussed a point made by Cassie Kozyrkov in a video where she discussed these issues:

"that may be the biggest problem, that management has not learned how to manage thinking...vs. what you can measure easily....thinking is something you can't force, you can only get in the way of it."

She elaborates a bit more about this in her LinkedIn post: "A misguided view of productivity could mean lost jobs for workers without whom organizations won't be able to thrive in the long run - what a painful mistake for everyone."

In her blog post mentioned earlier, Kiesling makes an important observation related to this line of thinking: "If AI cheapens formalized information processing, then tacit knowledge, local knowledge, judgment, and institutional understanding may rise in relative value." But recent research (Ide, 2025) indicates that one of the downsides of reliance on AI is that "tools that automate entry-level tasks"  are "likely to disrupt the diffusion of tacit knowledge" especially to novice workers. 

This could eventually lead to less productive firms as, over time, the workforce becomes less knowledgeable than the least knowledgeable pre-AI solvers. In other words, AI will put a premium on expertise, but if the level of available expertise erodes over time with the use of AI, it could ultimately erode productivity and firm value (see also Ide and Talamas, 2025 for more implications). Next I will turn to the potential aggregate impact of these forces on society overall.

Impacts on Society

The paper discusses how more accurate AI benefits the individual and reduces their required effort to learn (direct effect). However, this reduction in private effort comes at a potential cost to society - the loss or crowding out of any marginal production of new knowledge (indirect effect).

The substitution and crowding out effects of AI can lead to long term reductions in the stock of public knowledge, and in certain situations the authors' model shows this can lead to a knowledge collapse.

Specifically this is tied to the level of AI accuracy. When AI recommendations exceed an accuracy threshold, the economy can tip into a knowledge collapse steady state in which general knowledge ultimately vanishes despite high quality personalized advice. That implies the better and more accurate the AI, the worse things can get.
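To make the threshold idea concrete, here is a minimal toy sketch in Python. It is not the authors' model. The functional forms, the parameter values, and the names learning_effort, simulate_knowledge, and collapse_threshold are all illustrative assumptions. The only features borrowed from the discussion above are that learning effort falls as AI accuracy rises (substitution), that effort is complemented by the public stock of knowledge, and that effort adds to that stock as a spillover.

```python
def learning_effort(accuracy, knowledge):
    """Toy assumption (not from the paper): effort falls as AI accuracy rises
    and rises, with saturation, in the public stock of knowledge."""
    return (1.0 - accuracy) * knowledge / (1.0 + knowledge)


def simulate_knowledge(accuracy, steps=500, k0=1.0, depreciation=0.1, spillover=0.5):
    """Iterate a hypothetical law of motion for the public knowledge stock:
    next stock = undepreciated stock + spillover from individual learning effort."""
    knowledge = k0
    for _ in range(steps):
        effort = learning_effort(accuracy, knowledge)
        knowledge = (1.0 - depreciation) * knowledge + spillover * effort
    return knowledge


def collapse_threshold(depreciation=0.1, spillover=0.5):
    """Accuracy level above which the only steady state of this toy model is zero.
    Follows from solving depreciation = spillover * (1 - accuracy) / (1 + K) at K = 0."""
    return 1.0 - depreciation / spillover


if __name__ == "__main__":
    print("toy accuracy threshold:", collapse_threshold())
    for accuracy in (0.5, 0.7, 0.9):
        print("accuracy", accuracy, "long-run stock", round(simulate_knowledge(accuracy), 4))
```

In this toy setup, accuracy below the threshold leaves a positive long-run stock equal to spillover * (1 - accuracy) / depreciation - 1, while accuracy above it sends the stock toward zero even though each individual decision is being made with better advice. The real model is far richer, so this should be read only as an illustration of how a tipping point can arise.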

I'll have more to discuss about societal impacts below. 

Conclusions

So what is my perspective on the takeaways for individuals, firms, and society? First, an important distinction. In my earlier post I talked about the distinctions Cassie Kozyrkov made between thinking and what she called thunking:

"[thunking includes] things that consume our time and resources but don't require thinking. Having done your homework, the kind of summary information you get from an LLM can help reinforce your thinking and learnings and save time in terms of manually googling or looking up a lot of things you once knew but have forgotten."

So when AI is used for thunking (for example, scanning a form to make sure it is complete, checking for errors, or extracting key topics from a chat or call transcript), the substitution and crowding out effects in the paper would be minimal and so would not have the detrimental impacts on learning and society's stock of knowledge. The negative effects of AI arise when AI is used for thinking tasks.

Impacts on Early Career Individuals

For those early in their career the substitution and crowding out effects mentioned above may be challenging and require making strategic tradeoffs. They need to think carefully about how they use AI. Fully embracing AI for thunking makes sense, but they should be cautious about using it for thinking tasks where they may miss out on learning, personal growth and development opportunities.  A bigger challenge may be that opportunities for learning and development could be eliminated through automation (as discussed in Ide, 2025). 

Impacts on Seasoned Professionals and Expertise

I might speculate that for those who gained lots of training and experience before AI, the substitution and crowding out effects would be minimal. They might use AI for thinking tasks and, going forward, not notice any depreciation in their skills over time. In other words, they already have sufficient human capital to draw on and could complement that with AI, giving them a competitive advantage (capital complements labor). However, I think their personal contribution to society's stock of knowledge would be ambiguous.

Will AI reduce the demand for expertise? In the short term we might be deluded into thinking that AI is mimicking expertise, so at first returns to expertise may drop as companies try to cut costs. However, as noted in Ide (2025), and if we think about the points made by Albrecht and Kiesling above, as AI shifts margins certain kinds of expertise become important: the kinds of knowledge and judgment that aren't going to be in any training data for AI to access and learn from.

"In a world where production becomes abundant, discernment becomes relatively scarce and thus relatively more valuable. What matters more is the ability to decide....what assumptions are plausible, which results travel across contexts, and what pattern in the evidence actually matters." 

This really gets to the heart of Hayek's knowledge problem: know-how and know-what are still going to be dispersed across many minds, and unavailable to any centralized decision maker with or without AI, no matter how powerful AI becomes. The problem of managing dispersed knowledge remains. And one of the key points of this whole discussion is accepting the fact that how you know what you know matters as much as what you know when it comes to making better decisions. So expertise focused on managing and solving these problems ('solvers' as denoted in Ide, 2025) and the need for advice will still command a premium in a world with AI - especially if AI leads to an erosion of the general stock of knowledge and expertise in the future, as in the model discussed above. As Ide and Talamas (2025) note, more knowledgeable workers will likely benefit disproportionately from AI. This is emphasized more in the discussion about implications for businesses below.

Impacts on Businesses

Thunking tasks will be the areas where AI delivers obvious business value. But a big challenge for firms will be how to take advantage of the productivity gains of AI and remain competitive while cultivating knowledge and judgment among employees when you are also automating away opportunities to learn. How do you avoid eroding the stock of knowledge at the firm level?

We know, taking a knowledge based theory of the firm, that the value of the firm is the sum of its decisions, and better decisions require knowledge. A firm's portfolio of knowledge assets becomes a source of value and competitive advantage (Grant, 2010). If the model in the NBER article is realistic, and there are substitution, externalities, and crowding out effects from AI, how do firms manage this portfolio in an age of AI without cannibalizing their most precious assets?

Comments from Kiesling are worth repeating: "If AI cheapens formalized information processing, then tacit knowledge, local knowledge, judgment, and institutional understanding may rise in relative value."

Ide (2025) emphasizes the importance of "expanding novices’ access to high-quality mentorship" from experts that have likely accumulated tacit knowledge and expertise over their careers prior to relying on AI. This will put a premium on expertise and experience, while at the same time require investing in the professional development of novices whose learning opportunities are being automated away. How do firms encourage workers to invest in learning and producing knowledge essential to growth and competitive advantage? Without the right incentive structures and professional development strategies, opportunities will likely be automated away and/or workers will take advantage of the substitution effects of AI. If this is the direction AI takes us, the next generation of workers will lack expertise, and they won't contribute to the growing stock of knowledge necessary for sustained competitive advantage at the firm level. 

Impacts on Society

If we think about growth models in economics we have to wonder if AI will enhance economic growth through technological change, or will the use of AI actually lead to knowledge collapse (as in the NBER paper) and stagnation? As Robert Lucas once said regarding economic growth and development: "The consequences for human welfare involved in questions like these are simply staggering: Once one starts to think about them, it is hard to think about anything else."

I think it is hard to know which margins will really change, which forces will dominate, and what the long-term (or even short-term) impacts of AI will actually be. The dominant narrative I hear often is that AI will help us solve problems we could never solve before and transform the fields of science, medicine, business and warfare. Many of us can already point to use cases that have benefited us personally. There is also a darker narrative about unemployment and loss of purpose.

While the dynamics discussed in the NBER paper seem plausible, and correspond a lot with my prior thoughts on AI, I can't say for sure whether knowledge collapse is inevitable. Regardless, the authors of the NBER paper propose some policy ideas to prevent knowledge collapse. They propose a two-phased approach that starts with fully suppressing AI in order to rebuild the stock of general knowledge, followed by a phase of capping or 'garbling' the precision or accuracy of AI to maintain the general stock of knowledge. Similar to calls for moratoriums on data centers, these seem like blunt tools, if not impractical.

Going back to my original post - I think again Cassie Kozyrkov makes an important point:

"when you are not the one making the decision and it looks like the machine is doing it, there is someone who is actually making that decision for you...and I think that we have been complacent and we have allowed our technology to be faceless....how will we hold them accountable....for wisdom...thinking is our responsibility"

As I said in that post - thinking is a moral responsibility. Outsourcing our thinking and fooling ourselves into believing that we can get knowledge and wisdom and judgment second-handed from a summary written by an AI tool, believing that is the same thing and provides the same value as what we could produce as thinking humans, is a dangerous illusion.  

Thinking is the means by which the human race and civil society thrives and survives. That may not be a solution that can easily be turned into a business strategy or law, but it is the answer. 

Afterword: Some Connections in Literature, Philosophy, and Religion

In this section I want to discuss some loose but related connections I have made from literature, philosophy and religion. 

  • In many thoughts and discussions about AI, I can't help but think about this quote from Dune, by Frank Herbert: “Once men turned their thinking over to machines in the hope that this would set them free. But that only permitted other men with machines to enslave them.” 
  • Who is John Galt? This is a reference to Atlas Shrugged by Ayn Rand. In her description, this novel was about what happens to the world when society abandons reason and the producers and 'thinkers' go on strike. According to the Atlas Society, the phrase 'who is John Galt?' was a kind of verbal shrug, an expression of a resigned futility in the face of a world falling apart. A despairing admission that things are unknowable and unfixable. You could imagine that as a valid response in a future world with knowledge collapse and fading wisdom. https://www.atlassociety.org/post/who-is-john-galt-2

This excerpt gives an idea: 

"The source of all evil is that nameless act which all of you practice...the act of blanking out, the willful suspension of one's consciousness, the refusal to think, not blindness, not ignorance, the refusal to know. It is the act of unfocusing your mind and inducing an inner fog to evade the responsibility of judgment."

  • In The Fountainhead, Rand emphasizes the importance of individuality and thinking for oneself vs. relying on others to think for you - an act she refers to as second-handedness: 
“That, precisely, is the deadliness of second-handers… Not to judge, but to repeat. Not to do, but to give the impression of doing….What would happen to the world without those who do, think, work, produce?…You don't think through another's brain and you don't work through another's hands. When you suspend your faculty of independent judgment, you suspend consciousness. To stop consciousness is to stop life.”

  • In his article "Idols of the Valley", Yuval Levin writes about Pope Leo XIV's encyclical about AI, Magnifica Humanitas. He discusses, in a sense, the moral and religious implications of the substitution effects (or shortcuts) of AI, as a form of idolatry: 

"...the danger to which Pope Leo is pointing...is the danger of turning our tools into idols, and thereby of becoming little more than tools ourselves. It is a danger that afflicts those who make these idols, and also threatens those who put their trust in them. The appeal of idols has always been that they offer shortcuts. The God of the Bible demands that you live in a way that forms your mind and heart and soul toward your fullest human potential. This requires hard work but it yields a kind of person both capable and worthy of a flourishing life. The idol offers the material benefits of such a life without that formative work...This plainly rhymes with some of the deepest moral challenges posed to us by artificial intelligence. AI, at least used a certain way, offers us shortcuts around formative work, matching outputs with inputs without the need for the interceding effort of mind, heart, and soul. If all you care about are the outputs, not the form of your mind, heart, and soul, then the offer is awfully hard to resist....various idolatries offer us shortcuts that promise the benefit without the work: Just turn yourself into a tool and you will be more productive without more effort. This is of course just what Magnifica Humanitas warns of. It is what AI at its most idolatrous and dangerous can offer. That doesn’t have to be what AI is in our experience — not at all. But it can be if we aren’t careful."

  • Can AI actually think? A lot of the discussion in all of the above is in a sense about the tensions between using AI for thinking vs. thunking. Again, Cassie Kozyrkov has a position on this: "AI does not automate thinking. It doesn't! There is a lot of strange rumblings about this that sound very odd to me who has been in this space for 2 decades." That may be a good reason why she advocates for using AI for what she calls thunking tasks but against thinking tasks.
  • From a purely metaphysical perspective, there may be good reason to believe that no matter what advancements are made in neuroscience or computer science, machines will never truly be able to think as humans do. In his book Immortal Souls, philosopher Ed Feser makes this case. 
  • In his critique of AI, Feser states: "The contemporary obsession with computers as a model for the human mind is a wild goose chase." If I were to crudely summarize some of his arguments I would start by considering what it means to think. Thoughts are required to think. What do thoughts require? Thoughts require things like abstract concepts and universals, all of which are immaterial - they have no matter and take up no space. It follows that formal thought processes cannot be material. Ergo machines, which are wholly material, cannot have thought processes and cannot think.
  • Another way of thinking about this is in terms of immanent vs. transeunt causation, which Feser discusses in more detail in his book Aristotle's Revenge: The Metaphysical Foundations of Physical and Biological Science. Feser describes an immanent causal process as one that originates within an agent on its own. It is a teleological process that points to or aims toward the realization of ends. It is basically having an intention and acting on it - which is what we think of minds being able to do. Transeunt causal processes are imposed on objects and terminate outside an agent. This would be like a boulder rolling down a hill, gears in clocks keeping time, or physical processes like computers executing code. Thinking, he argues, requires immanent causal processes.
  • But with advances in computer science and our understanding of neuroscience, could machines actually think if we make them complex enough? Could thinking be an emergent property of physical processes? Feser argues that increasing complexity is simply a matter of increasing the complexity of transeunt causal processes. He states: "you can add to a transeunt causal process all the further transeunt causal processes you like but you will never get immanent causation out of it. The most you will get is something that might look like immanent causation, just as a polygon with sufficiently many sides might look like a circle...thinking is an activity that cannot be coherently analyzed in terms of transeunt causation alone." As philosopher J.P. Moreland states: "pointing to emergence is simply to slap a label on a problem rather than solve it."
  • One might attempt to bypass Feser's arguments by denying the distinction between transeunt and immanent causal processes and simply eliminating immanent causal processes from our picture of reality. But this is hard to do coherently. As Feser argues: "the eliminativist has to carry out immanent causal activity in the very act of denying that there is such a thing as immanent causal activity. His position is incoherent." As M.R. Bennett and P.M.S. Hacker have noted, "the eliminativist saws off the branch on which he is seated."


Related Posts

If Applied Econometrics Were Easy, LLMs Could Do It https://econometricsense.blogspot.com/2023/07/if-applied-econometrics-were-easy-llms.html

Statistics is a Way of Thinking, Not Just a Box of Tools. https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html

Will There Be a Credibility Revolution in Data Science and AI? https://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html 

R.A. Fisher, Big Data, and Pretended Knowledge. https://econometricsense.blogspot.com/2021/07/ra-fisher-big-data-and-thinking-like.html 

Experimentation and Causal Inference Meet the Knowledge Problem. https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html

References

Kenneth J. Arrow. The economic implications of learning by doing. The Review of Economic Studies, 29(3):155–173, 1962.

Philosophical Foundations of Neuroscience. 1st Ed. M. R. Bennett, P. M. S. Hacker. Blackwell. 2003

Herman Budiyono, M Pudjaningsih, B Prastio, and A Maulidina. Exploring the long-term impact of AI writing tools on independent writing skills: a case study of Indonesian language education students. International Journal of Information and Education Technology, 15(5):1003–1013, 2025.

Aristotle's Revenge: The Metaphysical Foundations of Physical and Biological Science. Edward Feser. 2019.

Immortal Souls: A Treatise on Human Nature. Edward Feser. 2024.

Michael Gerlich. AI tools in society: Impacts on cognitive offloading and the future of critical thinking. Societies, 15(1):6, 2025.

Grant, Robert M. Contemporary Strategy Analysis. 7th Edition. John Wiley and Sons. U.K. (2010).

The Use of Knowledge in Society. F. A. Hayek. The American Economic Review, Vol. 35, No. 4. (Sep., 1945), pp. 519-530

Enrique Ide. Automation, AI, and the intergenerational transmission of knowledge. arXiv preprint arXiv:2507.16078, 2025.

Enrique Ide and Eduard Talamàs. Artificial intelligence in the knowledge economy. Journal of Political Economy, 133(12):3762–3800, 2025.

Nataliya Kosmyna, Eugene Hauptmann, Ye Tong Yuan, Jessica Situ, Xian-Hao Liao, Ashly Vivian Beresnitzky, Iris Braunstein, and Pattie Maes. Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing task. arXiv preprint arXiv:2506.08872, 2025.

Thunking vs Thinking: Whose Job Does AI Automate? Which tasks are on AI’s chopping block? Cassie Kozyrkov. https://kozyrkov.medium.com/thunking-vs-thinking-whose-job-does-ai-automate-959e3585877b


Sunday, April 6, 2025

Econometrics and Psychometrics: Rivers Out of Biometry.

Abstract

The period 1895–1925 saw the origins and establishment of the fields that came to be called econometrics and psychometrics. I consider what these fields owed to biometry—the statistical approach to the biological problems of evolution—and make some comparisons between all three. I emphasize developments in biology and psychology, for these are less familiar to historians of econometrics. These developments are interesting to contemplate, for the biometricians and psychometricians were already discussing issues associated with the respective roles of statistical analysis and of subject matter theory, issues that became prominent in econometrics only much later.

John Aldrich; Econometrics and Psychometrics: Rivers Out of Biometry. History of Political Economy 1 December 2011; 43 (suppl_1): 35–56. doi: https://doi.org/10.1215/00182702-1158790

Agricultural Economics as a Poster Child of Applied Economics

 Abstract

Agricultural economists have embodied the notions of applied economics for a long time. They have used economic principles to address real-world problems, integrating economics and scientific knowledge. Applied economics tends to be multidisciplinary and develop applied concepts, theories, and tools. Some, like human capital, diffusion of innovation, contingent valuation, and numerous numerical and econometric techniques have spread throughout economics. Agricultural economic research has been data intensive, and improved information technologies strengthen this tendency. Yet data without theory is of limited use and coevolution of theory and data are essential. Empirical analysis should incorporate quantitative information as well as narratives. We are challenged to understand the coevolution of business, supply chains, and technology, and how they are affected by policies and affect markets. Research should integrate agriculture, energy, and the environment and develop tools to analyze and regulate the emerging bio-economy integrating biotech and infotech.

Zilberman, D. (2019), Agricultural Economics as a Poster Child of Applied Economics: Big Data & Big Issues. American Journal of Agricultural Economics, 101: 353-364. https://doi.org/10.1093/ajae/aay101

Saturday, July 29, 2023

If Applied Econometrics Were Easy, LLMs Could Do It!

Summary

Can AI do applied econometrics and causal inference? Can LLMs pick up on the nuances and social norms that dictate so many of the decisions made in applied work and reflect them in response to a prompt? LLMs bring to the table incredible capabilities and efficiencies and opportunities to create value. But there are risks when these tools are used like Dunning-Kruger-as-a-Service (DKaaS), where the critical thinking and actual learning begins and ends with prompt engineering and a response. We have to be very careful to recognize as Philip Tetlock describes in his book "Superforecasters" that there is a difference between mimicking and reflecting meaning vs. originating meaning.  To recognize that it’s not just what you know that matters, but how you know what you know. The second-handed tendency to believe that we can or should be outsourcing, nay, sacrificing our thinking to AI in exchange for misleading if not false promises about value, is philosophically and epistemically disturbing.

AI vs. Causal Thinking

This is a good article from causaLens: Enterprise Decision Making Needs More Than Chatbots

"while LLMs are good at learning and extracting information from a corpus, they’re blind to something that humans do really well – which is to measure the impact of one’s decisions." 

In a recent talk Cassie Kozyrkov puts it well: "AI does not automate thinking!"

   

Channelling Judea Pearl, understanding what makes a difference (causality) requires more than data, it also requires something not in the data to begin with. So much of the hype around AI is based on a tools and technology mindset. As Captain Jack Sparrow says about ships in Pirates of the Caribbean, a ship is more than sails and rudders, those are things a ship needs. What a ship really is, is freedom. Causal inference is more than methods and theorems, those are things causal inference needs, but what it really is, is a way of thinking. And in business, what is required is an alignment of thinking. For instance, in his article The Importance of Being Causal, Iavor Bojinov describes the Causal Data Analysis Review Committee at LinkedIn. It is a common best practice in learning organizations that leverage experimentation and causal inference. 

If you have attended many of those reviews, you begin to appreciate the amount of careful thinking required to understand the business problem, frame the hypothesis, and translate it to an analytical solution....then interpret the results and make a recommendation about what action to take next. Similarly a typical machine learning workflow requires up front thinking and problem framing. But unlike training an ML model, as Scott Lundberg describes (see my LI Post: Beyond SHAP Values and Crystal Balls), understanding what makes a difference is not just a matter of letting an algo figure out the best predictors and calling it a day. There is an entire garden of forking paths to navigate, each turn requires more thinking, and there is a vast difference in opinions among 'experts' about which direction to go.

As I discussed in a past post about forking paths in analysis

"even if all I am after is a single estimate of a given regression coefficient, multiple testing and researcher degrees of freedom may actually become quite a relevant concern...and this reveals the fragility in a lot of empirical work that prudence would require us to view with a critical eye"

Sure you could probably pair an LLM with statistical software and a database connection and ask it to run a regression, but getting back to Jack Sparrow's ship analogy, a regression is more than just fitting a line to data and testing for heteroskedasticity and multicollinearity (let's hope if LLMs train on econometrics textbooks they don't weight the value of information by the amount of material dedicated to multicollinearity!!!) and the laundry list of textbook assumptions. AI could probably even describe in words a mechanical interpretation of the results. All of that is really cool, and something like that could save a lot of time and augment our workflows (which is valuable) but we also have to be careful about that tools mindset creeping back on us. All those things that AI may be able to do are only the things regression needs, but to get where we need to go, to understand why, we need way more than what AI can currently provide. We need thinking. So even for a basic regression, depending on our goals, the thinking required is currently and may always be beyond the capabilities of AI.

When we think about these forking paths encountered in applied work, each path can end with a different measure of impact that comes with a number of caveats and tradeoffs to think about. There are seldom standard problems with standard solutions. The course of action taken requires conscious decisions and the meeting of minds among different expert judgements (if not explicitly then implicitly) that considers all the tradeoffs involved in moving from what may be theoretically correct and what is practically feasible. 
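The multiple testing concern behind forking paths can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the original post: the true treatment effect is zero, the 'forks' are eight arbitrary subgroup analyses in addition to the full-sample analysis, and a finding is 'reported' whenever any path gives |t| > 1.96. The sample sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_sims, n_forks = 400, 2000, 8   # illustrative sizes, not from the post

def t_stat(y, d):
    """Two-sample t statistic (unequal variances), treated vs. control."""
    y1, y0 = y[d == 1], y[d == 0]
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return (y1.mean() - y0.mean()) / se

single_hits = 0   # full-sample test alone is "significant"
any_hits = 0      # at least one of the forking paths is "significant"
for _ in range(n_sims):
    d = rng.integers(0, 2, size=n)                    # randomized treatment
    y = rng.normal(size=n)                            # outcome with NO true effect
    groups = rng.integers(0, 2, size=(n_forks, n))    # arbitrary subgroup flags
    t_full = abs(t_stat(y, d))
    t_forks = [abs(t_stat(y[g == 1], d[g == 1])) for g in groups]
    single_hits += t_full > 1.96
    any_hits += max([t_full] + t_forks) > 1.96

single_rate = single_hits / n_sims
any_rate = any_hits / n_sims
print(f"false positive rate, one pre-specified path: {single_rate:.3f}")
print(f"false positive rate, best of {n_forks + 1} paths: {any_rate:.3f}")
```

Under these assumptions the single pre-specified analysis should reject at roughly the nominal 5% rate, while picking the best of several paths rejects far more often, even though no individual analysis was done 'wrong'.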

In his book, "A Guide to Econometrics" Peter Kennedy states that "Applied econometricians are continually faced with awkward compromises" and offers a great story about what it's like to do applied work: 

"Econometric theory is like an exquisitely balanced French recipe, spelling out precisely with how many turns to mix the sauce, how many carats of spice to add, and for how many milliseconds to bake the mixture at exactly 474 degrees of temperature. But when the statistical cook turns to raw materials, he finds that hearts of cactus fruit are unavailable, so he substitutes chunks of cantaloupe; where the recipe calls for vermicelli he used shredded wheat; and he substitutes green garment dye for curry, ping-pong balls for turtle's eggs, and for Chalifougnac vintage 1883, a can of turpentine."

What choice would AI driven causal inference make when it has to make the awkward compromise between Chalifougnac vintage 1883 and turpentine and how would it explain the choice it made and the thinking that went into it? How would that choice stack up against the opinions of four other applied econometricians who would have chosen differently? 

As Richard McElreath discusses in his great book Statistical Rethinking:

"Statisticians do not in general exactly agree on how to analyze anything but the simplest of problems. The fact that statistical inference uses mathematics does not imply that there is only one reasonable or useful way to conduct an analysis. Engineers use math as well, but there are many ways to build a bridge." 

This is why in applied economics so much of what we may consider 'best practices' is as much the result of social norms and practices as it is textbook theory. These norms are often established and evolve informally over time and are sometimes adapted to the particulars of circumstances and place unique to a business or decision making environment, or research discipline (this explains the language barriers for instance between economists and epidemiologists and why different language can be used to describe the same thing and the same language can mean different things to different practitioners). A kind of result of human action but not human design, many best practices may seldom be formally codified or published in a way accessible to train a chatbot to read and understand. Would an algorithm be able to understand and relay back this nuance? I gave this a try by asking ChatGPT about linear probability models (LPMs), and while I was impressed with some of the detail, I'm not fully convinced at this point based on the answers I got. While it did a great job articulating the pros and cons of LPMs vs logistic regression or other models, I think it would leave the casual reader with the impression that they should be wary of relying on LPMs to estimate treatment effects in most situations. So they miss out on the practical benefits (the 'pros' that come from using LPMs) while avoiding the 'cons' that, as Angrist and Pischke might say, are mostly harmless. I would be concerned about more challenging econometric problems with more nuance and more appeal to social norms and practices and thinking that an LLM may not be privy to.

ChatGPT as a Research Assistant

Outside of actually doing applied econometrics and causal inference, I have additional concerns with LLMs and AI when it comes to using them as a tool for research and learning. At first it might seem really great if instead of reading five journal articles you could just have a tool like chatGPT do the hard work for you and summarize them in a fraction of the time! And I agree this kind of summary knowledge is useful, but probably not in the way many users might think. 

I have been thinking a lot about how much you get out of putting your hands on a paper or book and going through it and wrestling with the ideas, the paths leading from hypotheses to the conclusions, and how the cited references let you retrace the steps of the authors to understand why, either slowly nudging your priors in new directions or reinforcing your existing perspective, and synthesizing these ideas with your own. Then summarizing and applying and communicating this synthesis with others. 

ChatGPT might give the impression that is what it is doing in a fraction of the time you could do it (literally seconds vs. hours or days). However, even if it gave verbatim the same summary you could write, the difference in the value created would be like night and day. There is a big difference between the learning that takes place when you go through this process of integrative complex thinking vs. just reading a summary delivered on a silver platter from ChatGPT. I’m skeptical what I’m describing can be outsourced to AI without losing something important. I also think there are real risks and costs involved when these tools are used like Dunning-Kruger-as-a-Service (DKaaS), where the critical thinking and actual learning begins and ends with prompt engineering and a response. 

When it comes to the practical application of this knowledge and thinking and solving new problems it’s not just what you know that matters, but how you know what you know. If all you have is a summary, will you know how to navigate the tradeoffs between what is theoretically correct and what is practically feasible to make the best decision in terms of what forking path to take in an analysis? Knowing about the importance of social norms and practices in doing applied work, and if the discussion above about LPMs is any indication, I'm not sure. And with just the summary, will you be able to quickly assimilate new developments in the field....or will you have to go back to ChatGPT? How much knowledge and important nuance is lost with every update? What is missed? Thinking!

As Cassie says in her talk, thinking is about:

"knowing what is worth saying...knowing what is worth doing, we are thinking when we are coming up with ideas, when we are solving problems, when we are being creative"

AI is not capable of doing these things, and believing and even attempting or pretending that we can get these things on a second-handed basis from an AI tool will ultimately erode the real human skills and capabilities essential to real productivity and growth over the long run. If we fail to accept this we will hear a giant sucking sound that is the ROI we thought we were going to get from AI in the short run by attempting to automate what can't be automated. That is the false promise of a tools and technology mindset.

It worries me that this same tools-and-technology, data science alchemy mindset means that many managers who were once sold the snake oil that data scientists could simply spin data into gold with deep learning will now buy into the snake oil that LLMs will be able to spin data into gold and do it even cheaper and send the thinkers packing! 

Similarly Cassie says: "that may be the biggest problem, that management has not learned how to manage thinking...vs. what you can measure easily....thinking is something you can't force, you can only get in the way of it."

She elaborates a bit more about this in her LinkedIn post: "A misguided view of productivity could mean lost jobs for workers without whom organizations won't be able to thrive in the long run - what a painful mistake for everyone."

Thunking vs. Thinking

I did say that this kind of summary info can be useful. And I agree that the kinds of things that AI and LLMs will be useful for are what Cassie refers to in her talk as 'thunking.'  The things that consume our time and resources but don't require thinking. Having done your homework, the kind of summary information you get from an LLM can help reinforce your thinking and learnings and save time in terms of manually googling or looking up a lot of things you once knew but have forgotten. If there is an area you haven't thought about in a while it can be a great way to help get back up to speed. And when trying to learn new things, it can be leveraged to speed up some aspects of your discovery process or make it more efficient, or even help challenge or vet your thinking (virtually bouncing ideas back and forth). But to be useful, this still requires some background knowledge and should never be a substitute for putting your hands on a paper and doing the required careful and critical thinking.

One area of applied econometrics I have not mentioned is the often less glamorous work it takes to implement a solution. In addition to all the thinking involved in translating the solution and navigating the forking paths, there is a lot of time spent accessing and transforming the data and implementing the estimation that involves coding (note even in the midst of all that thunking work there is still thinking involved - sometimes we learn the most about our business and our problem while attempting to wrangle the data - so this is also a place where we need to be careful about what we automate). Lots of data science folks are also using these tools to speed up some of their programming tasks. I'm a habitual user of Stack Exchange and GitHub and constantly recycle my own code or others' code. But I burn a lot of time some days in search of what I need. That's the kind of thunking that it makes sense to enlist new AI tools for!

Conclusion: Thinking is Our Responsibility

I've observed two extremes when it comes to opinions about tools like ChatGPT. One is that LLMs have the knowledge and wisdom of Yoda and will solve all of our problems. The other extreme is that because LLMs don't have the knowledge and wisdom of Yoda they are largely irrelevant. Obviously there is middle ground and I am trying to find it in this post. And I think Cassie has found it:

"AI does not automate thinking. It doesn't! There is a lot of strange rumblings about this that sound very odd to me who has been in this space for 2 decades"

I have sensed those same rumblings and it should make us all feel a bit uneasy. She goes on to say:

"when you are not the one making the decision and it looks like the machine is doing it, there is someone who is actually making that decision for you...and I think that we have been complacent and we have allowed our technology to be faceless....how will we hold them accountable....for wisdom...thinking is our responsibility"

Thinking is a moral responsibility. Outsourcing our thinking and fooling ourselves into thinking we can get knowledge and wisdom and judgment second-handed from a summary written by an AI tool, and believing that is the same thing and provides the same value as what we could produce as thinking humans, is a dangerous illusion when thinking is ultimately the means by which the human race and civil society thrive and survive. In 2020 former President Barack Obama emphasized the importance of thinking in a democracy: 

"if we do not have the capacity to distinguish what's true from what's false, then by definition the marketplace of ideas doesn't work. And by definition our democracy doesn't work. We are entering into an epistemological crisis." 

The wrong kind of tools and technology mindset, and obsequiousness toward the technology, and a second-handed tendency to believe that we can or should be outsourcing, nay, sacrificing our thinking to AI in exchange for misleading if not false promises about value, is philosophically and epistemically disturbing.

LLMs bring to the table incredible capabilities and efficiencies and opportunities to create value. But we have to be very careful to recognize as Philip Tetlock describes in his book Superforecasters, that there is a difference between mimicking and reflecting meaning vs. originating meaning.  To recognize that it’s not just what you know that matters, but how you know what you know. To repurpose the closing statements from the book Mostly Harmless Econometrics: If applied econometrics were easy, LLMs could do it.

Additional Resources:

Thunking vs Thinking: Whose Job Does AI Automate? Which tasks are on AI’s chopping block? Cassie Kozyrkov. https://kozyrkov.medium.com/thunking-vs-thinking-whose-job-does-ai-automate-959e3585877b

Statistics is a Way of Thinking, Not Just a Box of Tools. https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html 

Will There Be a Credibility Revolution in Data Science and AI? https://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html 

Note on updates: An original version of this post was written on July 29 in conjunction with the post On LLMs and LPMs: Does the LL in LLM Stand for Linear Literalism? Shortly after posting I ran across Cassie's talk and updated to incorporate many of the points she made, with the best of intentions. Any  misrepresentation/misappropriation of her views is unintentional. 

On LLMs and LPMs: Does the LL in LLM Stand for Linear Literalism?

I've blogged in the past about what I call linear literalism and fundamentalist econometrics. And I've blogged a bit about linear probability models (LPMs). Recently I have had some concerns about people outsourcing their thinking to LLMs and the use of these tools like Dunning-Kruger-as-a-Service (DKaaS) where the critical thinking and actual learning starts and stops with prompt engineering and a response. Out of curiosity I asked ChatGPT about the appropriateness of using linear probability models. Although the overall response was thoughtful about thinking more carefully about causality, it still gave the canned 'thou shalt not' theoretically correct fundamentalist response. My prompt could have been more sophisticated, but I tried to prompt from a user's perspective, someone who may not be as familiar with applied statistics work, or who may have even read my blog and wanted to question something about the use of LPMs and may not be thinking about the tradeoffs or who may be unfamiliar with the social norms and practices related to their use. As has been noted before on this blog, in applied work, there is no consensus among practitioners that nonlinear models (like logistic regression) are 'better' than LPMs when estimating treatment effects. If anything this illustrates that, at best, a response from an LLM about applied econometric analysis could be just as good as having another expert in the room, but an experienced practitioner understands that experts often disagree, and that disagreement comes with a lot of nuance, and is often as much the result of social norms and practices as theory. Perhaps someone could take the fundamentalist response from this prompt and do their analysis and solve their problem and there is no harm at the end of the day. But there is danger in fundamentalism, if this leads them to ignore great work and potential learnings derived from LPMs, or prevents them from getting more actionable and interpretable results vs. stumbling through the mental gymnastics sometimes required to interpret things like odds ratios vs. marginal effects! 

Note: As I refined my prompt I got slightly better answers, but not one that really captures the nuance I would like to have. Again....it's like arguing with an expert that knows the theory but may not be as familiar with all the nuances of different disciplines and the practical implications of the tradeoffs involved. I have to say - to give a response close to what I would expect from another practitioner who may be outside the field of economics is impressive! 

While it did a great job articulating the pros and cons of LPMs vs logistic regression or other models, I think these responses would leave the casual reader with the impression that they should be wary of relying on LPMs to estimate treatment effects in most situations. And as noted above that could be an issue depending on the situation. That I disagree with the tone of the response doesn't mean ChatGPT is a bad tool; there are great textbooks and econometricians that I think would certainly endorse these responses! It's the missing nuance around these differences in opinion that concerns me, and while it's not super critical in this situation, I'd be concerned about others. 

And I have to disagree that some of the 'cons' listed about LPMs are really as big of an issue as a reader might think if they take this response very seriously. Most of the listed 'cons' can often be addressed easily and in my experience, despite all the alleged 'cons', LPMs generally lead to the same decision you would make when properly interpreting and understanding results from a logistic regression, while providing all the very practical benefits you get from an LPM. The nuance is, the 'cons' in theory don't always have implications that matter in any practical sense. As Angrist and Pischke state: "there is a lot to be said for sticking to a linear regression function as compared to a fairly arbitrary choice of a non-linear one!"
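The claim that an LPM and a logistic regression often lead to the same decision can be checked on simulated data. The sketch below is illustrative only and rests on assumptions not in the original post: a randomized binary treatment d, one covariate x, and a data generating process that is truly logistic with made-up coefficients. The logit is fit with a hand-rolled Newton-Raphson routine (the hypothetical helper `fit_logit`) so that only NumPy is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed data generating process (illustrative coefficients)
d = rng.integers(0, 2, size=n)                 # randomized binary treatment
x = rng.normal(size=n)                         # one observed covariate
y = rng.binomial(1, sigmoid(-0.5 + 0.8 * d + 1.0 * x))

X = np.column_stack([np.ones(n), d, x])

# LPM: ordinary least squares on the binary outcome
beta_lpm, *_ = np.linalg.lstsq(X, y, rcond=None)
lpm_effect = beta_lpm[1]

def fit_logit(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        gradient = X.T @ (y - p)
        hessian = (X * (p * (1 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

beta_logit = fit_logit(X, y)

# Average marginal effect of treatment from the logit:
# predicted probability with d=1 minus d=0, averaged over the sample
X1, X0 = X.copy(), X.copy()
X1[:, 1], X0[:, 1] = 1.0, 0.0
logit_ame = np.mean(sigmoid(X1 @ beta_logit) - sigmoid(X0 @ beta_logit))

# One of the listed 'cons': LPM fitted values outside [0, 1]
fitted = X @ beta_lpm
share_outside = np.mean((fitted < 0) | (fitted > 1))

print(f"LPM treatment coefficient:     {lpm_effect:.4f}")
print(f"Logit average marginal effect: {logit_ame:.4f}")
print(f"Share of LPM fitted values outside [0, 1]: {share_outside:.4f}")
```

In this setup the two treatment effect estimates should be very close even if some LPM fitted values fall outside the unit interval. That is one simulated case, not a general proof; the comparison could look different with rare outcomes, strong nonlinearity, or non-random treatment.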

Prompt: Is it OK to use a linear probability model to measure causal effects?

Answer:

Using a linear probability model to measure causal effects can be problematic and is generally not recommended. A linear probability model (LPM) is a linear regression model where the dependent variable is a binary outcome, typically coded as 0 or 1. In this model, the predicted values can be outside the [0, 1] range, which can lead to biased estimates and incorrect interpretations.

There are several issues with using a linear probability model for causal effects:

1. Heteroscedasticity: The variance of the error term in the linear probability model is not constant across all values of the independent variables, violating one of the assumptions of linear regression. This can lead to inefficient estimates and incorrect standard errors.

2. Predicted probabilities outside [0, 1]: Since linear regression does not constrain the predicted values to be between 0 and 1, it can produce probabilities that fall outside this range, which is not meaningful for binary outcomes.

3. Non-linearity: The relationship between the independent variables and the probability of the binary outcome is often non-linear. Using a linear model may not capture this relationship accurately.

To address these issues, researchers typically use alternative methods for estimating causal effects with binary outcomes. The most common approaches are logistic regression and probit regression. These models are specifically designed for binary outcomes and provide estimates that are valid probabilities between 0 and 1.

Additionally, for estimating causal effects, researchers often use methods such as randomized controlled trials (RCTs) or quasi-experimental designs (e.g., difference-in-differences, matching methods, instrumental variables) to establish causality more convincingly.

In summary, while linear probability models might be used for exploratory analysis, they are generally not suitable for measuring causal effects with binary outcomes. Logistic regression or probit regression should be preferred when dealing with binary dependent variables, and careful consideration of study design and potential confounders is essential for establishing causality.

Additional Reading:

Love It or Logit, or: Man, People *Really* Care About Binary Dependent Variables. Marc Bellemare. http://marcfbellemare.com/wordpress/9024

Whether to probit or to probe it: in defense of the Linear Probability Model. World Bank Blogs. Jed Friedman. https://blogs.worldbank.org/impactevaluations/whether-to-probit-or-to-probe-it-in-defense-of-the-linear-probability-model 

Thursday, March 2, 2023

Are Matching Estimators and the Conditional Independence Assumption Inconsistent with Rational Decision Making?

 Scott Cunningham brings up some interesting points about matching and utility maximization in this substack post: https://causalinf.substack.com/p/why-do-economists-so-dislike-conditional 

"Because most of the time, when you are fully committed to the notion that people are rational, or at least intentionally pursuing goals and living in the reality of scarcity itself, you actually think they are paying attention to those potential outcomes. Why? Because those potential outcomes represent the gains from the choice you’re making....if you think people make choices because they hope the choice will improve their life, then you believe their choices are directly dependent on Y0 and Y1. This is called “selection on treatment gains”, and it’s a tragic problem that if true almost certainly means covariate adjustment won’t work....Put differently, conditional independence essentially says that for a group of people with the same covariate values, their decision making had become erratic and random. In other words, the covariates contained the rationality and you had found the covariates that sucked that rationality out of their minds."

This makes me want to ask - is there a way I can specify utility functions or think about utility maximization that is consistent with the CIA in a matching scenario? This gets me into very dangerous territory because my background is applied economics, not theory. I think most of the time when matching is being used in observational settings, people aren't thinking about utility functions and consumer preferences and how they relate to potential outcomes. Especially non-economists. 

Thinking About Random Utility Models

The discussion above for some reason motivated me to think about random utility models (RUMs). Not being a theory person and having hardly worked with RUMs at all, I'm being even more dangerous, but hear me out, this is just a thought experiment. 

I first heard of RUMs years ago when working in market research and building models focused on student enrollment decisions. From what I understand they are an important workhorse in discrete choice modeling applications. Food economist Jayson Lusk has even looked at RUMs and their predictive validity via functional magnetic resonance imaging (see Neural Antecedents of a Random Utility Model).

The equation below represents the basic components of a random utility model:

U = V + e

where 'V' = systematic utility and 'e' represents random utility. 

Consumers choose the option that provides the greatest utility. The systematic component 'V' captures attributes describing the alternative choices or perceptions about the choices, and characteristics of the decision maker. In the cases where matching methods are used in observational settings, the relevant choice is often whether or not to participate in a program or take treatment.

This seems to speak to one of the challenges raised in Scott's post (keep in mind Scott never mentions RUMs, all of this about RUMs is my own meandering, so if a discussion about RUMs is nonsensical it's on me, not him): 

"The known part requires a model, be it formal or informal in nature, and the quantified means it’s measured and in your dataset. So if you have the known and quantified confounder, then a whole host of solutions avail themselves to you like regression, matching, propensity scores, etc....There’s a group of economists who object to this statement, and usually it’s that “known” part."

What seems appealing to me is that RUMs appear to allow us to make use of what we think we can know about utility via 'V' and still admit that there is a lot we don't know, captured by 'e' in a random utility model. In this formulation 'e' still represents rationality, it's just unobservable heterogeneity in rational preferences that we can't observe. This is assumed to be random. Many economists working in discrete choice modeling contexts are apparently comfortable with the 'known' part of a RUM at least from the way I understand this.

A Thought Experiment: A Random Utility Model for Treatment Participation

Again - proceeding cautiously here, suppose that in an observational setting the decision to engage in a program or treatment designed to improve outcome Y is driven by systematic and random components in a RUM:

U = V(x) + e

and the decision to participate is based on, as Scott describes, the potential outcomes Y1 and Y0, which represent the gains from choosing. 

delta = (Y1 - Y0) where you get Y1 for choosing D=1 and Y0 for D=0

In the RUM you choose D = 1 if U(D = 1) > U(D = 0) 

D = f(delta) = f(Y1,Y0)= f(x)

and we specify the RUM as U(D) = V(x) + e

where x represents all the observable things that might contribute to an individual's utility (perceptions about the choices, and characteristics of the decision maker) in relation to making this decision. 

So the way I wanted to think about this is that when we are matching, the factors we match on or control for would be the observable variables 'x' that contribute to systematic utility V(x), while many of the unobservable aspects reflect heterogeneous preferences across individuals. These would contribute to the random component of the RUM.

So in essence YES, if we think about this in the context of a RUM, the covariates contain all of the rationality (at least the observable parts) and what is unobserved can be modeled as random. We've harmonized utility maximization, matching and the CIA! 

Meeting the Assumptions of Random Utility and the CIA

But wait...not so fast. In the observational studies where matching is deployed, I am not sure we can assume the unobserved heterogeneous preferences represented by 'e' will be random across the groups we are comparing. Those who choose D = 1 will obviously differ in their preferences from those who choose D = 0. There will be important differences between the treatment and control groups' preferences that are not accounted for by the covariates in the systematic component V(x), and those unobserved preferences in 'e' will depend on the potential outcomes Y0 and Y1, just like Scott was saying. In an observational setting with treatment selection, I don't think we can assume that the random component of the RUM is really random with regard to the choice of taking treatment if that choice is driven by expected potential outcomes.

Some Final Questions

If 'x' captures everything relevant to an individual's assessment of their potential outcomes Y1 and Y0 (and we have all the data for 'x', which itself is a questionable assumption), then could we claim that everything else captured by the term 'e' is due to random noise, maybe pattern noise or occasion noise?

In an observational setting where we are modeling treatment choice D, can we break 'e' down further into components like below?

e = e1 + e2

where e1 is unobservable heterogeneity in rational preferences driven by potential outcomes Y1 & Y0, making it non-random, and e2 represents noise that is more random, like pattern or occasion noise, and likely to be independent of Y1 & Y0.

If the answer to the questions above is YES, and we can decompose the random component of a RUM this way, and e2 makes up the largest component of e (i.e., e1 is small, non-existent, or insignificant), then maybe a RUM is a valid way to think about modeling the decision to choose treatment D, and we can match on the attributes 'x' of systematic utility and appeal to the CIA (if my understanding is correct).

But the less we actually know about x and what is driving the decision as it relates to potential outcomes Y0 and Y1, the larger e1 becomes and then the random component of a RUM may no longer be random. 

If my understanding above is correct, then the things we likely would have to assume for a RUM to be valid turn out to be similar to if not exactly the things we need for the CIA to hold. 

The possibility of meeting the assumptions of a RUM or the CIA would seem unlikely in observational settings if (1) we don't know a lot about systematic utility and 'x' and (2) the random component e turns out not to be random.
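A small simulation sketch of the e = e1 + e2 idea may help. Everything in it is a made-up illustration: x is a single discrete covariate, e1 is an unobserved factor that raises the untreated outcome Y0, e2 is pure noise, the true effect is fixed at 1.0, and the hypothetical parameter lam controls how strongly e1 also drives the treatment choice. The estimator is exact matching on x (comparing treated and control means within each value of x, weighted by the number of treated).

```python
import random
from statistics import fmean

def simulate(lam, n=20_000, tau=1.0, seed=0):
    """Return (x, d, y) rows; lam scales how much e1 enters the choice."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.choice([-1, 0, 1])    # observed covariate in V(x)
        e1 = rng.gauss(0, 1)          # unobserved, tied to the potential outcome Y0
        e2 = rng.gauss(0, 1)          # unobserved, pure noise
        d = 1 if x + lam * e1 + e2 > 0 else 0   # choose D = 1 if utility is positive
        y0 = x + e1
        rows.append((x, d, y0 + tau * d))
    return rows

def exact_match_att(rows):
    """Within-x difference in means, weighted by the number of treated in each cell."""
    total, n_treated = 0.0, 0
    for x in sorted({r[0] for r in rows}):
        treated = [y for xi, d, y in rows if xi == x and d == 1]
        control = [y for xi, d, y in rows if xi == x and d == 0]
        total += len(treated) * (fmean(treated) - fmean(control))
        n_treated += len(treated)
    return total / n_treated

est_random = exact_match_att(simulate(lam=0.0))    # e1 plays no role in the choice
est_selected = exact_match_att(simulate(lam=1.0))  # e1 drives the choice
print(f"true effect: 1.0, e random: {est_random:.2f}, e1 in the choice: {est_selected:.2f}")
```

When e1 stays out of the choice, matching on x lands near the true effect. When e1 enters the choice, the same matched comparison overstates the effect, because the random component is no longer independent of the potential outcomes.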

Conclusion

So much for an applied guy trying to do theory to support the possibility of the CIA holding in matched analysis.  I should say I am not an evangelist for matching but trying to be more of a realist about its uses and validity.  Scott's post introduces a very interesting way to think about matching and the CIA and the challenges we might have meeting the conditions for it. 


Thursday, January 26, 2023

What is new and different about difference-in-differences?

Back in 2012 I wrote about the basic 2 x 2 difference-in-differences analysis (two groups, two time periods). Columbia public health probably has a better introduction.

The most famous example of an analysis that motivates a 2 x 2 DID analysis is John Snow's 1855 analysis of the cholera epidemic in London.

I have since written about some of the challenges of estimating DID with generalized linear models (see here, here, and here), as well as combining DID with matching, and problems to watch out for when combining methods. But a lot of what we know about difference-in-differences has changed in the last decade. I'll try to give a brief summary based on my understanding and point towards some references that do a better job presenting the current state.

The Two-Way Fixed Effects model (TWFE)

The first thing I should discuss is extending the 2x2 model to include multiple treated groups and/or multiple time periods. The generalized model for DID, also referred to as the two-way fixed effects (TWFE) model, is the best way to represent those kinds of scenarios:

Y_gt = a_g + b_t + δ D_gt + ε_gt

a_g = group fixed effects

b_t = time fixed effects

D_gt = treatment*post period (interaction term)

δ = ATT or DID estimate
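As a sanity check on the TWFE specification, here is a minimal sketch showing that in the basic two-group, two-period case the TWFE coefficient δ is just the familiar difference of differences. The outcome numbers are invented, and the helper twfe_delta is a hypothetical function written for this illustration: it removes the group and time fixed effects by two-way demeaning (which only works this simply for a balanced panel) and then regresses the demeaned outcome on the demeaned treatment indicator.

```python
def twfe_delta(Y, D):
    """TWFE estimate of delta for a balanced panel (rows = groups, columns = periods)."""
    G, T = len(Y), len(Y[0])

    def demean(M):
        # Subtract group means and time means, then add back the grand mean.
        grand = sum(map(sum, M)) / (G * T)
        row = [sum(r) / T for r in M]
        col = [sum(M[g][t] for g in range(G)) / G for t in range(T)]
        return [[M[g][t] - row[g] - col[t] + grand for t in range(T)] for g in range(G)]

    Yd, Dd = demean(Y), demean(D)
    num = sum(Yd[g][t] * Dd[g][t] for g in range(G) for t in range(T))
    den = sum(Dd[g][t] ** 2 for g in range(G) for t in range(T))
    return num / den

# Made-up group means: row 0 = control, row 1 = treated; columns = pre, post.
Y = [[1.0, 2.0],
     [3.0, 7.0]]
D = [[0, 0],
     [0, 1]]   # treatment*post interaction

delta = twfe_delta(Y, D)
simple_did = (Y[1][1] - Y[1][0]) - (Y[0][1] - Y[0][0])
print(delta, simple_did)
```

Both numbers come out to 3.0 here: the treated group changed by 4, the control group by 1, and the difference is attributed to treatment.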

Getting the correct standard errors for DID models that involve many repeated measures over time and/or where treatment and control groups are defined by multiple geographies presents two challenges compared to the basic 2x2 model: serial correlation and correlation within groups. There are several approaches that can be considered depending on your situation.

1 - Block bootstrapping

2 - Aggregating data into single pre and post periods

3 - Clustering standard errors at the group level

Clustering at the group level should provide the appropriate standard errors in these situations when the number of clusters is large.
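To see why the serial correlation matters, here is a rough sketch of the first approach in the list, block bootstrapping, on simulated data. All of the design values (40 units per arm, 5 pre and 5 post periods, a true effect of 2.0) are hypothetical, and each unit is given a shock that persists through every post period so that its observations are strongly correlated over time. The block bootstrap resamples whole units, keeping each unit's periods together. For contrast, a naive bootstrap resamples individual observations as if they were independent.

```python
import random
from statistics import fmean, stdev

rng = random.Random(42)
T_PRE, T_POST, N_PER_ARM, TRUE_EFFECT = 5, 5, 40, 2.0  # hypothetical design

def make_unit(treated):
    shock = rng.gauss(0, 1)  # persists across all post periods: serial correlation
    pre = [rng.gauss(0, 0.1) for _ in range(T_PRE)]
    post = [TRUE_EFFECT * treated + shock + rng.gauss(0, 0.1) for _ in range(T_POST)]
    return {"treated": treated, "pre": pre, "post": post}

units = [make_unit(arm) for arm in (1, 0) for _ in range(N_PER_ARM)]

def did(sample):
    def avg_change(arm):
        return fmean(fmean(u["post"]) - fmean(u["pre"])
                     for u in sample if u["treated"] == arm)
    return avg_change(1) - avg_change(0)

def block_resample(sample):
    # Resample whole units (all of their periods together) within each arm.
    out = []
    for arm in (0, 1):
        arm_units = [u for u in sample if u["treated"] == arm]
        out.extend(rng.choices(arm_units, k=len(arm_units)))
    return out

def naive_did(sample):
    # Resample single observations as if every unit-period were independent.
    est = 0.0
    for arm, sign in ((1, 1.0), (0, -1.0)):
        pre = [y for u in sample if u["treated"] == arm for y in u["pre"]]
        post = [y for u in sample if u["treated"] == arm for y in u["post"]]
        est += sign * (fmean(rng.choices(post, k=len(post)))
                       - fmean(rng.choices(pre, k=len(pre))))
    return est

B = 300  # bootstrap replications
block_se = stdev([did(block_resample(units)) for _ in range(B)])
naive_se = stdev([naive_did(units) for _ in range(B)])
print(f"DID estimate: {did(units):.2f}")
print(f"block bootstrap SE: {block_se:.3f}, naive SE: {naive_se:.3f}")
```

In this setup the naive standard error should come out noticeably smaller than the block bootstrap one, which is the overconfidence that ignoring serial correlation produces. Resampling units within each arm is a design choice made here to keep both arms the same size in every replication.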

For more details on TWFE models, both Scott Cunningham and Nick Huntington-Klein have great econometrics textbooks with chapters devoted to these topics. See the references below for more info.

Differential Timing and Staggered Rollouts

But things can get even more complicated with DID designs. Think about situations where there are different groups getting treated at different times over a number of time periods. This is not just a thought experiment trying to imagine the most difficult study design and pondering for the sake of pondering; these kinds of staggered rollouts are very common in business and policy settings. Imagine policy rules adopted by different states over time (like changes in minimum wages), or imagine testing a new product or service by rolling it out to different markets over time. Understanding how to evaluate their impact is important. For a while it seemed economists may have been a little guilty of handwaving with the TWFE model, assuming the estimated treatment coefficient was giving them the effect they wanted.

But Andrew Goodman-Bacon refused to take this interpretation at face value. He broke it down for us, showing that the TWFE estimator gives us a weighted average of all the potential 2x2 DID estimates you could make with the data. That actually sounds intuitive and helpful. What is not so intuitive is that some of those 2x2 comparisons use previously treated groups as the comparison for currently treated groups. That's not a comparison we are generally interested in making, but it gets averaged in with the others and can drastically bias the results, particularly when there is treatment effect heterogeneity (the treatment effect differs across groups or changes over time).
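A tiny noise-free example can show the problem. The panel below is entirely hypothetical: two groups, three periods, an early group treated from period 1 and a late group treated in period 2, with no never-treated group. The helper twfe_delta is an illustrative function that computes the TWFE coefficient by two-way demeaning a balanced panel. It is not the Bacon decomposition itself; it only compares the TWFE coefficient with the true average effect among treated cells.

```python
def twfe_delta(Y, D):
    """TWFE estimate of delta for a balanced panel (rows = groups, columns = periods)."""
    G, T = len(Y), len(Y[0])

    def demean(M):
        grand = sum(map(sum, M)) / (G * T)
        row = [sum(r) / T for r in M]
        col = [sum(M[g][t] for g in range(G)) / G for t in range(T)]
        return [[M[g][t] - row[g] - col[t] + grand for t in range(T)] for g in range(G)]

    Yd, Dd = demean(Y), demean(D)
    num = sum(Yd[g][t] * Dd[g][t] for g in range(G) for t in range(T))
    den = sum(Dd[g][t] ** 2 for g in range(G) for t in range(T))
    return num / den

# Rows: early group (treated from period 1), late group (treated in period 2).
D = [[0, 1, 1],
     [0, 0, 1]]
group_fe = [5.0, 3.0]
time_fe = [0.0, 0.5, 1.5]

constant_effect = [[0, 1, 1], [0, 0, 1]]  # effect of 1 in every treated period
growing_effect = [[0, 1, 2], [0, 0, 1]]   # effect grows with time since treatment

def outcomes(effect):
    return [[group_fe[g] + time_fe[t] + effect[g][t] for t in range(3)] for g in range(2)]

twfe_constant = twfe_delta(outcomes(constant_effect), D)
twfe_growing = twfe_delta(outcomes(growing_effect), D)
true_att_growing = (1 + 2 + 1) / 3  # average effect over the three treated cells

print(f"constant effects: TWFE = {twfe_constant:.2f} (truth 1.00)")
print(f"growing effects:  TWFE = {twfe_growing:.2f} (truth {true_att_growing:.2f})")
```

With constant effects TWFE recovers the truth. With growing effects it does not, even though there is no noise at all. The culprit is the comparison between periods 1 and 2 that uses the already treated early group as the control for the late group: the early group's outcome is still rising because of its own treatment, so that 2x2 contributes an estimate of zero.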

So how do you get a better DID estimate in this situation? I'll spare you the details (because I'm still wrestling with them) but the answer seems to be the estimation strategy developed by Callaway and Sant'Anna. The documentation in R for their package walks through a lot of the details and challenges with TWFE models with differential timing. 
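As a rough intuition only, the building block of their approach is a group-time average treatment effect: for each cohort (defined by when it was first treated) and each post-treatment period, compare that cohort's change since the period just before treatment with the change in a valid comparison group. The sketch below uses a hypothetical noise-free panel with a never-treated group as the comparison. It is a simplified illustration of that idea under parallel trends, not the estimator implemented in their package, which also handles covariates, inference, and different ways of aggregating.

```python
# Hypothetical panel: period in which each group is first treated (None = never).
start = {"early": 1, "late": 2, "never": None}
group_fe = {"early": 5.0, "late": 3.0, "never": 1.0}
time_fe = [0.0, 0.5, 1.5]
N_PERIODS = 3

def effect(g, t):
    # True effect grows by 1 for each period since treatment began.
    s = start[g]
    return float(t - s + 1) if s is not None and t >= s else 0.0

Y = {g: [group_fe[g] + time_fe[t] + effect(g, t) for t in range(N_PERIODS)] for g in start}

def att_gt(g, t):
    """Change for cohort g since its last pre-treatment period, minus the
    change for the never-treated group over the same window."""
    base = start[g] - 1
    return (Y[g][t] - Y[g][base]) - (Y["never"][t] - Y["never"][base])

estimates = {(g, t): att_gt(g, t)
             for g in ("early", "late")
             for t in range(start[g], N_PERIODS)}
average_att = sum(estimates.values()) / len(estimates)

for (g, t), est in estimates.items():
    print(f"ATT(group={g}, period={t}) = {est:.1f}")
print(f"simple average = {average_att:.3f}")
```

Because an already treated group is never used as a control, each group-time estimate matches the true effect for that cell, and the simple average equals the true average effect of 4/3 in this toy example.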

Additionally this video of Andrew Goodman-Bacon was really helpful for understanding the 'Bacon' decomposition of TWFE models and the problems above.


After watching Goodman-Bacon, I recommend this talk from Sant'Anna discussing their estimator. 

Nick Huntington-Klein also provides a great summary of the issues made apparent by the Bacon decomposition above and of the Callaway and Sant'Anna method for staggered rollout DID designs. He also gets into the Wooldridge-Mundlak approach.

A Note About Event Studies

In a number of references I have tried to read to understand this issue, the term 'event study' is thrown around, and it seems like every time it is used it is used differently, but the author/speaker assumes we are all talking about the same thing. In this video Nick Huntington-Klein introduces event studies in a way that is the most clear and consistent. Watching this video might help.

References: 

Scott Cunningham. Causal Inference: The Mixtape. https://mixtape.scunning.com/

Nick Huntington-Klein. The Effect. https://theeffectbook.net/

Andrew Goodman-Bacon. Difference-in-differences with variation in treatment timing. Journal of Econometrics, Volume 225, Issue 2, 2021.

Brantly Callaway, Pedro H.C. Sant’Anna. Difference-in-Differences with multiple time periods. Journal of Econometrics, Volume 225, Issue 2, 2021.

Related Posts:

Modeling Claims Costs with Difference in Differences. https://econometricsense.blogspot.com/2019/01/modeling-claims-with-linear-vs-non.html 

Was It Meant to Be? OR Sometimes Playing Match Maker Can Be a Bad Idea: Matching with Difference-in-Differences. https://econometricsense.blogspot.com/2019/02/was-it-meant-to-be-or-sometimes-playing.html