Thursday, March 2, 2023

Are Matching Estimators and the Conditional Independence Assumption Inconsistent with Rational Decision Making?

Scott Cunningham brings up some interesting points about matching and utility maximization in this Substack post: https://causalinf.substack.com/p/why-do-economists-so-dislike-conditional

"Because most of the time, when you are fully committed to the notion that people are rational, or at least intentionally pursuing goals and living in the reality of scarcity itself, you actually think they are paying attention to those potential outcomes. Why? Because those potential outcomes represent the gains from the choice you’re making....if you think people make choices because they hope the choice will improve their life, then you believe their choices are directly dependent on Y0 and Y1. This is called “selection on treatment gains”, and it’s a tragic problem that if true almost certainly means covariate adjustment won’t work....Put differently, conditional independence essentially says that for a group of people with the same covariate values, their decision making had become erratic and random. In other words, the covariates contained the rationality and you had found the covariates that sucked that rationality out of their minds."

This makes me want to ask - is there a way I can specify utility functions or think about utility maximization that is consistent with the CIA in a matching scenario? This gets me into very dangerous territory because my background is applied economics, not theory. I think most of the time when matching is being used in observational settings, people aren't thinking about utility functions and consumer preferences and how they relate to potential outcomes. Especially non-economists. 

Thinking About Random Utility Models

The discussion above for some reason motivated me to think about random utility models (RUMs). Not being a theory person and having hardly worked with RUMs at all, I'm being even more dangerous, but hear me out: this is just a thought experiment.

I first heard of RUMs years ago when working in market research and building models focused on student enrollment decisions. From what I understand, they are an important workhorse in discrete choice modeling applications. Food economist Jayson Lusk has even looked at RUMs and their predictive validity via functional magnetic resonance imaging (see Neural Antecedents of a Random Utility Model).

The equation below represents the basic components of a random utility model:

U = V + e

where 'V' = systematic utility and 'e' represents random utility.

Consumers choose the option that provides the greatest utility. The systematic component 'V' captures attributes describing the alternative choices or perceptions about the choices, and characteristics of the decision maker. In the cases where matching methods are used in observational settings, the relevant choice is often whether or not to participate in a program or take treatment.
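A minimal Python sketch of the binary case may help make this concrete. It assumes, purely for illustration, that 'e' follows a standard Gumbel (type I extreme value) distribution, which is the textbook assumption behind the logit model. The function names and the values of V are hypothetical and not taken from any of the work cited here.

```python
import math
import random


def draw_gumbel(rng):
    """Draw from a standard Gumbel distribution via its inverse CDF."""
    u = rng.random()
    while u == 0.0:  # guard against log(0); essentially never happens
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_choice_share(v_treat, v_control, n=100_000, seed=0):
    """Share of simulated consumers for whom U(D=1) > U(D=0),
    where U = V + e and e is an assumed Gumbel draw per option."""
    rng = random.Random(seed)
    chosen = 0
    for _ in range(n):
        u_treat = v_treat + draw_gumbel(rng)
        u_control = v_control + draw_gumbel(rng)
        if u_treat > u_control:
            chosen += 1
    return chosen / n


def logit_share(v_treat, v_control):
    """Closed-form choice probability implied by Gumbel errors (binary logit)."""
    return 1.0 / (1.0 + math.exp(-(v_treat - v_control)))


# Hypothetical systematic utilities: V = 1.0 for participating, 0.0 otherwise.
simulated = simulate_choice_share(1.0, 0.0)
print(round(simulated, 3), round(logit_share(1.0, 0.0), 3))
```

The simulated share should sit close to the logit formula, which is the sense in which the observable part 'V' pins down choice probabilities even though each individual choice also depends on 'e'.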

This seems to speak to one of the challenges raised in Scott's post (keep in mind Scott never mentions RUMs; all of this about RUMs is my own meandering, so if a discussion about RUMs is nonsensical, it's on me, not him):

"The known part requires a model, be it formal or informal in nature, and the quantified means it’s measured and in your dataset. So if you have the known and quantified confounder, then a whole host of solutions avail themselves to you like regression, matching, propensity scores, etc....There’s a group of economists who object to this statement, and usually it’s that “known” part."

What seems appealing to me is that RUMs appear to allow us to make use of what we think we can know about utility via 'V' and still admit that there is a lot we don't know, captured by 'e' in a random utility model. In this formulation 'e' still represents rationality; it's just heterogeneity in rational preferences that we can't observe. This is assumed to be random. Many economists working in discrete choice modeling contexts are apparently comfortable with the 'known' part of a RUM, at least from the way I understand this.

A Thought Experiment: A Random Utility Model for Treatment Participation

Again - proceeding cautiously here, suppose that in an observational setting the decision to engage in a program or treatment designed to improve outcome Y is driven by systematic and random components in a RUM:

U = V(x) + e

and the decision to participate is based, as Scott describes, on the potential outcomes Y1 and Y0, whose difference represents the gain from choosing:

delta = (Y1 - Y0) where you get Y1 for choosing D=1 and Y0 for D=0

In the RUM you choose D = 1 if U(D = 1) > U(D = 0) 

D = f(delta) = f(Y1, Y0) = f(x)

and we specify the RUM as U(D) = V(x) + e

where x represents all the observable things that might contribute to an individual's utility (perceptions about the choices, and characteristics of the decision maker) in relation to making this decision. 

So the way I wanted to think about this is that when we are matching, the factors we match/control for would be the observable variables 'x' that contribute to systematic utility V(x), while the unobservable aspects reflect heterogeneous preferences across individuals. These would contribute to the random component of the RUM.

So in essence YES, if we think about this in the context of a RUM, the covariates contain all of the rationality (at least the observable parts) and what is unobserved can be modeled as random. We've harmonized utility maximization, matching and the CIA! 
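To see what this best case looks like, here is a small simulation sketch. Everything in it is an assumption made up for illustration: a single binary covariate x, the coefficients, normally distributed noise, and above all the premise that 'e' is drawn independently of Y0 and Y1. Matching is approximated by exact stratification on x.

```python
import random


def simulate(n=100_000, seed=1):
    """Simulate (x, D, observed Y, individual gain) under a RUM for D
    whose random part e is independent of the potential outcomes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y0 = 2.0 * x + rng.gauss(0, 1)           # outcome without treatment
        gain = 1.0 + x + 0.5 * rng.gauss(0, 1)   # delta = Y1 - Y0
        e = rng.gauss(0, 1)                      # assumed independent of y0 and gain
        d = 1 if (-0.5 + 1.0 * x) + e > 0 else 0 # choose D = 1 if V(x) + e > 0
        y = y0 + gain if d == 1 else y0
        rows.append((x, d, y, gain))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Raw treated-minus-control difference in mean outcomes."""
    treated = [y for _, d, y, _ in rows if d == 1]
    control = [y for _, d, y, _ in rows if d == 0]
    return mean(treated) - mean(control)


def stratified_difference(rows):
    """Within-x differences in means, weighted by each stratum's share."""
    total = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[0] == level]
        treated = [y for _, d, y, _ in stratum if d == 1]
        control = [y for _, d, y, _ in stratum if d == 0]
        total += len(stratum) / len(rows) * (mean(treated) - mean(control))
    return total


rows = simulate()
true_ate = mean([gain for _, _, _, gain in rows])
print("true average gain:", round(true_ate, 2))
print("naive difference:", round(naive_difference(rows), 2))
print("stratified on x:", round(stratified_difference(rows), 2))
```

In this setup the naive comparison is off because x drives both the choice and the outcome, while the comparison within levels of x lands near the true average gain. That result depends entirely on the assumed independence of 'e'.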

Meeting the Assumptions of Random Utility and the CIA

But wait...not so fast. In the observational studies where matching is deployed, I am not sure we can assume the unobserved heterogeneous preferences represented by 'e' will be random across the groups we are comparing. Those who choose D = 1 will obviously differ in preferences from those who choose D = 0. There will be important differences between the treatment and control groups' preferences that are not accounted for by the covariates in the systematic component V(x), and those unobserved preferences in 'e' will depend on the potential outcomes Y0 and Y1, just like Scott was saying. If the choice is driven by expected potential outcomes, I don't think we can assume that the random component of the RUM is really random with regard to the choice of taking treatment.

Some Final Questions

If 'x' captures everything relevant to an individual's assessment of their potential outcomes Y1 and Y0 (and we have all the data for 'x', which itself is a questionable assumption), then could we claim that everything else captured by the term 'e' is due to random noise - maybe pattern noise or occasion noise?

In an observational setting where we are modeling treatment choice D, can we break 'e' down further into components like below?

e = e1 + e2

where e1 is unobservable heterogeneity in rational preferences driven by the potential outcomes Y1 and Y0, which makes it non-random, and e2 represents noise that is closer to truly random, like pattern or occasion noise, and likely to be independent of Y1 and Y0.

IF the answer to the questions above is YES, we can decompose the random component of a RUM this way, and e2 makes up the largest component of e (i.e., e1 is small, non-existent, or insignificant), then maybe a RUM is a valid way to think about modeling the decision to choose treatment D, and we can match on the attributes of systematic utility 'x' and appeal to the CIA (if my understanding is correct).

But the less we actually know about x and what is driving the decision as it relates to potential outcomes Y0 and Y1, the larger e1 becomes and then the random component of a RUM may no longer be random. 
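A rough simulation sketch of the e = e1 + e2 idea follows. The setup is hypothetical: a binary x, normal noise, and a weight w (my own device, not something from Scott's post) that controls how much of 'e' is the gain-driven part e1 versus the independent part e2. Matching is again approximated by exact stratification on x, and bias is measured against the population average gain.

```python
import math
import random


def stratified_bias(w, n=100_000, seed=2):
    """Bias of the x-stratified difference in means, relative to the
    average gain, when e = e1 + e2 with e1 = w * (individual gain shock)."""
    rng = random.Random(seed)
    # cells[(x, d)] holds [sum of observed y, count]
    cells = {(x, d): [0.0, 0] for x in (0, 1) for d in (0, 1)}
    gain_total = 0.0
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y0 = 2.0 * x + rng.gauss(0, 1)
        g = rng.gauss(0, 1)                  # unobserved heterogeneity in gains
        gain = 1.0 + g                       # delta = Y1 - Y0
        e1 = w * g                           # part of e tied to Y1 - Y0
        e2 = math.sqrt(1.0 - w * w) * rng.gauss(0, 1)  # independent noise
        d = 1 if (-0.5 + 1.0 * x) + e1 + e2 > 0 else 0
        y = y0 + gain if d == 1 else y0
        cells[(x, d)][0] += y
        cells[(x, d)][1] += 1
        gain_total += gain
    estimate = 0.0
    for x in (0, 1):
        n_x = cells[(x, 0)][1] + cells[(x, 1)][1]
        diff = (cells[(x, 1)][0] / cells[(x, 1)][1]
                - cells[(x, 0)][0] / cells[(x, 0)][1])
        estimate += n_x / n * diff
    return estimate - gain_total / n


# Larger w means e1 makes up more of e.
for w in (0.0, 0.4, 0.8):
    print("w =", w, "bias =", round(stratified_bias(w), 2))
```

Under these assumptions the bias is near zero when e is all e2 and grows as e1 takes up more of e, even though we have conditioned on everything in x.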

If my understanding above is correct, then the things we likely would have to assume for a RUM to be valid turn out to be similar to, if not exactly, the things we need for the CIA to hold.

The possibility of meeting the assumptions of a RUM or the CIA would seem unlikely in observational settings if (1) we don't know a lot about systematic utility and 'x' and (2) the random component 'e' turns out not to be random.

Conclusion

So much for an applied guy trying to do theory to support the possibility of the CIA holding in matched analysis. I should say I am not an evangelist for matching; I am trying to be more of a realist about its uses and validity. Scott's post introduces a very interesting way to think about matching and the CIA and the challenges we might have meeting the conditions for it.