tag:blogger.com,1999:blog-24744983008595938072024-03-17T23:02:33.473-04:00Econometric SenseAn attempt to make sense of econometrics, biostatistics, machine learning, experimental design, bioinformatics, ....Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.comBlogger319125tag:blogger.com,1999:blog-2474498300859593807.post-89870906282287544822023-03-02T10:52:00.014-05:002023-06-10T13:38:41.192-04:00Are Matching Estimators and the Conditional Independence Assumption Inconsistent with Rational Decision Making<p> Scott Cunningham brings up some interesting points about <a href="http://econometricsense.blogspot.com/2015/03/using-r-matchit-package-for-propensity.html" target="_blank">matching</a> and utility maximization in this substack post: <a href="https://causalinf.substack.com/p/why-do-economists-so-dislike-conditional">https://causalinf.substack.com/p/why-do-economists-so-dislike-conditional</a> </p><p><i>"Because most of the time, when you are fully committed to the notion that people are rational, or at least intentionally pursuing goals and living in the reality of scarcity itself, you actually think they are paying attention to those potential outcomes. Why? Because those potential outcomes represent the gains from the choice you’re making....if you think people make choices because they hope the choice will improve their life, then you believe their choices are directly dependent on Y0 and Y1. This is called “selection on treatment gains”, and it’s a tragic problem that if true almost certainly means covariate adjustment won’t work....Put differently, conditional independence essentially says that for a group of people with the same covariate values, their decision making had become erratic and random. 
In other words, the covariates contained the rationality and you had found the covariates that sucked that rationality out of their minds."</i></p><p>This makes me want to ask - is there a way I can specify utility functions or think about utility maximization that is consistent with the CIA in a matching scenario? This gets me into very dangerous territory because my background is applied economics, not theory. I think most of the time when matching is being used in observational settings, people aren't thinking about utility functions and consumer preferences and how they relate to potential outcomes. Especially non-economists. </p><p><b>Thinking About Random Utility Models</b></p><p>The discussion above for some reason motivated me to think about random utility models (RUMs). Not being a theory person, and having hardly worked with RUMs at all, I'm being even more dangerous here - but hear me out, this is just a thought experiment. </p><p>I first heard of RUMs years ago when working in market research and building models focused on student enrollment decisions. From what I understand they are an important workhorse in discrete choice modeling applications. Food economist Jayson Lusk has even looked at RUMs and their predictive validity via functional magnetic resonance imaging (see <a href="https://www.sciencedirect.com/science/article/abs/pii/S0167268116302098" target="_blank">Neural Antecedents of a Random Utility Model</a>).</p><p>The equation below represents the basic components of a random utility model:</p><p><span style="font-family: courier;">U = V + e</span></p><p>where <span style="font-family: courier;">V </span>= systemic utility and <span style="font-family: courier;">'e'</span> represents random utility. </p><p>Consumers choose the option that provides the greatest utility. 
The systemic component <span style="font-family: courier;">'V' </span>captures attributes describing the alternative choices or perceptions about the choices, and characteristics of the decision maker. In the cases where matching methods are used in observational settings, the relevant choice is often whether or not to participate in a program or take treatment.</p><p>This seems to speak to one of the challenges raised in Scott's post (keep in mind Scott never mentions RUMs; all this about RUMs is my meandering, so if the discussion of RUMs is nonsensical it's on me, not him): </p><p><i>"The known part requires a model, be it formal or informal in nature, and the quantified means it’s measured and in your dataset. So if you have the known and quantified confounder, then a whole host of solutions avail themselves to you like regression, matching, propensity scores, etc....There’s a group of economists who object to this statement, and usually it’s that “known” part."</i></p><p>What seems appealing to me is that RUMs appear to allow us to make use of what we think we can know about utility via<span style="font-family: courier;"> 'V' </span>and still admit that there is a lot we don't know, captured by <span style="font-family: courier;">'e' </span>in a random utility model. In this formulation <span style="font-family: courier;">'e' </span>still represents rationality - it's just heterogeneity in rational preferences that we can't observe. This is assumed to be random. 
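As a concrete sketch of the <span style="font-family: courier;">V + e</span> split (my own illustration - nothing like this appears in Scott's post): if <span style="font-family: courier;">'e'</span> is assumed to be i.i.d. Gumbel across alternatives, the RUM delivers the familiar logit choice probabilities used throughout discrete choice modeling, where only the difference in systemic utility matters:

```python
import math

def choice_prob(v1, v0):
    """P(choose option 1) in a RUM U = V + e with i.i.d. Gumbel 'e' terms.
    The random components integrate out to a logistic function of V1 - V0."""
    return 1.0 / (1.0 + math.exp(-(v1 - v0)))

# Equal systemic utility: the random component makes the choice a coin flip.
p_equal = choice_prob(0.0, 0.0)   # 0.5

# Higher systemic utility for option 1: chosen more often, but not always.
p_higher = choice_prob(1.0, 0.0)  # about 0.73
```

So <span style="font-family: courier;">'e'</span> doesn't make the consumer irrational; it just makes the analyst's view of a rational choice probabilistic.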
Many economists working in discrete choice modeling contexts are apparently comfortable with the 'known' part of a RUM, at least as I understand it.</p><p><b>A Thought Experiment: A Random Utility Model for Treatment Participation</b></p><p>Again - proceeding cautiously here, suppose that in an observational setting the decision to engage in a program or treatment <span style="font-family: courier;">D </span>designed to improve outcome <span style="font-family: courier;">Y</span> is driven by systematic and random components in a RUM:</p><p><span style="font-family: courier;">U = V(x) + e</span></p><p>and the decision to participate is based on, as Scott describes, the <a href="http://econometricsense.blogspot.com/2013/05/selection-bias-and-rubin-causal-model.html" target="_blank">potential outcomes</a> <span style="font-family: courier;">Y1 </span>and <span style="font-family: courier;">Y0</span> which represent the gains from choosing. </p><p><span style="font-family: courier;">delta = (Y1 - Y0) </span>where you get <span style="font-family: courier;">Y1</span> for choosing <span style="font-family: courier;">D=1</span> and <span style="font-family: courier;">Y0</span> for <span style="font-family: courier;">D=0</span></p><p>In the RUM you choose <span style="font-family: courier;">D = 1</span> if <span style="font-family: courier;">U(D = 1) > U(D = 0)</span><span style="font-family: inherit;"> </span></p><p><span style="font-family: courier;">D = f(delta) = f(Y1,Y0) = f(x)</span></p><p>and we specify the RUM as <span style="font-family: courier;">U(D) = V(x) + e</span></p><p><span style="font-family: inherit;">where </span><span style="font-family: courier;">x</span><span style="font-family: inherit;"> represents all the observable things that might contribute to an individual's utility (</span>perceptions about the choices, and characteristics of the decision maker) <span style="font-family: inherit;">in relation to making this decision. 
</span></p><p>So the way I wanted to think about this is: when we are matching, the factors we match/control for would be the observable variables <span style="font-family: courier;">'x' </span>that contribute to systemic utility <span style="font-family: courier;">V(x)</span>, while the unobservable aspects reflect heterogeneous preferences across individuals. These would contribute to the random component of the RUM. </p><p>So in essence YES, if we think about this in the context of a RUM, the covariates contain all of the rationality (at least the observable parts) and what is unobserved can be modeled as random. We've harmonized utility maximization, matching, and the CIA! </p><p><b>Meeting the Assumptions of Random Utility and the CIA</b></p><p>But wait...not so fast. In the observational studies where matching is deployed, I am not sure we can assume the unobserved heterogeneous preferences represented by <span style="font-family: courier;">'e'</span> will be random across the groups we are comparing. Those who choose <span style="font-family: courier;">D = 1 </span>will have obvious differences in preferences from those who choose <span style="font-family: courier;">D = 0.</span> There will be important differences between treatment and control groups' preferences not accounted for by covariates in the systemic component <span style="font-family: courier;">V(x)</span>, and those unobserved preferences in <span style="font-family: courier;">'e'</span> will be dependent on potential outcomes <span style="font-family: courier;">Y0 </span>and <span style="font-family: courier;">Y1</span>, just like Scott was saying. I don't think we can assume in an observational setting with treatment selection that the random component of the RUM is really random with regard to the choice of taking treatment, if the choice is driven by expected potential outcomes. 
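A small simulation can make this concrete (my own toy example with made-up parameters, not anything from Scott's post). When the part of <span style="font-family: courier;">'e'</span> that drives selection is pure noise, stratifying on <span style="font-family: courier;">x</span> recovers the average gain; when selection depends on the gain <span style="font-family: courier;">(Y1 - Y0)</span> itself, the same estimator drifts away from the average effect:

```python
import random

random.seed(7)
n = 50_000
ate = 1.0  # true average gain from treatment

units = []
for _ in range(n):
    x = random.gauss(0, 1)            # observed covariate in V(x)
    y0 = x + random.gauss(0, 1)       # untreated potential outcome
    gain = ate + random.gauss(0, 1)   # individual gain, Y1 - Y0
    units.append((x, y0, y0 + gain))

def stratified_estimate(select):
    """Crude 'matching': compare treated vs control means within bins of x."""
    bins = {}
    for x, y0, y1 in units:
        d = select(x, y1 - y0)
        y = y1 if d else y0
        bins.setdefault(round(x, 1), [[], []])[d].append(y)
    diffs, weights = 0.0, 0
    for ys_control, ys_treated in bins.values():
        if ys_control and ys_treated:
            w = len(ys_treated)
            diffs += w * (sum(ys_treated) / w - sum(ys_control) / len(ys_control))
            weights += w
    return diffs / weights

# Selection on V(x) plus pure noise: CIA holds given x.
est_noise = stratified_estimate(lambda x, gain: x + random.gauss(0, 1) > 0)

# Selection on the gain itself ('selection on treatment gains'): CIA fails.
est_gains = stratified_estimate(lambda x, gain: gain + random.gauss(0, 1) > 1.5)
```

In the first case the estimate lands near the true average gain of 1.0; in the second it drifts well above it, because the component driving selection is no longer random with respect to the potential outcomes - exactly the failure described above.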
</p><p><b>Some Final Questions</b></p><p>If <span style="font-family: courier;">'x'</span> captures <i><u>everything</u></i> relevant to an individual's assessment of their potential outcomes <span style="font-family: courier;">Y1</span> and <span style="font-family: courier;">Y0</span> (and we have all the data for <span style="font-family: courier;">'x'</span>, which is itself a questionable assumption) then could we claim that everything else captured by the term <span style="font-family: courier;">'e' </span>is due to random noise - maybe <a href="https://en.wikipedia.org/wiki/Noise:_A_Flaw_in_Human_Judgment" target="_blank">pattern noise or occasion noise</a>? </p><p>In an observational setting where we are modeling treatment choice <span style="font-family: courier;">D</span>, can we break <span style="font-family: courier;">'e' </span>down further into components like below?</p><p><span style="font-family: courier;">e = e<span style="font-size: xx-small;">1</span> + e<span style="font-size: xx-small;">2</span></span></p><p>where <span style="font-family: courier;">e<span style="font-size: xx-small;">1</span> </span>is unobservable heterogeneity in rational preferences driven by potential outcomes <span style="font-family: courier;">Y1 </span>& <span style="font-family: courier;">Y0</span>, making it non-random, and <span style="font-family: courier;">e<span style="font-size: xx-small;">2</span></span> represents noise that is more random, like pattern or occasion noise, and likely to be independent of <span style="font-family: courier;">Y1</span> & <span style="font-family: courier;">Y0.</span> </p><p>IF the answer to the questions above is YES, and we can decompose the random component of RUMs this way, and <span style="font-family: courier;">e<span style="font-size: xx-small;">2</span> </span>makes up the largest component of <span style="font-family: courier;">e </span><span style="font-family: inherit;">(i.e., </span><span style="font-family: courier;">e<span style="font-size: xx-small;">1</span></span><span style="font-family: inherit;"> is small, non-existent, or insignificant), </span>then maybe a RUM is a valid way to think about modeling the decision to choose treatment <span style="font-family: courier;">D </span>and we can match on the attributes of systemic utility <span style="font-family: courier;">'x'</span> and appeal to the CIA (if my understanding is correct).</p><p>But the less we actually know about <span style="font-family: courier;">x </span>and what is driving the decision as it relates to potential outcomes <span style="font-family: courier;">Y0</span> and <span style="font-family: courier;">Y1</span>, the larger <span style="font-family: courier;">e<span style="font-size: xx-small;">1</span></span> becomes, and then the random component of a RUM may no longer be random. </p><p>If my understanding above is correct, then the things we likely would have to assume for a RUM to be valid turn out to be similar to, if not exactly, the things we need for the CIA to hold. </p><p>Meeting the assumptions of a RUM or the CIA would seem unlikely in observational settings if (1) we don't know a lot about systemic utility and <span style="font-family: courier;">'x'</span> and (2) the random component <span style="font-family: courier;">e</span> turns out not to be random. </p><p><b>Conclusion</b></p><p>So much for an applied guy trying to do theory to support the possibility of the CIA holding in matched analysis. I should say I am not an evangelist for matching, but am trying to be more of a realist about its uses and validity. Scott's post introduces a very interesting way to think about matching and the CIA, and the challenges we might have meeting the conditions for it. 
</p><p><br /></p>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-41662083407190617432023-01-26T13:13:00.002-05:002023-03-02T10:10:41.575-05:00What is new and different about difference-in-differences?<p>Back in 2012 I wrote about the <a href="https://econometricsense.blogspot.com/2012/12/difference-in-difference-estimators.html" target="_blank">basic 2 x 2 difference in difference analysis </a>(two groups, two time periods). Columbia public health probably has a <a href="https://www.publichealth.columbia.edu/research/population-health-methods/difference-difference-estimation" target="_blank">better introduction. </a></p><p>The most famous example of an analysis that motivates a 2 x 2 DID analysis is <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8006863/#:~:text=The%20difference%2Din%2Ddifference%20method%20is%20one%20of%20the%20oldest,relatively%20infrequently%20by%20epidemiologists%20today." target="_blank">John Snow's 1855 analysis of the cholera epidemic in London</a>:</p><p><br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgkxnuNzVSBgutgWIIDNAjjq7KfDCh9vo7fXKOse-uaNvTbXH3Q3i6cDUxcyGvSNRXIfryQqyVtqlEzCTllw4PqPjgECfNfkS2iS_78LOdAGdVbCviaPiVyhLoppZU_QbQR_FMtD0zhCGpf771hH16tMpxgsfZXUv_7gOw1VTesrMzK8H1wvHdUIq7wwQ" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="652" data-original-width="716" height="286" src="https://blogger.googleusercontent.com/img/a/AVvXsEgkxnuNzVSBgutgWIIDNAjjq7KfDCh9vo7fXKOse-uaNvTbXH3Q3i6cDUxcyGvSNRXIfryQqyVtqlEzCTllw4PqPjgECfNfkS2iS_78LOdAGdVbCviaPiVyhLoppZU_QbQR_FMtD0zhCGpf771hH16tMpxgsfZXUv_7gOw1VTesrMzK8H1wvHdUIq7wwQ=w412-h286" width="412" /></a></div><br /><br /><p></p><p>(Image <a href="https://www.youtube.com/watch?v=m1xSMNTKoMs&t=158s" target="_blank">Source</a>)</p><p> I have since written about some of the challenges of 
estimating DID with glm models (see <a href="https://econometricsense.blogspot.com/2016/03/whats-difference-between-difference-in.html" target="_blank">here</a>, <a href="https://econometricsense.blogspot.com/2016/03/identification-and-common-trend.html" target="_blank">here,</a> and <a href="https://econometricsense.blogspot.com/2019/01/modeling-claims-with-linear-vs-non.html" target="_blank">here.)</a>, as well as <a href="http://econometricsense.blogspot.com/2015/09/propensity-score-matching-meets.html" target="_blank">combining DID with matching</a>, and <a href="https://econometricsense.blogspot.com/2019/02/was-it-meant-to-be-or-sometimes-playing.html" target="_blank">problems to watch out for when combining methods.</a> But a lot of what we know about difference in differences has changed in the last decade. I'll try to give a brief summary based on my understanding and point towards some references that do a better job presenting the current state.</p><p><b>The Two-Way Fixed Effects model (TWFE)</b></p><p>The first thing I should discuss is extending the 2x2 model to include multiple treated groups and/or multiple time periods. 
The generalized model for DiD, also referred to as the two-way fixed effects (TWFE) model, is the standard way to represent those kinds of scenarios:</p><p>Y<span style="font-size: xx-small;">gt </span>= a<span style="font-size: xx-small;">g </span>+ b<span style="font-size: xx-small;">t </span>+ δD<span style="font-size: xx-small;">gt </span>+ ε<span style="font-size: xx-small;">gt</span></p><p>a<span style="font-size: xx-small;">g </span>= group fixed effects</p><p>b<span style="font-size: xx-small;">t </span>= time fixed effects</p><p>D<span style="font-size: xx-small;">gt </span>= treatment*post period (interaction term)</p><p>δ = ATT or DID estimate</p><p>Getting the correct standard errors for DID models that involve many repeated measures over time, and/or where treatment and control groups are defined by multiple geographies, presents two challenges compared to the basic 2x2 model: serial correlation and correlation within groups. There are several approaches that can be considered depending on your situation:</p><p>1 - Block bootstrapping</p><p>2 - Aggregating data into single pre and post periods</p><p>3 - Clustering standard errors at the group level</p><p>Clustering at the group level should provide the appropriate standard errors in these situations when the number of clusters is large.</p><p>For more details on TWFE models, both Scott Cunningham and Nick Huntington-Klein have great econometrics textbooks with chapters devoted to these topics. See the references below for more info.</p><p><b>Differential Timing and Staggered Rollouts</b></p><p>But things can get even more complicated with DID designs. Think about situations where different groups get treated at different times over a number of time periods. This is not just a thought experiment imagining the most difficult study design for the sake of pondering – these kinds of staggered rollouts are very common in business and policy settings. 
Imagine policy rules adopted by different states over time (like changes in minimum wages), or imagine testing a new product or service by rolling it out to different markets over time. Understanding how to evaluate their impact is important. For a while it seemed economists may have been a little guilty of handwaving with the TWFE model, assuming the estimated treatment coefficient was giving them the effect they wanted. </p><p>But Andrew Goodman-Bacon refused to take this interpretation at face value and broke it down for us, showing that the TWFE estimator gives us a weighted average of all the potential 2x2 DID estimates you could make with the data. That actually sounds intuitive and helpful. But what he discovered that is not so intuitive is that some of those 2x2 comparisons use already-treated groups as controls for groups treated later. That's not a comparison we generally are interested in making, but it gets averaged in with the others and can drastically bias the results, particularly when there is treatment effect heterogeneity (the treatment effect differs across groups and trends over time). </p><p>So how do you get a better DID estimate in this situation? I'll spare you the details (because I'm still wrestling with them) but the answer seems to be the estimation strategy developed by Callaway and Sant'Anna. The <a href="https://cran.r-project.org/web/packages/did/vignettes/TWFE.html" target="_blank">documentation in R </a>for their package walks through a lot of the details and challenges with TWFE models with differential timing. 
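A tiny worked example (my own construction, with made-up numbers) shows the problem. Two groups share a common trend; the early group adopts at period 1, the late group at period 3, and the dynamic treatment effect grows by 1 each period after adoption. The 'clean' 2x2 recovers the effect, while the 2x2 that uses the already-treated early group as the control for the late group does not:

```python
# Periods 0..3. Common time trend of +1 per period, plus group intercepts.
# Early group adopts at t=1, late group at t=3. The dynamic treatment
# effect is 1 in the adoption period and grows by 1 each period after.
def outcome(group, t):
    group_effect = {"early": 5, "late": 2}[group]
    adopt = {"early": 1, "late": 3}[group]
    effect = (t - adopt + 1) if t >= adopt else 0
    return group_effect + t + effect

def did_2x2(treated, control, pre, post):
    """Basic 2x2 difference-in-differences from the four cell means."""
    return ((outcome(treated, post) - outcome(treated, pre))
            - (outcome(control, post) - outcome(control, pre)))

# Clean comparison: late group is still untreated, so it absorbs the trend.
clean = did_2x2("early", "late", pre=0, post=1)      # recovers the true effect, 1

# 'Forbidden' comparison: the early group is already treated and its effect
# is still growing, so its 'trend' is contaminated and the estimate is wrong.
forbidden = did_2x2("late", "early", pre=2, post=3)  # true effect is 1, estimate is 0
```

TWFE averages both kinds of 2x2s together. The Callaway and Sant'Anna approach avoids this by estimating group-time effects using only never-treated or not-yet-treated groups as controls.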
</p><p>Additionally, this video of Andrew Goodman-Bacon was really helpful for understanding the 'Bacon' decomposition of TWFE models and the problems above.</p><p><br /><iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/m1xSMNTKoMs" title="YouTube video player" width="560"></iframe></p><p>After watching Goodman-Bacon, I recommend this talk from Sant'Anna discussing their estimator. </p><p> <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/VLviaylakAo" title="YouTube video player" width="560"></iframe> </p><p>Below, Nick Huntington-Klein provides a great summary of the issues made apparent by the Bacon decomposition and the Callaway and Sant'Anna method for staggered/rollout DID designs. He also gets into the Wooldridge Mundlak approach:</p><p>
<iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/hu2nDbnpALA" title="YouTube video player" width="560"></iframe>
</p><p><b>A Note About Event Studies</b></p><p>In a number of references I have tried to read to understand this issue, the term 'event study' is thrown around, and it seems like every time it is used it means something different, but the author/speaker assumes we are all talking about the same thing. In this video Nick Huntington-Klein introduces event studies in a way that is the most clear and consistent. Watching this video might help.</p><p> <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/_N9u9p-kNgg" title="YouTube video player" width="560"></iframe></p><p><b>References: </b></p><p>Causal Inference: The Mixtape. Scott Cunningham. <a href="https://mixtape.scunning.com/" target="_blank">https://mixtape.scunning.com/ </a></p><p>The Effect: Nick Huntington-Klein. <a href="https://theeffectbook.net/">https://theeffectbook.net/</a></p><p>Andrew Goodman-Bacon. Difference-in-differences with variation in treatment timing. Journal of Econometrics, Volume 225, Issue 2, 2021.</p><p>Brantly Callaway, Pedro H.C. Sant’Anna. Difference-in-Differences with multiple time periods. Journal of Econometrics, Volume 225, Issue 2, 2021.</p><p><b>Related Posts:</b></p><p>Modeling Claims Costs with Difference in Differences. <a href="https://econometricsense.blogspot.com/2019/01/modeling-claims-with-linear-vs-non.html " target="_blank">https://econometricsense.blogspot.com/2019/01/modeling-claims-with-linear-vs-non.html </a></p><p>Was It Meant to Be? OR Sometimes Playing Match Maker Can Be a Bad Idea: Matching with Difference-in-Differences. 
<a href="https://econometricsense.blogspot.com/2019/02/was-it-meant-to-be-or-sometimes-playing.html " target="_blank">https://econometricsense.blogspot.com/2019/02/was-it-meant-to-be-or-sometimes-playing.html </a></p><p><br /></p>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com1tag:blogger.com,1999:blog-2474498300859593807.post-41909063074619059502022-10-29T19:34:00.046-04:002023-01-03T21:46:32.254-05:00The Value of Experimentation and Causal Inference in Complex Business Environments<p><span></span></p><h1 style="clear: both; text-align: center;">Introduction</h1><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><span><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgL2QUeWmdZAIY6BQ5-5lt9Hc06XiHqpz8DkNxvIX6EHbAb27BFTsT9msdzwQQCzBIwrqk4aMRCifScb9rCN2TMjzlT3fAbtNB2Fi_6Zd4AlE4DMI4z0tFV7CuwMvMxuoq-2ei209bedBhSJZ4wiwqGDdRJHmFH5OlLxE79gyIwjp7nJqEzHWXHD-ufrA/s567/Correlation%20vs%20Causation.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="460" data-original-width="567" height="260" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgL2QUeWmdZAIY6BQ5-5lt9Hc06XiHqpz8DkNxvIX6EHbAb27BFTsT9msdzwQQCzBIwrqk4aMRCifScb9rCN2TMjzlT3fAbtNB2Fi_6Zd4AlE4DMI4z0tFV7CuwMvMxuoq-2ei209bedBhSJZ4wiwqGDdRJHmFH5OlLxE79gyIwjp7nJqEzHWXHD-ufrA/s320/Correlation%20vs%20Causation.jpg" width="320" /></a></span></div><span><br /></span><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px; text-align: left;"><span><i>Summary: </i></span><i>Causality in business means understanding how to connect the things we do with the value we create. </i><i>A cause is something that makes a difference (Dave Lewis, Journal of Philosophy, 1973). 
If we are interested in what makes a difference in creating business value (what makes a difference in moving the truck above), we care about causality. Causal inference in business helps us create value by providing knowledge about what makes a difference so we can move resources from a lower valued use (having folks on the back of the truck) to a higher valued use (putting folks behind the truck). </i></blockquote></blockquote><p></p><p><br /></p><p>We might hear the phrase <i>correlation is not causation</i> so often that it could easily be dismissed as a cliche, as opposed to a powerful mantra for improving knowledge and decision making. These distinctions have an important meaning in business and applied settings. We could think of businesses as collections of decisions and processes that move and transform resources. Business value is created by moving resources from lower to higher valued uses. Knowledge is the most important resource in a firm and the essence of organizational capability, innovation, value creation, and competitive advantage. Causal knowledge is no exception. <a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html" target="_blank">Part 1</a> of this series discusses the knowledge problem and decisions.</p><p>In business, talk can be cheap. With lots of data, anyone can tell a story to support any decision they want to make. But good decision science requires more than just having data and a good story; it's about having evidence to support decisions so we can learn faster and fail smarter. In the diagram above this means being able to identify a resource allocation that helps us push the truck forward (getting people behind the truck). Confusing correlation with causation might lead us to believe value is a matter of changing shirt colors vs. moving people. 
We don't want to be weeks, months, or years down the road only to realize that other things are driving outcomes, not the thing we've been investing in. By that time, our competition is too far ahead for us ever to catch up, and it may be too late for us to make up for the losses of misspent resources. This is why in business, we want to invest in causes, not correlations. We are ultimately going to learn either way; the question is whether we'd rather do it faster and methodically, or slower and precariously. </p><p>How does this work? You might look at the diagram above and tell yourself - it's common sense where you need to stand to push the truck to move it forward - I don't need any complicated analysis or complex theories to tell me that. That's true for a simple scenario like that, and likely so for many day-to-day operational decisions. Sometimes common sense or subject matter expertise can provide us with sufficient causal knowledge to know what actions to take. But when it comes to informing the tactical implementation of strategy (discussed in <a href="http://econometricsense.blogspot.com/2020/05/the-value-of-business-experiments-part.html" target="_blank">part 3</a> of this series) we can't always make that assumption. In complex business environments with high causal density (where the number of things influencing outcomes is numerous), we usually don't know enough about the nature and causes of human behavior, decisions, and causal paths from actions to outcomes to account for them well enough to know - what should I do? What creates value? In complicated business environments intuition alone may not be enough - as I discuss in <a href="http://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-part.html" target="_blank">part 2 </a>of this series, we can be easily fooled by our own biases, the biases in the data, and the many stories that it could tell. 
</p><p>From his experience at Microsoft, Ron Kohavi shares that up to <a href="https://exp-platform.com/experiments-at-microsoft/" target="_blank">2/3 of the ideas we might test in a business environment</a> turn out to either have flat results or harm the metric we are trying to improve. In <a href="https://readnoise.com/" target="_blank">Noise: A Flaw in Human Judgment</a>, the authors share how often experts disagree with each other, and even with themselves at different times, because of biases in judgment and decision making. As <a href="https://www.amazon.com/Designing-Behavior-Change-Psychology-Behavioral/dp/1449367623" target="_blank">Stephen Wendel</a> says, you can't just wing it with bar charts and graphs when you need to know <i>what makes a difference. </i></p><p>In application, experimentation and causal inference represent <a href="https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html" target="_blank">a way of thinking</a> that requires careful consideration of the business problem and all the ways that our data can fool us: separating signal from noise (statistical inference) and making the connection between actions and outcomes (causal inference). Experimentation and causal inference leverage good decision science that brings together theory and subject matter expertise with data, so we can make better informed business decisions in the face of our own biases and the biases in data. In the series of posts that follow, I overview in more detail the ways that experimentation and causal inference help us do these things in complex business environments. 
</p><p><b>The Value of Experimentation and Causal Inference in Complex Business Environments:</b></p><p><a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html" target="_blank">Part 1: The Knowledge Problem</a></p><p><a href="http://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-part.html" target="_blank">Part 2: Behavioral Biases</a></p><p><a href="http://econometricsense.blogspot.com/2020/05/the-value-of-business-experiments-part.html" target="_blank">Part 3: Strategy and Tactics</a></p><p><br /></p>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-27521169347423185942021-11-04T13:18:00.007-04:002021-11-04T19:35:54.661-04:00Causal Decision Making with non-Causal Models<p>In a <a href="http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html" target="_blank">previous post</a> I noted: </p><p><i>" ...correlations or 'flags' from big data might not 'identify' causal effects, but they are useful for prediction and might point us in directions where we can more rigorously investigate causal relationships"</i></p><p><a href="https://www.linkedin.com/pulse/beyond-shap-values-crystal-balls-matt-bogard/?trackingId=kQObfqj0SnKnx4SHAc1H1A%3D%3D" target="_blank">Recently on LinkedIn </a>I discussed situations where we have to be careful about taking action on specific features in a correlational model, for instance changing product attributes or designing an intervention based on interpretations of SHAP values from non-causal predictive models. I quoted Scott Lundberg:</p><p><i>"regularized machine learning models like XGBoost will tend to build the most parsimonious models that predict the best with the fewest features necessary (which is often something we strive for). 
This property often leads them to select features that are surrogates for multiple causal drivers which is "very useful for generating robust predictions...but not good for understanding which features we should manipulate to increase retention."</i></p><p>So sometimes we may go into a project with the intention of only needing predictions. We might just want to target offers or nudges to customers or product users, but not think about this in causal terms at first. But, as I have discussed before, the conversation often inevitably turns to causality, even if stakeholders and business users don't use causal language to describe their problems. </p><p><i>"Once armed with predictions, businesses will start to ask questions about 'why'... they will want to know what decisions or factors are moving the needle on revenue or customer satisfaction and engagement or improved efficiencies...There is a significant difference between understanding what drivers correlate with or 'predict' the outcome of interest and what is actually driving the outcome."</i></p><p>This would seem to call for causal models. However, in their recent paper Carlos Fernández-Loría and Foster Provost make an exciting claim:</p><p><i>“what might traditionally be considered “good” estimates of causal effects are not necessary to make good causal decisions…implications above are quite important in practice, because acquiring data to estimate causal effects accurately is often complicated and expensive. Empirically, we see that results can be considerably better when modeling intervention decisions rather than causal effects.”</i></p><p>Now in this case they are not talking about causal models related to identifying key drivers of an outcome, so it is not contradicting anything mentioned above or in previous posts. In particular, they are talking about building models for causal decision making (CDM) that are simply focused on making decisions about who to 'treat' or target. 
In this particular scenario, businesses are leveraging predictive models to target offers, provide incentives, or make recommendations. As discussed in the paper, there are two broad ways of approaching this problem. Let's say the problem is related to churn.</p><p>1) We could predict risk of churn and target members most likely to churn. We could do this with a purely correlational machine learning model. The output or estimand from this model is a predicted probability p() or risk score. They also refer to these kinds of models as 'outcome' models.</p><p>2) We could build a causal model that predicts the causal impact of an outreach. This would allow us to target the customers we can most likely 'save' as a result of our intervention. They refer to this estimand as a causal effect estimate (CEE). Building machine learning models that are causal can be more challenging and resource intensive.</p><p>It is true that at the end of the day we want to maximize our impact. But the causal decision is ultimately whom we target in order to maximize that impact. They point out that this decision does not necessarily hinge on how accurate our point estimate of causal impact is, as long as errors in prediction still lead to the same decisions about whom to target.</p><p>What they find is that in order to make good causal decisions about whom to 'treat', we don't have to have highly accurate estimates of the causal impact of treatment (or models focused on CEE). In fact, they talk through scenarios and conditions where non-causal outcome models like #1 above can perform just as well as, or sometimes better than, more accurate causal models focused on CEE. 
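To make the two approaches concrete, here is a minimal sketch in Python (simulated churn data and invented parameter values, not the paper's implementation) contrasting an 'outcome' model that ranks customers by churn risk with a simple two-model CEE that ranks by estimated uplift:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
tenure = rng.normal(0, 1, n)            # single covariate for illustration
outreach = rng.integers(0, 2, n)        # historical (randomized) outreach flag
# churn: shorter tenure -> higher churn; outreach lowers churn on the logit scale
p_churn = 1 / (1 + np.exp(1.2 * tenure + 0.8 * outreach))
churn = rng.binomial(1, p_churn)
X = tenure.reshape(-1, 1)

# 1) 'Outcome' model: rank customers by predicted churn risk (purely correlational)
risk = LogisticRegression().fit(X, churn).predict_proba(X)[:, 1]

# 2) Two-model CEE: fit treated and control separately, difference the predictions
m1 = LogisticRegression().fit(X[outreach == 1], churn[outreach == 1])
m0 = LogisticRegression().fit(X[outreach == 0], churn[outreach == 0])
uplift = m0.predict_proba(X)[:, 1] - m1.predict_proba(X)[:, 1]  # predicted 'save'

# Either score can drive the causal decision: target the top-k customers
k = 500
top_by_risk = set(np.argsort(-risk)[:k])
top_by_uplift = set(np.argsort(-uplift)[:k])
print(f"overlap in targeted customers: {len(top_by_risk & top_by_uplift) / k:.2f}")
```

Both scores feed the same decision (target the top k); how much the two rankings agree depends on how correlated risk and uplift are in the data, which is exactly the kind of scenario the paper characterizes.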
</p><p>In other words, correlational outcome models (like #1) can essentially serve as proxies for the more complicated causal models (like #2), even if the data used to estimate these 'proxy' models is confounded.</p><p>Scenarios where this is most likely include:</p><p>1) Outcomes used as proxies and (causal) effects are correlated</p><p>2) Outcomes used as proxies are easier to estimate than causal effects</p><p>3) Predictions are used to rank individuals</p><p>They also give some reasons why this may be true. Biased non-causal models built on confounded data may not be able to identify true causal effects, but may still be useful for identifying the optimal decision. </p><p><i>"This could occur when confounding is stronger for individuals with large effects - for example if confounding bias is stronger for 'likely' buyers, but the effect of ads is also stronger for them...the key insight here is that optimizing to make the correct decision generally involves understanding whether a causal effect is above or below a given threshold, which is different from optimizing to reduce the magnitude of bias in a causal effect estimate."</i></p><p><i>"Models trained with confounded data may lead to decisions that are as good (or better) than the decisions made with models trained with costly experimental data, in particular when larger causal effects are more likely to be overestimated or when variance reduction benefits of more and cheaper data outweigh the detrimental effect of confounding....issues that make it impossible to estimate causal effects accurately do not necessarily keep us from using the data to make accurate intervention decisions."</i></p><p>Their arguments hinge on the idea that what we are really solving for in these decisions is based on ranking:</p><p><i>"Assuming...the selection mechanism producing the confounding is a function of the causal effect - so that the larger the causal effect the stronger the selection - then (intuitively) the ranking of the 
preferred treatment alternatives should be preserved in the confounded setting, allowing for optimal treatment assignment policies from data."</i></p><p>A lot of this really comes down to proper problem framing and appealing to the popular paraphrasing of George E. P. Box - all models are wrong, but some are useful. It turns out in this particular use case non-causal models can be as useful or more useful than causal ones.</p><p>And we do need to be careful about the nuance of the problem framing. As the authors point out, this solves one particular business problem and use case, but does not answer some of the most important causal questions businesses may be interested in:</p><p><i>"This does not imply that firms should stop investing in randomized experiments or that causal effect estimation is not relevant for decision making. The argument here is that causal effect estimation is not necessary for doing effective treatment assignment."</i></p><p>They go on to argue that randomized tests and other causal methods are still core to understanding the effectiveness of interventions and strategies for improving effectiveness. Their use case begins and ends with what is just one step in the entire lifecycle of product development, deployment, and optimization. In their discussion of further work they suggest that:</p><p><i>"Decision makers could focus on running randomized experiments in parts of the feature space where confounding is particularly hurtful for decision making, resulting in higher returns on their experimentation budget."</i></p><p>This essentially parallels my previous discussion related to SHAP values. 
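The ranking intuition in the quoted passage can be illustrated with a toy simulation (all numbers invented): give each of several customer segments a true effect, add confounding bias that grows with the effect, and check whether the ranking survives even though the magnitudes do not:

```python
import numpy as np

rng = np.random.default_rng(42)
true_effect = np.linspace(0.05, 0.30, 8)        # true lift for 8 customer segments
# confounding bias that grows with the true effect (selection is stronger
# where effects are larger), plus a little estimation noise
naive_estimate = true_effect + 0.5 * true_effect + rng.normal(0, 0.005, 8)

# magnitudes are badly biased ...
print("mean absolute error:", float(np.mean(np.abs(naive_estimate - true_effect))))
# ... but the ranking, which is all a top-k targeting decision needs, survives
print("rankings agree:", np.array_equal(np.argsort(-true_effect),
                                        np.argsort(-naive_estimate)))
```

The naive estimates are roughly 50% too large, yet they order the segments the same way the true effects do, so a policy of treating the top segments is unchanged.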
For a great reference for making practical business decisions about when this is worth the effort, see the HBR article in the references on when to act on a correlation.</p><p>So some big takeaways are:</p><p>1) When building a model for purposes of causal decision making (CDM), even a biased (non-causal) model can perform as well as or better than a causal model focused on CEE.</p><p>2) In many cases, even a predictive model that provides predicted probabilities or risk (as proxies for causal impact or CEE) can perform as well as or better than causal models when the goal is CDM.</p><p>3) If, however, the goal is to take action based on important features (i.e. SHAP values as discussed before), we still need to apply a causal framework, and understanding the actual effectiveness of interventions may still require randomized tests or other methods of causal inference.</p><div>HT: This paper was previously discussed at Andrew Gelman's blog here: <a href="https://statmodeling.stat.columbia.edu/2021/11/01/how-different-are-causal-estimation-and-decision-making/" target="_blank">https://statmodeling.stat.columbia.edu/2021/11/01/how-different-are-causal-estimation-and-decision-making/ </a></div><div><b><br /></b></div><div><b>References: </b></div><p>Causal Decision Making and Causal Effect Estimation Are Not the Same... and Why It Matters. Carlos Fernández-Loría and Foster Provost. 2021. https://arxiv.org/abs/2104.04103</p><p>When to Act on a Correlation, and When Not To. David Ritter. Harvard Business Review. March 19, 2014. </p><p>Be Careful When Interpreting Predictive Models in Search of Causal Insights. Scott Lundberg. 
<a href="https://towardsdatascience.com/be-careful-when-interpreting-predictive-models-in-search-of-causal-insights-e68626e664b6 " target="_blank">https://towardsdatascience.com/be-careful-when-interpreting-predictive-models-in-search-of-causal-insights-e68626e664b6 </a> </p><p><b>Additional Reading:</b></p><p>Laura B Balzer, Maya L Petersen, Invited Commentary: Machine Learning in Causal Inference—How Do I Love Thee? Let Me Count the Ways, American Journal of Epidemiology, Volume 190, Issue 8, August 2021, Pages 1483–1487, https://doi.org/10.1093/aje/kwab048</p><p>Petersen, M. L., & van der Laan, M. J. (2014). Causal models and learning from data: integrating causal modeling and statistical estimation. Epidemiology (Cambridge, Mass.), 25(3), 418–426. https://doi.org/10.1097/EDE.0000000000000078</p><p>Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning. Numair Sani, Daniel Malinsky, Ilya Shpitser arXiv:2006.02482v3 </p><p><b>Related Posts:</b></p><p><a href="https://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html " target="_blank">Will there be a credibility revolution in data science and AI? 
</a></p><p><a href="https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html " target="_blank">Statistics is a Way of Thinking, Not a Toolbox</a></p><p><a href="http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html " target="_blank">Big Data: Don't Throw the Baby Out with the Bathwater</a></p><p><a href="http://econometricsense.blogspot.com/2014/05/big-data-think-global-act-local-when-it.html" target="_blank">Big Data: Causality and Local Expertise Are Key in Agronomic Applications </a></p><p><a href="https://www.linkedin.com/pulse/use-knowledge-big-data-society-matt-bogard/ " target="_blank">The Use of Knowledge in a Big Data Society</a></p>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-12857994209295935162021-07-07T17:51:00.011-04:002021-10-06T19:10:11.059-04:00R.A. Fisher, Big Data, and Pretended Knowledge<p>In Thinking Fast and Slow, Kahneman points out that what matters more than the quality of evidence is the coherence of the story. In business and medicine, he notes that this kind of 'pretended' knowledge based on coherence is often sought and preferred. We all know that no matter how great the analysis, <a href="http://econometricsense.blogspot.com/2021/06/science-communication-for-business-and.html" target="_blank">if we can't explain and communicate the results with influence</a>, our findings may go unappreciated. But as we have learned from <a href="http://ageconomist.blogspot.com/2021/04/consumer-perceptions-misinformation-and.html" target="_blank">misinformation and disinformation about everything from vaccines to GMOs,</a> Kahneman's insight is a double-edged sword that cuts both ways. Coherent stories often win out over solid evidence and lead to making the wrong decision. We see this not only in science and politics, but also in business. 
</p><p>In the book <a href="https://en.wikipedia.org/wiki/The_Lady_Tasting_Tea" target="_blank">The Lady Tasting Tea</a> by David Salsburg, we learn that R.A. Fisher was all too familiar with the pitfalls of attempting to innovate based on pretended knowledge and big data (excerpts):</p><p><i>"The Rothamsted Agricultural Experiment Station, where Fisher worked during the early years of the 20th century, had been experimenting with different fertilizer components for almost 90 years before he arrived...for 90 years the station ran experiments testing different combinations of mineral salts and different strains of wheat, rye, barley, and potatoes. This had created a huge storehouse of data, exact daily records of rainfall and temperature, weekly records of fertilizer dressings and measures of soil, and annual records of harvests - all of it preserved in leather bound notebooks. Most of the 'experiments' had not produced consistent results, but the notebooks had been carefully stored away in the station's archives....the result of these 90 years of 'experimentation' was a mess of confusion and vast troves of unpublished and useless data...the most that could be said of these [experiments] was that some of them worked sometimes, perhaps, or maybe."</i></p><p>Fisher introduced the world to experimental design and challenged the idea that scientists could make progress by tinkering alone. Instead, he motivated them to think through inferential questions: Is the difference in yield for variety A vs variety B (signal) due to superior genetics, or is this difference what we would expect to see anyway due to natural variation in crop yields (noise)? In other words, is the difference in yield statistically significant? This is the original intention of the concept of statistical significance that has gotten lost in the many abuses and misinterpretations we often hear about. 
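Fisher's signal-versus-noise question can be sketched with a simple two-sample t-test on simulated yields (made-up numbers, assuming independent plots):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
plots = 20                                  # plots per variety
yield_a = rng.normal(60, 8, plots)          # bushels/acre, variety A
yield_b = rng.normal(55, 8, plots)          # variety B: truly 5 bu/ac lower

t_stat, p_value = stats.ttest_ind(yield_a, yield_b)
print(f"observed difference: {yield_a.mean() - yield_b.mean():.1f} bu/ac")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# a small p-value says a difference this large would rarely arise from natural
# plot-to-plot variation alone -- the original sense of 'significance'
```

With only 20 plots per variety and a standard deviation of 8, a true 5-bushel difference can easily be swamped by noise, which is why Fisher's designs also emphasized replication and randomization.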
He also taught us to ask questions about causality: does variety A actually yield better than variety B because it is genetically superior, or could differences in yield be explained by differences in soil characteristics, weather/rainfall, planting date, or numerous other factors? His methods taught us how to separate the impact of a product or innovation from the impact and influences of other factors.</p><p>Fisher did more than provide a set of tools for problem solving. <a href="https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html" target="_blank">He introduced a structured way of thinking about real world problems and the data we have to solve them. </a> This way of thinking moved the agronomists at Rothamsted away from mere observation to useful information. This applies not only to agriculture, but to all of the applied and social sciences as well as business.</p><p>In his book Uncontrolled, Jim Manzi stressed the importance of thinking like Fisher's plant breeders and agronomists (Fisher himself was a geneticist), especially in business settings. Manzi describes the concept of 'high causal density,' which is the idea that often the number of causes of variation in outcomes can be enormous, with each having the potential to wash out the cause we are most interested in (whatever treatment or intervention we are studying). In business, which is a social science, this becomes more challenging than in the physical and life sciences. In physics and biology we can assume relatively uniform physical and biological laws that hold across space and time. But in business the 'long chain of causation between action and outcome' is 'highly dependent for its effects on the social context in which it is executed.' It's all another way of saying that what happens in the outside world can often have a much larger impact on our outcome than a specific business decision, product, or intervention. 
As a result, this calls for the same approach Fisher advocated in agriculture to be applied in business settings. </p><p>List and Gneezy address this in <a href="https://www.amazon.com/Why-Axis-Undiscovered-Economics-Everyday-ebook/dp/B00BVTSBVO" target="_blank">The Why Axis:</a></p><p><i>"Many businesses experiment and often...businesses always tinker...and try new things...the problem is that businesses rarely conduct experiments that allow a comparison between a treatment and control group...Business experiments are research investigations that give companies the opportunity to get fast and accurate data regarding important decisions."</i></p><p>Fisher's approach soon caught on and revolutionized science and medicine, but adoption still lags in some business settings, even in the wake of big data, AI, and advances in machine learning. As Jim Manzi and Stefan Thomke state in Harvard Business Review, in the absence of formal randomized testing and good experimental design:</p><p><i>"executives end up misinterpreting statistical noise as causation—and end up making bad decisions"</i></p><p>In The Book of Why, Judea Pearl laments the reluctance to embrace causality: </p><p><i>"statistics, including many disciplines that looked to it for guidance remained in the prohibition era, falsely believing that the answers to all scientific questions reside in the data, to be unveiled through clever data mining tricks...much of this data centric history still haunts us today. We live in an era that presumes Big Data to be the solution to all of our problems. Courses in data science are proliferating in our universities, and jobs for data scientists are lucrative in companies that participate in the data economy. But I hope with this book to convince you that data are profoundly dumb...over and over again, in science and business we see situations where more data aren't enough. 
Most big data enthusiasts, while somewhat aware of those limitations, continue to chase after data centric intelligence."</i></p><p>These big data enthusiasts bear a strong resemblance to the researchers at Rothamsted before Fisher. List has a similar take:</p><p><i>"Big data is important, but it also suffers from big problems. The underlying approach relies heavily on correlations, not causality. As David Brooks has noted, 'A zillion things can correlate with each other depending on how you structure the data and what you compare....because our work focuses on field experiments to infer causal relationships, and because we think hard about these causal relationships of interest before generating the data we go well beyond what big data could ever deliver."</i></p><p>We often want fast iterations and actionable insights from data. While it is true that a great analysis with no story, delivered too late, is as good as no analysis, it is just as true that quick insights with a coherent story based on pretended knowledge from big data can leave you running in circles getting nowhere - no matter how fast you might feel like you are running. In the case of Rothamsted, scientists ran in circles for 90 years before real insights could be uncovered using Fisher's more careful and thoughtful analysis. Even if they had today's modern AI, ML, and data visualization tools to cut the data 1000 different ways, they still would not have been able to get any value for all of their effort. Wow, 90 years! How is that for time to insight? In many ways, despite drowning in data and the advances in AI and machine learning, many areas of business across a number of industries will find themselves in the same place Fisher found himself at Rothamsted almost 100 years ago. 
We will need a <a href="https://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html" target="_blank">credibility revolution in AI </a>to bring about the kind of culture change that makes the causal and inferential thinking that comes naturally to today's agronomists (thanks to Fisher), or more recently to Pearl's disciples and their causal graphs, commonplace in business strategy. </p><p><b>Notes: </b></p><p>1) Randomized tests are not the only way to make causal inferences. In fact, in the Book of Why Pearl notes, in relation to smoking and lung cancer outside the context of randomized controlled trials, <i>"millions of lives were lost or shortened because scientists did not have adequate language or methodology for answering causal questions." </i>The credibility revolution in epidemiology and economics, along with Pearl's work, has provided us with this language. As Pearl notes: <i>"Nowadays, thanks to carefully crafted causal models, contemporary scientists can address problems that would have once been considered unsolvable or beyond the pale of scientific inquiry." </i>See also: <a href="https://econometricsense.blogspot.com/2018/07/the-credibility-revolutions-in.html" target="_blank">The Credibility Revolution(s) in Econometrics and Epidemiology.</a></p><p>2) Deaton and Cartwright make strong arguments challenging the supremacy of randomized tests as the gold standard for causality (similar to Pearl), but this only furthers the importance of considering careful causal questions in business and science by broadening the toolset along the same lines as Pearl. Deaton and Cartwright also emphasize the importance of interpreting causal evidence in the context of sound theory. 
See: Angus Deaton, Nancy Cartwright, Understanding and misunderstanding randomized controlled trials, Social Science & Medicine, Volume 210, 2018.</p><p>3) None of this is to say that predictive modeling and machine learning cannot answer questions and solve problems that create great value for business. The explosion of the field of data science is an obvious testament to this fact. Probably the most important thing in this regard is for data scientists and data science managers to become familiar with the important distinctions between models and approaches that explain or predict. See also: <a href="http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html" target="_blank">To Explain or Predict</a> and <a href="http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html" target="_blank">Big Data: Don't Throw the Baby Out with the Bathwater</a></p><p><b>Additional Reading</b></p><p>Will there be a credibility revolution in data science and AI? </p><p><a href="https://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html">https://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html</a> </p><p>Statistics is a Way of Thinking, Not a Toolbox</p><p><a href="https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html">https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html</a> </p><p>The Value of Business Experiments and the Knowledge Problem</p><p><a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html">https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html</a></p><p>The Value of Business Experiments Part 2: A Behavioral Economic Perspective</p><p><a href="http://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-part.html " 
target="_blank">http://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-part.html </a></p><p>The Value of Business Experiments Part 3: Innovation, Strategy, and Alignment </p><p><a href="http://econometricsense.blogspot.com/2020/05/the-value-of-business-experiments-part.html " target="_blank">http://econometricsense.blogspot.com/2020/05/the-value-of-business-experiments-part.html </a></p><p>Big Data: Don't Throw the Baby Out with the Bathwater</p><p><a href="http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html">http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html</a> </p><p>Big Data: Causality and Local Expertise Are Key in Agronomic Applications</p><p><a href="http://econometricsense.blogspot.com/2014/05/big-data-think-global-act-local-when-it.html">http://econometricsense.blogspot.com/2014/05/big-data-think-global-act-local-when-it.html</a></p><p>The Use of Knowledge in a Big Data Society</p><p><a href="https://www.linkedin.com/pulse/use-knowledge-big-data-society-matt-bogard/ " target="_blank">https://www.linkedin.com/pulse/use-knowledge-big-data-society-matt-bogard/ </a></p><div><br /></div><p><br /></p>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-37740130331337038122021-06-02T09:22:00.012-04:002022-10-11T17:57:08.019-04:00Science Communication for Business and Non-Technical Audiences: Stigmas, Strategies, and Tactics<p>If you are a reader of this blog you are familiar with the number of posts I have shared about <a href="http://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html" target="_blank">machine learning and causal inference</a> and the <a href="http://econometricsense.blogspot.com/2010/09/why-study-applied-economics.html" target="_blank">benefits of education in economics</a>. 
I have also discussed how there are important <a href="http://econometricsense.blogspot.com/2014/11/applied-econometrics.html" target="_blank">gaps sometimes between theory and application.</a> </p><p>In this post I am going to talk about another important gap related to communication. How do we communicate the value of our work to a non-technical audience? </p><p>We can learn a lot from formal coursework, especially in good applied programs with great professors. But if we are not careful, we can also pick up mental models and habits of thinking that weigh us down, particularly those of us who end up working in very applied business or policy settings. How we deal with these issues becomes important to career professionals and critical to those involved in science communication in general, whether we are trying to influence business decision makers, policy makers, or consumers and voters.</p><p>In this post I want to discuss communicating with intent, paradigm gaps, social harassment costs, and mental accounting.</p><div>As stated in The Analytics Lifecycle Toolkit: <i>"no longer is it sufficient to give the technical answer, we must be able to communicate for both influence and change."</i></div><p><b>Communicating to Business and Non-Technical Audiences - or - </b><b>The Laffer Curve for Science Communication</b></p><p>For those who plan to translate their science backgrounds to business audiences (like many data scientists coming from scientific backgrounds), what are some strategies for becoming better science communicators? In their book <i>Championing Science: Communicating your Ideas to Decision Makers</i>, Roger and Amy Aines offer lots of advice. You can listen to a discussion of some of this at the BioReport podcast <a href="https://soundcloud.com/levine-media-group/teaching-scientists-to-communicate" target="_blank">here. </a></p><p>Two important themes they discuss are paradigm gaps and intent. 
Scientists can be extremely efficient communicators through the lens of the paradigms they work in. </p><p>As discussed in the podcast, a paradigm is all the knowledge a scientist or economist may have in their head specific to their field of study and research. Unfortunately, there is a huge gap between this paradigm and its vocabulary and what non-technical stakeholders can relate to. They have to meet stakeholders where they are, vs. the audience they may find at conferences or research seminars. From experience, different stakeholders and audiences across different industries have different gaps. If you work for a consultancy with external pharma clients, they might have a different expectation about statistical rigor than, say, a product manager in a retail setting. Even within the same business or organization, the tactics used in solving for the gap for one set of stakeholders might not work at all for a new set of stakeholders if you change departments. In other words, know your audience. What do they want or need or expect? What are their biases? What is their level of analytic or scientific literacy? How risk averse are they? Answers to these questions are a great place to start in terms of filling the paradigm gaps and addressing the second point made in the podcast - speaking with intent.</p><p>As discussed in the podcast: <i>"many scientists don't approach conversations or presentations with a real strategic intent in terms of what they are communicating...they don't think in terms of having a message....they need to elevate and think about the point they are trying to make when speaking to decision makers." 
</i></p><p>As Bryan Caplan states in his book The Myth of the Rational Voter, when it comes to speaking to non-economists and the general public, economists should apply the Laffer curve of learning: <i>"they will retain less if you try to teach them more."</i></p><p>He goes on to discuss that it's not just what we say, but how we position it, especially when dealing with resistance related to misinformation and disinformation and systemic biases:</p><p><i>"irrationality is not a barrier to persuasion, but an invitation to alternative rhetorical techniques...if beliefs are in part consumed for their direct psychological benefits then to compete in the marketplace of ideas, you need to bundle them with the right emotional content."</i></p><p>In the <a href="https://geneticliteracyproject.org/2021/05/19/podcast-is-coffee-healthy-or-not-public-health-officials-encourage-vaccine-skepticism-why-childbirth-is-so-hard/" target="_blank">Science Facts and Fallacies podcast</a> (May 19, 2021) Kevin Folta and Cameron English discuss:</p><p><i>"We spend so much time trying to convince people with scientific principles....it's so important for us to remember what we learn from psychology and sociology (and economics) matters. 
These are turning out to be the most important sciences in terms of forming a conduit through which good science communication can flow."</i></p><p>Torsten Slok offers great advice in his discussion with Barry Ritholtz about working in the private sector as a PhD economist in the <a href="https://ritholtz.com/2018/08/transcript-research-affiliates/" target="_blank">Masters in Business Podcast back in 2018: </a></p><p><i>"there is a different sense of urgency and an emphasis on brevity....we offer a service of having a view on what the economy will do what the markets will do - lots of competition for attention...if you write long winded explanations that say that there is a 50/50 chance that something will happen many customers will not find that very helpful."</i></p><p>So there are a lot of great data science and science communicators out there with great advice. A big problem is that this advice is often not part of the training that many of those with scientific or technical backgrounds receive, and an even bigger problem is that it is often looked down upon and even punished! I'll explain more below.</p><p><b>The Negative Stigma of Science Communication in the Data Science and Scientific Community</b></p><p>One of the most egregious things I see on social media is someone trying their best to help mentor those new to the analytical space (and improve their own communication skills) by sharing some post that attempts to describe some complicated statistical concept in 'layman's' terms - only to be rewarded with harassing and trolling comments. Usually this is about how they didn't capture every particular nuance of the theory, failed to include a statement about certain critical assumptions, or oversimplified the complex thing they were trying to explain in simple terms to begin with. 
This kind of negative social harassment seems to be par for the course when attempting to communicate statistics and data science on social media like LinkedIn and Twitter.</p><p>Similarly, in science communication, academics can be shunned by their peers when attempting to do popular writing or communication for the general public. </p><p>In 'The Stoic Challenge' author William Irvine discusses Daniel Kahneman's challenges with writing a popular book: </p><p><i>"Kahneman was warned that writing a popular book would cause harm to his professional reputation...professors aren't supposed to write books that normal people can understand."</i></p><p>He describes how, when Kahneman's book Thinking Fast and Slow made the New York Times best-seller list, Kahneman <i>"sheepishly explained to his colleagues that the book's appearance there was a mistake."</i></p><p>In an <a href="https://www.econtalk.org/steven-levitt-on-freakonomics-and-the-state-of-economics/#audio-highlights" target="_blank">EconTalk interview with economist Steven Levitt</a>, Russ Roberts asks Levitt about writing his popular book Freakonomics:</p><p><i>"What was the reaction from your colleagues in the profession...You know, I have a similar route. I'm not as successful as you are, but I've popularized a lot of economics...it was considered somewhat untoward to waste your time speaking to a popular audience."</i></p><p>Levitt responded by saying the reaction was not so bad, but the fact that Russ had to broach the topic is evidence of the toxic culture that academics face when doing science communication. The negative stigma associated with good science communication is not limited to economics or the social and behavioral sciences. 
</p><p>In his <a href="http://www.talkingbiotechpodcast.com/293-debunking-the-dirty-dozen/" target="_blank">Talking Biotech podcast episode Debunking the Disinformation Dozen</a>, scientist and science communicator Kevin Folta discusses his strident efforts facing off against these toxic elements:</p><p><i>"I have always said that communication is such an important part of what we do as scientists but I have colleagues who say you are wasting your time doing this...Folta why are you wasting your time doing a podcast or writing scientific stuff for the public."</i></p><p>Some of this is just bad behavior, some of it is gatekeeping done in the name of upholding the scientific integrity of their field, some of it is the attempt of others to prove their competence to themselves or others, and maybe some of it is the result of people genuinely trying to provide peer review to colleagues they think have gone astray. But most of it is unhelpful when it comes to influencing decision makers or improving general scientific literacy. It doesn't matter how great the discovery or how impactful the findings; we have all seen from the pandemic that effective science communication is critical for overcoming the effects of misinformation and disinformation. A culture that is toxic toward effective science communication becomes an impediment to science itself and leaves a void waiting to be filled by science deniers, activists, policy makers, decision makers, and special interests.</p><p>This can be challenging when you add the <a href="https://www.nature.com/articles/s41562-018-0520-3" target="_blank">Dunning-Kruger </a>effect to the equation. Those that know the least may be the most vocal while scientists and those with expertise sit on the sidelines. As Bryan Caplan states in his book The Myth of the Rational Voter:</p><p><i>"There are two kinds of errors to avoid. Hubris is one, self-abasement is the other. 
The first leads experts to over reach themselves; the second leads experts to stand idly by while error reigns."</i></p><p><b>How Do Culture and Mental Accounting Impact Science Communication?</b></p><p>As I've written above, there is something of a toxic culture in the scientific community that inhibits good science communication. In the<a href="https://www.fourbeers.com/guests/hobson" target="_blank"> Two Psychologists Four Beers podcast </a> (<b>WARNING: the intro of this podcast episode may contain vulgarity)</b> behavioral scientist Nick Hobson makes an interesting comparison between MBAs and scientists. </p><p><i>"as scientists we need to be humble with regards to our data...one thing we are learning from our current woes of replication (the replication crisis) is we know a lot less than we think. This has conditioned us to be more humble....vs. business school people that are trained to be more assertive and confident."</i></p><p>I'd like to propose an analogy relating to mental accounting. It seems like when a scientist gets their degree it comes with a mental account called scientific credibility. Speaking and writing to a general audience risks taking a charge against that account, and they are trained to be extremely frugal about managing it. Communication becomes an exercise in risk management. If they communicate something with the slightest error, or miss the slightest nuance, a colleague may call them out. Gotcha! Psychologically, this would call for a huge charge against their 'account' and reputation. It's not quite a career-ending mistake like making a fraudulent claim or faking data, but it's bad enough to be avoided at great cost. MBAs don't have a mental account called scientific credibility. They aren't long on academic credibility, so they don't need to put on the communication hedges the way scientists often do. 
They come off as better communicators and more confident while scientists risk becoming stereotyped as ineffective communicators. </p><p>To protect their balance at all costs and avoid social harassment from their peers, economists and scientists may tend to speak with caveats, hedges, and qualifications. This may also mean a delayed response. In many cases, before results can even be communicated, they require in-depth rigorous analysis, sensitivity checks, etc. It requires doing science, which is by nature slow, while the public wants answers fast. Faster answers might mean less time for analysis, which calls for more caveats. This can all be detrimental to effective communication with non-technical audiences. Answers become either too slow or too vague to support decision making (recall Torsten Slok's comments above). This gives the impression of a lack of confidence and relevance, and feeds a stereotype that technical people (economists, scientists, data scientists, etc.) fail to offer definitive or practical conclusions. As Bryan Caplan notes discussing the role of economists in The Myth of the Rational Voter:</p><p><i>"when the media spotlight gives other experts a few seconds to speak their mind, they usually strive to forcefully communicate one or two simplified conclusions....but economists are reluctant to use this strategy. Though the forum demands it they think it unseemly to express a definitive judgement. 
This is a recipe for being utterly ignored."</i></p><p>Students graduating from economics and science-based graduate programs may inherit these mental accounts and learn these 'hedging strategies' from their professors, from the program, and from the seminar culture that comes with it.</p><p>Again, Nick Hobson offers great insight about how to deal with this kind of mental accounting in his own work:</p><p><i>"what I've wrestled with as I've grown the business is maintaining scientific integrity and the rigor but knowing you have to sacrifice some of it....you have to find and strike a balance between being data driven and humble while also being confident and strategic and cautious about the shortcuts you take."</i></p><p>In Thinking, Fast and Slow, Kahneman argues that sometimes new leaders can produce better results because fresh thinkers can view problems without the same mental accounts holding back incumbents. The solution isn't to abandon scientific training and the value it brings to the table in terms of rigor and statistical and causal reasoning. The solution is to learn how to view problems in a way that avoids the kind of mental accounting I have been discussing. This also calls for a cultural change in the educational system. As Kevin Folta stated in the previous Talking Biotech Podcast:</p><p><i>"Until we have a change in how the universities and how the scientific establishment sees these efforts as positive and helpful and counts toward tenure and promotion I don't think you are going to see people jump in on this." </i></p><p>Given that graduate and PhD training may come with such baggage, one alternative may be to develop programs with more balance, like <a href="http://econometricsense.blogspot.com/2017/06/professional-science-masters-degrees.html" target="_blank">Professional Science Master's degrees</a>, or at least create courses or certificates that focus on translational knowledge and communication skills. Or seek out graduate study under folks like Dr. 
Folta who are great scientists and researchers and can also help you overcome the barriers to communicating science effectively. If that is the case, we are going to need more Dr. Foltas.</p><p><b>References:</b></p><p>The Myth of the Rational Voter: Why Democracies Choose Bad Policies. Bryan Caplan. Princeton University Press. 2007.</p><p>The Stoic Challenge: A Philosopher's Guide to Becoming Tougher, Calmer, and More Resilient. William Braxton Irvine. Norton & Co. NY. 2019.</p><p>The Analytics Lifecycle Toolkit: A Practical Guide for an Effective Analytics Capability. Gregory S. Nelson. 2018.</p><div><div></div></div>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-34631334545951482002021-04-01T13:01:00.008-04:002021-05-22T11:20:24.226-04:00The Value of Business Experiments Part 3: Innovation, Strategy, and AlignmentIn previous posts I have discussed the value proposition of business experiments from both a classical and behavioral economic perspective. This series of posts has been greatly influenced by Jim Manzi's book <a href="https://www.manhattan-institute.org/uncontrolled">'Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society.' </a>Midway through the book Manzi highlights three important things that experiments in business can do:<br /><br />1) They provide precision around the tactical implementation of strategy<br />2) They provide feedback on the performance of a strategy, which allows for refinements to be driven by evidence<br />3) They help achieve organizational and strategic alignment<br /><br />Manzi explains that within any corporation there are always silos and subcultures advocating competing strategies with perverse incentives and agendas in pursuit of power and control. How do we know who is right and which programs or ideas are successful considering the many factors that could be influencing any outcome of interest? 
Manzi describes any environment where the number of causes of variation is enormous as an environment that has '<i>high causal density.'</i> We can claim to address this with a data-driven culture, but what does that mean? Modern companies in a digital age with AI and big data are drowning in data. This makes it easy to adorn rhetoric in advanced analytical frameworks. <a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-part.html">Because data seldom speaks, anyone can speak for the data through wily data storytelling.</a><br /><div><br /></div><div>As Jim Manzi and Stefan Thomke discuss in <a href="https://hbr.org/2014/12/the-discipline-of-business-experimentation">Harvard Business Review:</a><br /><br /><i>"business experiments can allow companies to look beyond correlation and investigate causality....Without it, executives have only a fragmentary understanding of their businesses, and the decisions they make can easily backfire."</i><br /><br />In complex environments with high causal density, we don't know enough about the nature and causes of human behavior, decisions, and causal paths from actions to outcomes to list them all and measure and account for them even if we could agree how to measure them. This is the nature of decision making under uncertainty. But, as R.A. Fisher taught us with his agricultural experiments, randomized tests allow us to account for all of these hidden factors (Manzi calls them hidden conditionals). 
Only then does our data stand a chance to speak truth.</div><div><br /></div><div>In <i><a href="https://www.innosight.com/insight/dual-transformation/" target="_blank">Dual Transformation: How to Reposition Today's Business While Creating the Future</a></i> the authors discuss the importance of experimentation as a way to navigate uncertainty in causally dense environments in what they refer to as <i>transformation B</i>:</div><div><div><br /></div><div><i>“Whenever you innovate, you can never be sure about the assumptions on which your business rests. So, like a good scientist, you start with a hypothesis, then design an experiment. Make sure the experiment has clear objectives (why are you running it and what do you hope to learn). Even if you have no idea what the right answer is, make a prediction. Finally, execute in such a way that you can measure the prediction, such as running a so-called A/B test in which you vary a single factor."</i></div><div><br /></div></div><div>Experiments aren't just tinkering and trying new things. 
While these are helpful to innovation, just tinkering and measuring and observing still leaves you speculating about what really works and is <a href="http://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-part.html" target="_blank">subject to all the same behavioral biases and pitfalls of big data previously discussed.</a></div><div><br /></div><div>List and Gneezy address this in <a href="https://www.amazon.com/Why-Axis-Undiscovered-Economics-Everyday-ebook/dp/B00BVTSBVO" target="_blank">The Why Axis</a>:</div><div><br /></div><div><i>"Many businesses experiment and often...businesses always tinker...and try new things...the problem is that businesses rarely conduct experiments that allow a comparison between a treatment and control group...Business experiments are research investigations that give companies the opportunity to get fast and accurate data regarding important decisions."</i></div><div><br /></div><div>Three things distinguish a successful business experiment from just tinkering:</div><div><br /></div><div>1) Separation of signal from noise through well designed and sufficiently powered tests</div><div>2) Connecting cause and effect through randomization </div><div>3) Clear signals on business value that follows from 1 & 2 above</div><div><br /></div><div>Having causal knowledge helps identify more informed and calculated risks vs. risks taken on the basis of gut instinct, political motivation, or overly optimistic data-driven correlational pattern finding analytics. </div><div><br />Experiments add incremental knowledge and value to business. No single experiment is going to be a 'killer app' that by itself will generate millions in profits. 
But in aggregate the <a href="http://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html">knowledge </a>created by experiments probably offers the greatest strategic value across an enterprise compared to any other analytic method.</div><div><br /></div><div>As discussed earlier, business experiments create value by helping manage the knowledge problem within firms. It's worth quoting List and Gneezy again:</div><div><br /></div><div><i>"We think that businesses that don't experiment and fail to show, through hard data, that their ideas can actually work before the company takes action - are wasting their money....every day they set suboptimal prices, place ads that do not work, or use ineffective incentive schemes for their work force, they effectively leave millions of dollars on the table."</i><br /><br />As Luke Froeb writes in Managerial Economics, A Problem Solving Approach (3rd Edition):<br /><br /><i>"With the benefit of hindsight, it is easy to identify successful strategies (and the reasons for their success) or failed strategies (and the reason for their failures). It's much more difficult to identify successful or failed strategies before they succeed or fail."</i><br /><br />Again from Dual Transformation:</div><div><br /></div><div><i>"Explorers recognize they can't know the right answer, so they want to invest as little as possible in learning which of their hypotheses are right and which ones are wrong"</i></div><div><br /></div><div>Business experiments offer the opportunity to test strategies early on a smaller scale to get causal feedback about potential success or failure before fully committing large amounts of irrecoverable resources. This takes the concept of failing fast to a whole new level. 
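On the first of the three points above, 'sufficiently powered' has a concrete meaning. As a minimal sketch (the 5% baseline conversion rate and the one-point hoped-for lift below are hypothetical numbers, not from any of the sources cited), the standard normal-approximation formula gives the sample size needed in each arm of an A/B test before launch:

```python
from math import ceil, sqrt
from statistics import NormalDist

def per_arm_sample_size(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided test comparing two
    proportions (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test size
    z_b = NormalDist().inv_cdf(power)          # critical value for the power target
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar)) +
           z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# Hypothetical test: baseline conversion 5%, hoping to detect a lift to 6%
n = per_arm_sample_size(0.05, 0.06)
print(n)  # thousands of users per arm -- small 'tinkering' samples are underpowered
```

A bigger hoped-for lift shrinks the required sample dramatically, which is why tests should be sized before launch rather than judged by eye afterward.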
As discussed in The Why Axis and Uncontrolled, business experiments play a central role in product development and innovation across a range of industries and companies, from Harrah's casinos, Capital One, and Humana, which have been leading in this area for decades, to newer ventures like Amazon and Uber. </div><div><br /></div><div><i>"At Uber Labs, we apply behavioral science insights and methodologies to help product teams improve the Uber customer experience. One of the most exciting areas we’ve been working on is causal inference, a category of statistical methods that is commonly used in behavioral science research to understand the causes behind the results we see from experiments or observations...Teams across Uber apply causal inference methods that enable us to bring richer insights to operations analysis, product development, and other areas critical to improving the user experience on our platform."</i> - From: Using Causal Inference to Improve the Uber User Experience (<a href="https://eng.uber.com/causal-inference-at-uber/" target="_blank">link</a>)</div><div><br />Achieving the greatest value from business experiments requires leadership commitment. It also demands a culture that is genuinely open to learning through a blend of trial and error, data-driven decision making informed by theory, and the infrastructure necessary for implementing enough tests and iterations to generate the knowledge necessary for rapid learning and innovation. 
The result is a corporate culture that allows an organization to formulate, implement, and modify strategy faster and more tactically than others.<br /><br /><b>See also:</b><br /><a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html" target="_blank">The Value of Business Experiments: The Knowledge Problem</a><br /><a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-part.html">The Value of Business Experiments Part 2: A Behavioral Economics Perspective</a><br /><a href="https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html">Statistics is a Way of Thinking, Not a Box of Tools</a><br /><p> </p></div>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-17189705514629012122021-03-13T19:07:00.029-05:002022-05-12T08:37:42.869-04:00Why Study Economics/Applied Economics?<p><b>Applied Economics is a broad field with many applications.</b></p><p>Applied Economics is a broad field of study covering many topics. Recognizing this wide range of applications has led departments of Agricultural Economics across numerous universities to change their degree program names to Applied Economics. In 2008, the American Agricultural Economics Association changed its name to the Agricultural and Applied Economics Association (AAEA).</p>This trend is noted in research published in the journal Applied Economic Perspectives and Policy:<br /><br /><span style="font-style: italic;">"Increased work in areas such as agribusiness, rural development, and environmental economics is making it more difficult to maintain one umbrella organization or to use the title “agricultural economist” ... 
the number of departments named “Agricultural Economics” has fallen from 36 in 1956 to 9 in 2007."</span><div><div><br /></div><div>This brief podcast from the University of Minnesota's <a href="https://www.apec.umn.edu/">Department of Applied Economics</a> is an example of this trend: </div><div><br /></div><div><a href="https://podcasts.apple.com/mt/podcast/what-is-applied-economics/id419408183?i=1000411492398">https://podcasts.apple.com/mt/podcast/what-is-applied-economics/id419408183?i=1000411492398</a> <br /></div><div><br /></div><div>It discusses the breadth of questions and problems applied economists address in their work, including obesity and food systems; environmental and water resource economics; development, growth, trade, and technological change; public sector economics; health policy and management; and human resources and industrial relations. Applied research in this area is often interdisciplinary, drawing on biology, engineering, health and animal sciences, and nutrition, for example. </div><div><br /></div><div><div>Why study applied economics? A few inspiring quotes from <a href="https://coas.siu.edu/academics/departments/agribusiness-economics/what-is-agribusiness-economics.html" target="_blank">Southern Illinois University</a>'s introduction to their programs in Agribusiness Economics:</div><div><br /></div><div><div><i>If you want to prove sustainable resource use saves money and protects the land…</i></div><div><i>If you understand that the wheat crop here can make a difference for a hungry child across the ocean… </i></div></div></div><div><p><b>Applied Economics emphasizes quantitative and analytics skills ideal for careers in data science</b></p><p>Many applied economics master's degrees are designed to serve as a very attractive terminal degree for professionals. 
</p><p>To quote from <a href="https://e-catalogue.jhu.edu/arts-sciences/advanced-academic-programs/programs/applied-economics-master-science/" target="_blank">Johns Hopkins University’s Applied Economics program </a>home page:</p><p><span style="font-style: italic;">“Economic analysis is no longer relegated to academicians and a small number of PhD-trained specialists. Instead, economics has become an increasingly ubiquitous as well as rapidly changing line of inquiry that requires people who are skilled in analyzing and interpreting economic data, and then using it to effect decisions ………Advances in computing and the greater availability of timely data through the Internet have created an arena which demands skilled statistical analysis, guided by economic reasoning and modeling.”</span></p><p>Many applied economics programs are STEM-designated, reflecting the emphasis that applied economics places on quantitative and analytics skills. The University of Pittsburgh has designed their STEM-designated <a href="https://www.mqe.pitt.edu/" target="_blank">M.S. in Quantitative Economics</a> specifically with data science roles in mind. <a href="https://onlinems.aaec.vt.edu/" target="_blank">Virginia Tech </a>offers an online Master of Ag and Applied Economics, the first I have seen in an Agricultural and Applied Economics department specifically designed to incorporate economics with data science and programming.</p><br /></div><div><b>The focus on causality differentiates economics from other fields.</b></div><div><b><br /></b></div><div>Once armed with predictions from machine learning and AI, businesses will start to ask questions about what decisions or factors are moving the needle on revenue, customer satisfaction and engagement, or improved efficiencies. 
Essentially they will want to ask questions related to causality, which requires a completely different paradigm for data analysis.</div><div><br /></div><div><div><a href="https://www.kdnuggets.com/2012/08/exclusive-scott-nicholson-interview-economics-weather-linkedin-healthcare.html" target="_blank">In a KDnuggets interview</a>, Economist Scott Nicholson (Chief Data Scientist at Accretive Health and formerly at LinkedIn) comments on the differences between economists and data scientists: </div><div><br /></div><div> <i>"In terms of applied work, economists are primarily concerned with establishing causation. This is key to understanding what influences individual decision-making, how certain economic and public policies impact the world, and tells a much clearer story of the effects of incentives. With this in mind, economists care much less about the accuracy of the predictions from their econometric models than they do about properly estimating the coefficients, which gets them closer to understanding causal effects. At Strata NYC 2011, I summed this up by saying: If you care about prediction, think like a computer scientist, if you care about causality, think like an economist."</i></div></div><div><br /></div><div>As data science thought leader <a href="https://www.superdatascience.com/podcast/podcast-one-purpose-data-science-truth-analytics" target="_blank">Eugene Dubossarsky </a>puts it in a SuperDataScience podcast:</div><div><br /></div><div><i>“the most elite skills…the things that I find in the most elite data scientists are the sorts of things econometricians these days have…bayesian statistics…inferring causality” </i></div><div><i><br /></i></div><div>Nobel Prize Laureate Joshua Angrist discussed the new opportunities for students graduating with economics and quantitative skills that are available at firms like Amazon because of their interest in causal questions and running experiments:</div><div><br /></div><div>
<iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/ZXgIEzf2wlc" title="YouTube video player" width="560"></iframe> </div><div><br /></div><div><a href="https://www.youtube.com/watch?v=T24j8XTcpe0" target="_blank">In another interview Angrist emphasizes </a>opportunities for Economics bachelor's degree holders:</div><div><br /></div><div><i>"There's a very strong private sector market for economics undergrad especially economics undergrads who have good training in econometrics...like Amazon and Google and Facebook and Trip Adviser they are looking for people that can do some statistics but a lot of the questions that they are interested in are causal questions. What will be the consequences of changing prices for example or changing marketing strategies and these companies have discovered that the best training for that is undergrad work in economics or econometrics. We really specialize in causality in a way regular data science does not.....someone who trains in data science might learn a lot about machine learning but won't necessarily learn about for example instrumental variables or regression discontinuity methods and those turn out to be very useful for the tech sector."</i></div><div><br /></div><div>A post at the <a href="https://eng.uber.com/causal-inference-at-uber/" target="_blank">Uber Engineering blog</a> explains how they find these skills to be valuable in a business setting: </div><div><br /></div><div><i>"One of the most exciting areas we’ve been working on is causal inference, a category of statistical methods that is commonly used in behavioral science research to understand the causes behind the results we see from experiments or observations...causal inference helps us provide a better user experience for customers on the Uber platform. 
The insights from causal inference can help identify customer pain points, inform product development, and provide a more personalized experience...At a higher level, causal inference provides information that is critical to both improving the user experience and making business decisions through better understanding the impact of key initiatives."</i></div><div><br /></div><div><b>Economics provides a foundation with long lasting value and offers a bright future.</b></div><div><br /></div><div>Economics combines mathematically precise theories (like microeconomics) and empirically sound methods (like econometrics) to study people's choices and how they are made compatible. As a social and behavioral science and a quantitative and technical field, learning to think like an economist and applying those skills will never go out of fashion. There are a number of both undergraduate and graduate degree programs in economics and applied economics across the country and I would encourage you to check them out. I've listed a few more examples of applied economics programs below.</div><div><br /></div><div><div>***This post is an update to an original post made in September 2010 <a href="https://econometricsense.blogspot.com/2010/09/why-study-appliedagricultural-economics.html" target="_blank">found here.</a></div><div> </div><div><b>Related Posts: </b></div><div><b><br /></b></div><div><div><a href="https://www.linkedin.com/pulse/data-science-problem-solving-approach-matt-bogard/?trackingId=zuJVqQBERr2RpZq%2FgdOU4Q%3D%3D " target="_blank">Data Science: A 'Knowledge' Problem Solving Approach</a></div><div><br /></div><div>Will There Be a Credibility Revolution in Data Science? <a href="https://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html">https://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html</a></div><div><br /></div><div>Why Data Science Needs Economics. 
<a href="http://econometricsense.blogspot.com/2016/10/why-data-science-needs-economics.html " target="_blank">http://econometricsense.blogspot.com/2016/10/why-data-science-needs-economics.html </a></div></div><div><b><br /></b></div><div>Economists as Data Scientists <a href="http://econometricsense.blogspot.com/2012/10/economists-as-data-scientists.html">http://econometricsense.blogspot.com/2012/10/economists-as-data-scientists.html </a> <br /><br /><b>References:</b><br /><br />'What is the Future of Agricultural Economics Departments and the Agricultural and Applied Economics Association?' By Gregory M. Perry. Applied Economic Perspectives and Policy (2010) volume 32, number 1, pp. 117–134.<br /></div></div></div><div><br /></div><div><b>Additional Graduate Programs in Applied Economics and Related Fields</b></div><div><br /></div><div><a href="https://www.wku.edu/mae/index.php" target="_blank">Western Kentucky University </a>- M.A. in Applied Economics (Also UG and GR options in <a href="https://www.wku.edu/agriculture/degreeprograms2.php" target="_blank">Agriculture and Food Science</a></div><div><a href="https://www.murraystate.edu/academics/CollegesDepartments/HutsonSchoolOfAgriculture/Programs/mastersinag.aspx" target="_blank">Murray State University </a>- M.S. Agriculture/Agribusiness Economics </div><div><a href="https://onlinems.aaec.vt.edu/" target="_blank">Virginia Tech -</a> M.S. Ag and Applied Economics</div><div><a href="https://business.uc.edu/academics/specialized-masters/applied-economics.html" target="_blank">University of Cincinnati </a>- M.S. Applied Economics</div><div><a href="https://www.clemson.edu/cafls/departments/agricultural-sciences/students/applied-economics-and-statistics/" target="_blank">Clemson University </a>- M.S. Applied Economics and Statistics</div><div><a href="https://www.montana.edu/econ/graduateprogram.html">Montana State University </a>- M.S. 
Applied Economics</div><div><a href="https://dyson.cornell.edu/" target="_blank">Cornell University </a>- M.S. & M.P.S. in Applied Economics and Management </div><div><a href="https://go.okstate.edu/graduate-academics/programs/masters/agricultural-economics-ms.html" target="_blank">Oklahoma State University </a>- M.S. Agricultural Economics and MAg in Agribusiness</div><div><a href="https://agecon.tamu.edu/" target="_blank">Texas A&M </a>- M.S. in Agricultural Economics</div><div><a href="https://bulletin.ndsu.edu/programs-study/graduate/agribusiness-applied-economics/" target="_blank">North Dakota State University</a> - M.S. Agribusiness and Applied Economics</div><div><a href="https://ace.illinois.edu/graduate/masters" target="_blank">University of Illinois </a>- M.S. Agricultural and Applied Economics</div><div>University of Missouri - Agricultural and Applied Economics</div><div><a href="https://aers.auburn.edu/graduate-degrees/" target="_blank">Auburn University </a>- M.S. 
Agricultural Economics and Rural Sociology (various programs)</div><div><a href="https://www.aaea.org/about-aaea/agricultural-and-applied-economics-departments" target="_blank">AAEA </a> - Directory of additional programs at the graduate and undergraduate levels</div><div><br /></div><div><br /></div>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-6161066446026646612020-09-30T14:22:00.001-04:002020-09-30T14:22:10.231-04:00Calibration, Discrimination, and Ethics<p>Classification models with binary and categorical outcomes are often assessed based on the c-statistic or area under the ROC curve. (see also: <a href="http://econometricsense.blogspot.com/2013/04/is-roc-curve-good-metric-for-model.html">http://econometricsense.blogspot.com/2013/04/is-roc-curve-good-metric-for-model.html</a>)</p><p>This metric ranges between 0 and 1 and provides a summary of model performance in terms of its ability to rank observations. For example, if a model is developed to predict the probability of default, the area under the ROC curve can be interpreted as the probability that a randomly chosen observation from the observed default class will be ranked higher (based on model predictions or probability) than a randomly chosen observation from the observed non-default class (Provost and Fawcett, 2013). This metric is not without criticism and should not be used as the sole criterion for model assessment in all cases. As argued by Cook (2007):</p><p><i>'When the goal of a predictive model is to categorize individuals into risk strata, the assessment of such models should be based on how well they achieve this aim...The use of a single, somewhat insensitive, measure of model fit such as the c statistic can erroneously eliminate important clinical risk predictors for consideration in scoring algorithms'</i></p><p>Calibration is an alternative metric for model assessment. 
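That rank interpretation lends itself to direct computation. As a minimal sketch (the predicted default probabilities below are made up for illustration), the c-statistic is the share of (default, non-default) pairs in which the default case receives the higher predicted probability, with ties counted as half:

```python
def c_statistic(pos_scores, neg_scores):
    """AUC via its rank interpretation: the probability that a randomly
    chosen positive case (e.g., a default) outranks a randomly chosen
    negative case (a non-default)."""
    wins = ties = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))

# Hypothetical model-predicted probabilities of default
defaults = [0.9, 0.8, 0.6]           # observed defaults
non_defaults = [0.7, 0.4, 0.3, 0.2]  # observed non-defaults
print(c_statistic(defaults, non_defaults))  # 11 of 12 pairs ranked correctly, ~0.917
```

Note that only the ordering of the predictions matters here, not their closeness to the true risk, which is exactly why a model can discriminate well and still be poorly calibrated.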
Calibration measures the agreement between observed and predicted risk, or the closeness of model-predicted probability to the underlying probability of the population under study. Both discrimination and calibration are included in the National Quality Forum’s Measure Evaluation Criteria. However, many have noted that calibration is largely underutilized by practitioners in the data science and predictive modeling communities (Walsh et al., 2017; Van Calster et al., 2019). Models that perform well on the basis of discrimination (area under the ROC) may not perform well based on calibration (Cook, 2007). In fact, a model with lower ROC scores could actually calibrate better than a model with higher ROC scores (Van Calster et al., 2019). This can lead to ethical concerns, as lack of calibration in predictive models can in application result in decisions that lead to over- or under-utilization of resources (Van Calster et al., 2019).</p><p>Others have argued there are ethical considerations as well:</p><p><i>“Rigorous calibration of prediction is important for model optimization, but also ultimately crucial for medical ethics. Finally, the amelioration and evolution of ML methodology is about more than just technical issues: it will require vigilance for our own human biases that makes us see only what we want to see, and keep us from thinking critically and acting consistently.” (Levy, 2020)</i></p><p>Van Calster et al. (2019), Walsh et al. (2017) and Steyerberg et al. (2010) provide guidance on ways of assessing model calibration.</p><p>Frank Harrell provides a great discussion about choosing the correct metrics for model assessment along with a wealth of resources <a href="https://www.fharrell.com/post/medml/">here.</a></p><p><b>References:</b></p><p>Matrix of Confusion. Drew Griffin Levy, PhD. GoodScience, Inc. https://www.fharrell.com/post/mlconfusion/ Accessed 9/22/2020</p><p>Nancy R. 
Cook, Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction. Circulation. 2007; 115: 928-935</p><p>Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. Foster Provost and Tom Fawcett. O’Reilly. CA. 2013.</p><p>Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128-138. doi:10.1097/EDE.0b013e3181c30fb2</p><p>Colin G. Walsh, Kavya Sharman, George Hripcsak, Beyond discrimination: A comparison of calibration methods and clinical usefulness of predictive models of readmission risk, Journal of Biomedical Informatics, Volume 76, 2017, Pages 9-18, ISSN 1532-0464, https://doi.org/10.1016/j.jbi.2017.10.008</p><p> Van Calster, B., McLernon, D.J., van Smeden, M. et al. Calibration: the Achilles heel of predictive analytics. BMC Med 17, 230 (2019). https://doi.org/10.1186/s12916-019-1466-7</p>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-59643258106578856702020-09-02T19:06:00.005-04:002020-09-10T20:50:46.099-04:00Blocking and Causality<p>In a previous post I discussed <a href="http://econometricsense.blogspot.com/2020/08/blocked-designs.html" target="_blank">block randomized</a> designs. </p><p>Duflo et al. (2008) describe this in more detail:</p><p><i>"Since the covariates to be used must be chosen in advance in order to avoid specification searching and data mining, they can be used to stratify (or block) the sample in order to improve the precision of estimates. This technique (first proposed by Fisher (1926)) involves dividing the sample into groups sharing the same or similar values of certain observable characteristics. The randomization ensures that treatment and control groups will be similar in expectation.
But stratification is used to ensure that along important observable dimensions this is also true in practice in the sample....blocking is more efficient than controlling ex post for these variables, since it ensures an equal proportion of treated and untreated units within each block and therefore minimizes variance."</i></p><p>They also elaborate on blocking when you are interested in subgroup analysis:</p><p><i>"Apart from reducing variance, an important reason to adopt a stratified design is when the researchers are interested in the effect of the program on specific subgroups. If one is interested in the effect of the program on a sub-group, the experiment must have enough power for this subgroup (each sub-group constitutes in some sense a distinct experiment). Stratification according to those subgroups then ensure that the ratio between treatment and control units is determined by the experimenter in each sub-group, and can therefore be chosen optimally. It is also an assurance for the reader that the sub-group analysis was planned in advance."</i></p><p>Dijkman et al. (2009) discuss subgroup analysis in blocked or stratified designs in more detail:</p><p><i>"When stratification of randomization is based on subgroup variables, it is more likely that treatment assignments within subgroups are balanced, making each subgroup a small trial. Because randomization makes it likely for the subgroups to be similar in all aspects except treatment, valid inferences about treatment efficacy within subgroups are likely to be drawn. In post hoc subgroup analyses, the subgroups are often incomparable because no stratified randomization is performed. Additionally, stratified randomization is desirable since it forces researchers to define subgroups before the start of the study."</i></p><p>Both of these accounts seem very much consistent with each other in terms of thinking about randomization within subgroups creating a mini trial where causal inferences can be drawn.
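The 'mini trial' idea is easy to see in a quick simulation. The sketch below uses entirely invented block sizes, baselines, and effects: treatment is randomized within each block, the effect is estimated within each block separately, and the within-block contrasts are then pooled into an overall estimate weighted by block size — comparisons happen within blocks, never between them.

```python
import random
from statistics import mean

random.seed(42)

# Invented blocks with different baselines and different treatment effects.
blocks = {
    "private": {"n": 400, "baseline": 3.0, "effect": 0.25},
    "public": {"n": 600, "baseline": 2.6, "effect": 0.10},
}

within = {}
for name, b in blocks.items():
    ids = list(range(b["n"]))
    random.shuffle(ids)                  # randomize WITHIN the block
    treated = set(ids[: b["n"] // 2])    # equal split inside each block
    y = [
        b["baseline"]
        + (b["effect"] if i in treated else 0.0)
        + random.gauss(0, 0.3)           # idiosyncratic noise
        for i in range(b["n"])
    ]
    y_t = mean(y[i] for i in treated)
    y_c = mean(y[i] for i in range(b["n"]) if i not in treated)
    within[name] = y_t - y_c             # each block is its own mini trial

# Pooled estimate: weight each within-block contrast by block share.
total = sum(b["n"] for b in blocks.values())
ate = sum(within[k] * blocks[k]["n"] / total for k in blocks)
print(within, round(ate, 3))
```

With these made-up numbers the within-block contrasts land near 0.25 and 0.10 and the pooled estimate near 0.16; the gap between the two blocks, by contrast, is a descriptive comparison of subgroups, not an estimate of any causal effect of block membership.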
But I think the key thing to consider is they are referring to comparisons made WITHIN subgroups and not necessarily BETWEEN subgroups. </p><p>Gerber and Green discuss this in one of their chapters on analysis of block randomized experiments:</p><p><i>"Regardless of whether one controls for blocks using weighted regression or regression with indicators for blocks, the key principle is to compare treatment and control subjects within blocks, not between blocks."</i></p><p>When we start to compare treatment and control units BETWEEN blocks or subgroups we are essentially interpreting covariates and this cannot be done with a causal interpretation. Gerber and Green discuss an example related to differences in the performance of Hindu vs. Muslim schools. </p><p><i>"it could just be that religion is a marker for a host of unmeasured attributes that are correlated with educational outcomes. The set of covariates included in an experimental analysis need not be a complete list of factors that affect outcomes: the fact that some factors are left out or poorly measured is not a source of bias when the aim is to measure the average treatment effect of the random intervention. Omitted variables and mismeasurement, however, can lead to severe bias if the aim is to draw causal inferences about the effects of covariates. Causal interpretation of the covariates encounters all of the threats to inference associated with analysis of observational data."</i></p><p>In other words, these kinds of comparisons face the same challenges related to interpreting control variables in a regression in an observational setting (see Keele, 2020). </p><p>But why doesn't randomization within religion allow us to make causal statements about these comparisons? Let's think about a different example. Suppose we wanted to measure treatment effects for some kind of educational intervention and we were interested in subgroup differences in the outcome between public and private high schools.
We could randomly assign treatments and controls within the public school population and do the same within the private school population. We know overall treatment effects would be unbiased because the school type would be perfectly balanced (instead of balanced just on average in a completely random design) and we would expect all other important confounders to be balanced between treatments and controls on average. </p><p><br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8-k65iDvCQNziYklSm0ZOJ-FkImjYsmmqZ9lRW1h2kg4G1YxvDNt_Oyb2nfkWw2pZ3qr8UAS-K-IxeWhel05jb2Pn2ojCfTV1Iznjgl_Saeju4SKjiEPLocQt4Dnetg_pRrTag5qKBc6F/s487/stratified+by+education.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="221" data-original-width="487" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8-k65iDvCQNziYklSm0ZOJ-FkImjYsmmqZ9lRW1h2kg4G1YxvDNt_Oyb2nfkWw2pZ3qr8UAS-K-IxeWhel05jb2Pn2ojCfTV1Iznjgl_Saeju4SKjiEPLocQt4Dnetg_pRrTag5qKBc6F/s0/stratified+by+education.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><br /></div><br /><p></p><p>We also know that within the group of private schools the treatment and controls should at least on average be balanced for certain confounders (median household income, teacher's education/training/experience, and perhaps an unobservable confounder related to student motivation). </p><p>We could say the same thing about comparisons WITHIN the subgroup of public schools. But there is no reason to believe that the treated students in private schools would be comparable to the treated students in public schools because there is no reason to expect that important confounders would be balanced when making the comparisons. </p><p>Assume we are looking at differences in first semester college GPA. 
Maybe within the private subgroup we find that treated students on average have a first semester college GPA that is .25 points higher than the comparable control group. But within the public school subgroup, this difference was only .10. We can say that there is a difference in outcomes of .15 points between groups but can we say this is causal? Is the difference really related to school type or is school type really a proxy for income, teacher quality, or motivation? If we increased motivation or income in the public schools would that make up the difference? We might do better if our design originally stratified on all of these important confounders like income and teacher education. Then we could compare students in both public and private schools with similar family incomes and teachers of similar credentials. But...there is no reason to believe that student motivation would be balanced. We can't block or stratify on an unobservable confounder. Again, as Gerber and Green state, we find ourselves in a world that borders between experimental and non-experimental methods. Simply, the subgroups defined by any particular covariate that itself is not or cannot be randomly assigned may have different potential outcomes. What we can say from these results is that school type predicts the outcome but does not necessarily cause it.</p><p>Gerber and Green expound on this idea:</p><p><i>"Subgroup analysis should be thought of as exploratory or descriptive analysis....if the aim is simply to predict when treatment effects will be large, the researcher need not have a correctly specified causal model that explains treatment effects (see <a href="http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html" target="_blank">to explain or predict)</a>....noticing that treatment effects tend to be large in some groups and absent from others can provide important clues about why treatments work.
But resist the temptation to think subgroup differences establish the causal effect of randomly varying one's subgroup attributes."</i></p><p><b>References</b></p><p>Dijkman B, Kooistra B, Bhandari M; Evidence-Based Surgery Working Group. How to work with a subgroup analysis. Can J Surg. 2009;52(6):515-522. </p><p>Duflo, Esther, Rachel Glennerster, and Michael Kremer. 2008. “Using Randomization in Development Economics Research: A Toolkit.” T. Schultz and John Strauss, eds., Handbook of Development Economics. Vol. 4. Amsterdam and New York: North Holland.</p><p>Gerber, Alan S., and Donald P. Green. 2012. Field Experiments: Design, Analysis, and Interpretation. New York: W.W. Norton</p><p>Keele, L., Stevenson, R., & Elwert, F. (2020). The causal interpretation of estimated associations in regression models. Political Science Research and Methods, 8(1), 1-13. doi:10.1017/psrm.2019.31</p>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-20371316997373381792020-08-28T16:04:00.006-04:002020-08-28T18:15:57.880-04:00Blocked DesignsWhen I first learned about randomized complete block designs as an undergraduate to me it was just another set of computations to memorize for the test. (this was before I understood statistics as a <a href="https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html">way of thinking not a box of tools)</a>. However it is an important way to think about your experiment.<br /><br /><div>In Steel and Torrie's well known experimental design text, they discuss:</div><div><br /></div><div><i>"in many situations it is known beforehand that certain experimental units, if treated alike, will behave differently....designs or layouts can be constructed so that the portion of variability attributed to the recognized source can be measured and thus excluded from the experimental error." 
</i></div><div><br /></div><div>In other words, blocking improves the precision of estimates in randomized designs. In experimental research, blocking often implies randomly assigning treatment and control groups within blocks (or strata) based on a set of observed pre-treatment covariates. By guaranteeing that treatment and control units are identical in their covariate values, we eliminate the chance that differences in covariates among treatment and control units will impact inferences. </div><div><br /></div><div>With a large enough sample size and successfully implemented randomization, we expect treatment and control units to be 'balanced' at least on average across covariate values. However, it is always wise to <a href="http://econometricsense.blogspot.com/2020/07/assessing-balance-for-matching-and-rcts.html" target="_blank">assess covariate balance</a> after randomization to ensure that this is the case. </div><div><br /></div><div>One argument for blocking is to prevent such chance imbalances outright. In cases where randomization is deemed to be successfully implemented, treatment and control units will have similar covariate values on average or in expectation. But with block randomization treatment and control units are guaranteed to be identical across the blocked covariate values. </div><div><br /></div><div><b>Blocking vs. Matching and Regression</b></div><div><br /></div><div>It is common practice, if we find imbalances or differences in certain covariates or control variables, to 'control' for them after the fact, often using linear regression. Gerber and Green discuss blocking extensively. They claim, however, that for experiments with sample sizes of more than 100 observations, the gains in precision from block randomization over a completely randomized design (with possible regression adjustments with controls for imbalances) become negligible (citing Rosenberger and Lachin, 2002). However, they caution:
Having to resort to regression with controls introduces the temptation to interpret control variables causally in ways that are inappropriate (see also Keele, 2020).</div><div><br /></div><div>In observational settings where randomization does not occur, we often try to mimic the covariate balance we would get in a randomized experiment through matching or regression. But there are important differences. Regression and matching create comparisons where covariate values are the same across treatment and control units in expectation or 'on average' for observable and measurable variables but not necessarily unobservable confounders. Randomization ensures on average that we get balanced comparisons for even unobservable and unmeasurable characteristics. King and Nielsen are critical of propensity score matching in that they claim it attempts to mimic a completely randomized design when we should be striving for observational methods that attempt to target blocked randomized designs.</div><div><br /></div><div><i>"The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods."</i><br /><br /><br /><b>References:</b></div><div><br /></div><div>Gerber, Alan S., and Donald P. Green. 2012. Field Experiments: Design, Analysis, and Interpretation. New York: W.W. Norton</div><div><br /></div><div>Keele, L., Stevenson, R., & Elwert, F. (2020). The causal interpretation of estimated associations in regression models. Political Science Research and Methods, 8(1), 1-13. doi:10.1017/psrm.2019.31</div><div><br /></div><div>Gary King and Richard Nielsen. 2019. “Why Propensity Scores Should Not Be Used for Matching.” Political Analysis, 27, 4.
Copy at https://j.mp/2ovYGsW</div><div><br /></div><div>Imai K, King G, Stuart EA. Misunderstandings among experimentalists and observationalists in causal inference. Journal of the Royal Statistical Society Series A. 2008;171(2):481–502.</div><div><br /></div><div>Principles and Procedures of Statistics: A Biometrical Approach. Robert George Douglas Steel, James Hiram Torrie, David A. Dickey. McGraw-Hill. 1997.</div>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-87839608251097749522020-07-26T13:17:00.007-04:002022-10-25T11:31:06.916-04:00Assessing Balance for Matching and RCTsAssessing the balance between treatment and control groups across control variables is an important part of propensity score matching. It's heuristically an attempt to ‘recreate’ a situation similar to a randomized experiment where all subjects are essentially the same except for the treatment (Thoemmes and Kim, 2011). Matching itself should not be viewed so much as an estimation technique, but as a pre-processing step to ensure that members assigned to treatment and control groups have similar covariate distributions 'on average' (Ho et al., 2007).<br />
<br />
This understanding of matching often gets lost among practitioners, and it is evident in attempts to use statistical significance testing (like t-tests) to assess baseline differences in covariates between treatment and control groups. This is often done as a means to (1) determine which variables to match on and (2) determine whether appropriate balance has been achieved after matching.<br />
<br />
Stuart (2010) discusses this:<br />
<br />
<i>"Although common, hypothesis tests and p-values that incorporate information on the sample size (e.g., t-tests) should not be used as measures of balance, for two main reasons (Austin, 2007; Imai et al., 2008). First, balance is inherently an in-sample property, without reference to any broader population or super-population. Second, hypothesis tests can be misleading as measures of balance, because they often conflate changes in balance with changes in statistical power. Imai et al. (2008) show an example where randomly discarding control individuals seemingly leads to increased balance, simply because of the reduced power."</i><br />
<br />
Imai et al. (2008) elaborate. Using simulation they demonstrate that:<br />
<br />
<i>"The t-test can indicate that balance is becoming better whereas the actual balance is growing worse, staying the same or improving. Although we choose the most commonly used t-test for illustration, the same problem applies to many other test statistics that are used in applied research. For example, the same simulation applied to the Kolmogorov–Smirnov test shows that its p-value monotonically increases as we randomly drop more control units. This is because a smaller sample size typically produces less statistical power and hence a larger p-value"</i><br />
<br />
and<br />
<br />
<i>"from a theoretical perspective, balance is a characteristic of the sample, not some hypothetical population, and so, strictly speaking, hypothesis tests are irrelevant in this context"</i><br />
<br />
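Imai et al.'s point is easy to reproduce. The sketch below (simulated data; all numbers invented) holds a genuine, fixed imbalance constant and randomly discards control units: the averaged t-test p-value climbs — suggesting 'improved' balance — while the standardized mean difference, which involves no sample size, stays put.

```python
import math
import random

random.seed(1)

def welch_p(x, y):
    # Two-sided p-value for a difference in means (normal approximation,
    # adequate at these sample sizes).
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    t = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return math.erfc(abs(t) / math.sqrt(2))

def abs_std_diff(x, y):
    # Absolute standardized mean difference: no sample size anywhere.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    return abs(mx - my) / math.sqrt((vx + vy) / 2)

# A covariate with a fixed, genuine imbalance between the groups.
treat = [random.gauss(0.2, 1) for _ in range(500)]
control = [random.gauss(0.0, 1) for _ in range(5000)]

# Randomly discarding control units leaves the imbalance (SMD) unchanged
# on average, but the t-test p-value climbs -- mimicking 'better balance'.
summary = {}
for keep in (5000, 1000, 200):
    ps, smds = [], []
    for _ in range(100):
        sub = random.sample(control, keep)
        ps.append(welch_p(treat, sub))
        smds.append(abs_std_diff(treat, sub))
    summary[keep] = (sum(ps) / len(ps), sum(smds) / len(smds))
    print(keep, round(summary[keep][0], 4), round(summary[keep][1], 3))
```

Nothing about the covariate distribution improved; only statistical power was thrown away.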
Austin (2009) has a paper devoted completely to balance diagnostics for propensity score matching (absolute standardized differences are recommended as an alternative to using significance tests).<br />
<br />
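As a sketch of what the recommended diagnostic looks like in practice (simulated data; a real analysis would compute standardized differences for every covariate, often against the conventional 0.1 rule of thumb): greedy nearest-neighbor matching shrinks the absolute standardized difference toward zero, with no reference to sample size or p-values.

```python
import bisect
import random

random.seed(3)

def abs_std_diff(x, y):
    # |mean difference| / pooled standard deviation -- a sample-size-free
    # measure of how far apart the two covariate distributions sit.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    return abs(mx - my) / ((vx + vy) / 2) ** 0.5

# Simulated covariate: treated units sit half a standard deviation higher.
treated = [random.gauss(0.5, 1) for _ in range(200)]
controls = [random.gauss(0.0, 1) for _ in range(2000)]

# Greedy 1:1 nearest-neighbor matching on the covariate itself (standing
# in for matching on an estimated propensity score).
pool = sorted(controls)
matched = []
for t in treated:
    i = bisect.bisect_left(pool, t)
    nearest = min(
        (j for j in (i - 1, i) if 0 <= j < len(pool)),
        key=lambda j: abs(pool[j] - t),
    )
    matched.append(pool.pop(nearest))   # match without replacement

print(round(abs_std_diff(treated, controls), 2))  # imbalance before matching
print(round(abs_std_diff(treated, matched), 2))   # imbalance after matching
```

The before/after standardized differences tell you directly how much covariate overlap the pre-processing step bought you — which is the question balance diagnostics are supposed to answer.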
OK so based on this view of matching as a data pre-processing step in an observational setting, using hypothesis tests and p-values to assess balance may not make sense. But what about randomized controlled trials and randomized field trials? In those cases randomization is used as a means to achieve balance outright instead of matching after the fact in an observational setting. Even better, we hope to achieve balance on unobservable confounders that we could never measure or match on. But sometimes randomization isn't perfect in this regard, especially in smaller samples. So we still may want to investigate treatment and control covariate balance in this setting in order to (1) identify potential issues with randomization (2) statistically control for any chance imbalances.<br />
<br />
Altman (1985) discusses the implication of using significance tests to assess balance in randomized clinical trials:<br />
<br />
<i>"Randomised allocation in a clinical trial does not guarantee that the treatment groups are comparable with respect to baseline characteristics. It is common for differences between treatment groups to be assessed by significance tests but such tests only assess the correctness of the randomisation, not whether any observed imbalances between the groups might have affected the results of the trial. In particular, it is quite unjustified to conclude that variables that are not significantly differently distributed between groups cannot have affected the results of the trial."</i><br />
<br />
<i>"The possible effect of imbalance in a prognostic factor is considered, and it is shown that non‐significant imbalances can exert a strong influence on the observed result of the trial, even when the risk associated with the factor is not all that great."</i><br />
<br />
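Altman's point can be illustrated with a small simulation (all numbers invented): a strongly prognostic covariate whose imbalance is nowhere near 'significant' still shifts the unadjusted estimate well away from the true treatment effect of zero, while adjusting for the covariate recovers it.

```python
import math
import random

random.seed(5)

n = 200  # per arm
true_effect = 0.0

# A strongly prognostic baseline covariate with a modest, fixed imbalance
# (constructed: 0.15 SD higher in the treated arm, for illustration).
x_control = [random.gauss(0, 1) for _ in range(n)]
x_treated = [x + 0.15 for x in x_control]

# Outcome depends heavily on the covariate; treatment does nothing.
y_control = [2 * x + random.gauss(0, 0.5) for x in x_control]
y_treated = [true_effect + 2 * x + random.gauss(0, 0.5) for x in x_treated]

def t_test_p(a, b):
    # Two-sided p-value for a difference in means (normal approximation).
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    t = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    return math.erfc(abs(t) / math.sqrt(2))

# The baseline comparison looks reassuringly 'non-significant'...
balance_p = t_test_p(x_treated, x_control)

# ...yet the unadjusted estimate is biased by slope * imbalance = 0.3.
raw = sum(y_treated) / n - sum(y_control) / n

def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

# Covariate adjustment: pooled within-arm slope, then net out the gap.
dx = demean(x_treated) + demean(x_control)
dy = demean(y_treated) + demean(y_control)
slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
x_gap = sum(x_treated) / n - sum(x_control) / n
adjusted = raw - slope * x_gap

print(round(balance_p, 2), round(raw, 2), round(adjusted, 2))
```

A p-value above 0.05 on the baseline table said nothing about whether the imbalance mattered for the trial result; the size of the imbalance times the prognostic strength of the factor did.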
Even though this was in the context of an RCT and not an observational study, this seems to parallel the simulation results from Imai et al. (2008). Altman seems indignant about the practice:<br />
<div>
<br /></div>
<div>
<div>
<i>"Putting these two ideas together, performing a significance test to compare baseline variables is to assess the probability of something having occurred by chance when we know that it did occur by chance. Such a procedure is clearly absurd."</i></div>
<div>
<br /></div>
<div>
More recent discussions include Egbewale (2015) and Pocock et al. (2002), who found that nearly 50% of practitioners were still employing significance testing to assess covariate balance in randomized trials.</div>
<div>
<br />
So if using significance tests for balance assessment in matched and randomized studies is so 1985....why are we still doing it?<br />
<br />
<b>References:</b><br />
<br />
Altman, D.G. (1985), Comparability of Randomised Groups. Journal of the Royal Statistical Society: Series D (The Statistician), 34: 125-136. doi:10.2307/2987510<br />
<br />
Austin, PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Statist. Med. 2009; 28:3083–3107<br />
<br />
Austin, PC. The performance of different propensity score methods for estimating marginal odds ratios. Stat Med. 2007 Jul 20; 26(16):3078-94.<br />
<br />
Egbewale, Bolaji Emmanuel. Statistical issues in randomised controlled trials: a narrative synthesis. Asian Pacific Journal of Tropical Biomedicine. Volume 5, Issue 5, 2015, Pages 354-359. ISSN 2221-1691<br />
<br />
Ho, Daniel E. and Imai, Kosuke and King, Gary and Stuart, Elizabeth A., Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, Vol. 15, pp. 199-236, 2007, Available at SSRN: https://ssrn.com/abstract=1081983<br />
<br />
Imai K, King G, Stuart EA. Misunderstandings among experimentalists and observationalists in causal inference. Journal of the Royal Statistical Society Series A. 2008;171(2):481–502.<br />
<br />
Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med. 2002;21(19):2917-2930. doi:10.1002/sim.1296<br />
<br />
Stuart EA. Matching methods for causal inference: A review and a look forward. Stat Sci. 2010;25(1):1-21. doi:10.1214/09-STS313<br />
<br />
Thoemmes, F. J. & Kim, E. S. (2011). A systematic review of propensity score methods in the social sciences. Multivariate Behavioral Research, 46(1), 90-118.<br />
<br />
Zhang Z, Kim HJ, Lonjon G, Zhu Y; written on behalf of AME Big-Data Clinical Trial Collaborative Group. Balance diagnostics after propensity score matching.</div>
</div>
Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-66476424184246791972020-05-06T18:35:00.007-04:002022-12-18T12:33:46.353-05:00Experimentation and Causal Inference: Strategy and InnovationKnowledge is the most important resource in a firm and the essence of organizational capability, innovation, value creation, strategy, and competitive advantage. Causal knowledge is no exception. In previous posts I have discussed the value proposition of experimentation and causal inference from both mainline and behavioral economic perspectives. This series of posts has been greatly influenced by Jim Manzi's book <a href="https://www.manhattan-institute.org/uncontrolled">'Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society.' </a>Midway through the book Manzi highlights three important things that experimentation and causal inference in business settings can do:<br /><br />1) Precision around the tactical implementation of strategy<br />2) Feedback on the performance of a strategy and refinements driven by evidence<br />3) Achievement of organizational and strategic alignment<br /><br />Manzi explains that within any corporation there are always silos and subcultures advocating competing strategies with perverse incentives and agendas in pursuit of power and control. How do we know who is right and which programs or ideas are successful, considering the many factors that could be influencing any outcome of interest? Manzi describes any environment where the number of causes of variation is enormous as an environment that has '<i>high causal density.'</i> We can claim to address this with a data driven culture, but what does that mean? <i>How do we know what is, and isn't, supported by data? </i>Modern companies in a digital age with AI and big data are drowning in data. This makes it easy to adorn rhetoric in advanced analytical frameworks. 
<a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-part.html">Because data seldom speaks, anyone can speak for the data through wily data storytelling.</a> <i>Decision makers fail to make the distinction between just having data, and having evidence to support good decisions.</i><br /><div><br /></div><div>As Jim Manzi and Stefan Thomke discuss in <a href="https://hbr.org/2014/12/the-discipline-of-business-experimentation">Harvard Business Review:</a><br /><br /><i>"business experiments can allow companies to look beyond correlation and investigate causality....Without it, executives have only a fragmentary understanding of their businesses, and the decisions they make can easily backfire."</i><br /><br />Without experimentation and causal inference, there is no way to connect the things we do with the value created. In complex environments with high causal density, we don't know enough about the nature and causes of human behavior, decisions, and causal paths from actions to outcomes to list them all and measure and account for them even if we could agree how to measure them. This is the nature of decision making under uncertainty. But, as R.A. Fisher taught us with his agricultural experiments, randomized tests allow us to account for all of these hidden factors (Manzi calls them hidden conditionals). Only then does our data stand a chance to speak truth. Experimentation and causal inference don't provide perfect information but they are the only means by which we can begin to say that we have data and evidence to inform the tactical implementation of our strategy as opposed to pretending that we do based on correlations alone. As economist F.A. 
Hayek once said:</div><div><br /></div><div><i>"I prefer true but imperfect knowledge, even if it leaves much undetermined and unpredictable, to a pretense of exact knowledge that is likely to be false"</i></div><div><br /></div><div>In <i><a href="https://www.innosight.com/insight/dual-transformation/" target="_blank">Dual Transformation: How to Reposition Today's Business While Creating the Future</a></i> the authors discuss the importance of experimentation and causal inference as a way to navigate uncertainty in causally dense environments in what they refer to as <i>transformation B</i>:</div><div><div><br /></div><div><i>“Whenever you innovate, you can never be sure about the assumptions on which your business rests. So, like a good scientist, you start with a hypothesis, then design an experiment. Make sure the experiment has clear objectives (why are you running it and what do you hope to learn). Even if you have no idea what the right answer is, make a prediction. Finally, execute in such a way that you can measure the prediction, such as running a so-called A/B test in which you vary a single factor."</i></div><div><br /></div></div><div>Experiments aren't just tinkering and trying new things. 
While these are helpful to innovation, just tinkering and observing still leaves you speculating about what really works and is <a href="http://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-part.html" target="_blank">subject to all the same behavioral biases and pitfalls of big data previously discussed.</a></div><div><br /></div><div>List and Gneezy address this in <a href="https://www.amazon.com/Why-Axis-Undiscovered-Economics-Everyday-ebook/dp/B00BVTSBVO" target="_blank">The Why Axis</a>:</div><div><br /></div><div><i>"Many businesses experiment and often...businesses always tinker...and try new things...the problem is that businesses rarely conduct experiments that allow a comparison between a treatment and control group...Business experiments are research investigations that give companies the opportunity to get fast and accurate data regarding important decisions."</i></div><div><br /></div><div>Three things distinguish experimentation and causal inference from just tinkering:</div><div><br /></div><div>1) Separation of signal from noise (statistical inference)</div><div>2) Connecting cause and effect (causal inference)</div><div>3) Clear signals on business value that follows from 1 & 2 above</div><div><br /></div><div>Having causal knowledge helps identify more informed and calculated risks vs. risks taken on the basis of gut instinct, political motivation, or overly optimistic and behaviorally biased data-driven correlational pattern finding analytics. </div><div><br />Experimentation and causal inference add incremental knowledge and value to business. No single experiment is going to be a 'killer app' that by itself will generate millions in profits. 
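A minimal sketch of points (1) and (2) in practice: the simplest business experiment varies a single factor across randomly assigned groups and asks whether the observed lift clears the noise. All counts below are invented for illustration.

```python
import math

# Hypothetical A/B test: vary one factor, compare conversion rates.
visitors_a, conversions_a = 10000, 520   # control
visitors_b, conversions_b = 10000, 585   # treatment

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b

# Two-proportion z-test: is the lift signal or noise?
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_b - p_a) / se
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided

print(f"lift: {p_b - p_a:.4f}, z = {z:.2f}, p = {p_value:.4f}")
```

Randomization is what licenses reading the lift causally (point 2); the z-test is what separates it from noise (point 1) — tinkering without a randomized comparison group provides neither.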
But in aggregate the <a href="http://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html">knowledge </a>created by experimentation and causal inference probably offers the greatest strategic value across an enterprise compared to any other analytic method.</div><div><br /></div><div>As discussed earlier, experimentation and causal inference create value by helping <a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html" target="_blank">manage the knowledge problem within firms.</a> It's worth repeating again from List and Gneezy:</div><div><br /></div><div><i>"We think that businesses that don't experiment and fail to show, through hard data, that their ideas can actually work before the company takes action - are wasting their money....every day they set suboptimal prices, place ads that do not work, or use ineffective incentive schemes for their work force, they effectively leave millions of dollars on the table."</i><br /><br />As Luke Froeb writes in Managerial Economics, A Problem Solving Approach (3rd Edition):<br /><br /><i>"With the benefit of hindsight, it is easy to identify successful strategies (and the reasons for their success) or failed strategies (and the reason for their failures). It's much more difficult to identify successful or failed strategies before they succeed or fail."</i><br /><br />Again from Dual Transformation:</div><div><br /></div><div><i>"Explorers recognize they can't know the right answer, so they want to invest as little as possible in learning which of their hypotheses are right and which ones are wrong"</i></div><div><br /></div><div>Experimentation and causal inference offer the opportunity to test strategies early on a smaller scale to get causal feedback about potential success or failure before fully committing large amounts of irrecoverable resources. They allow us to fail smarter and learn faster. 
Experimentation and causal inference play a central role in product development, strategy, and innovation across a range of industries and companies like Harrah's casinos, Capital One, Petco, Publix, State Farm, Kohl's, Wal-Mart, and Humana that have been leading in this area for decades, in addition to new ventures like Amazon and Uber. </div><div><br /></div><div><i>"At Uber Labs, we apply behavioral science insights and methodologies to help product teams improve the Uber customer experience. One of the most exciting areas we’ve been working on is causal inference, a category of statistical methods that is commonly used in behavioral science research to understand the causes behind the results we see from experiments or observations...Teams across Uber apply causal inference methods that enable us to bring richer insights to operations analysis, product development, and other areas critical to improving the user experience on our platform."</i> - From: Using Causal Inference to Improve the Uber User Experience (<a href="https://eng.uber.com/causal-inference-at-uber/" target="_blank">link</a>)</div><div><br /></div><div>Economist <a href="https://youtu.be/ZXgIEzf2wlc" target="_blank">Joshua Angrist says </a>of his students who have gone on to work for companies like Amazon: <i>"when I ask them what are they up to they say...we're running experiments."</i></div><div><br />Achieving the greatest value from experimentation and causal inference requires leadership commitment. It also demands a culture that is genuinely open to learning through a blend of trial and error, data-driven decision making informed by theory and experiments, and the infrastructure necessary for implementing enough tests and iterations to generate the knowledge necessary for rapid learning and innovation. It requires business leaders, strategists, and product managers to think about what they are trying to achieve and to ask causal questions to get there (vs. 
data scientists sitting in an ivory tower dreaming up models or experiments of their own). The result is a corporate culture that allows an organization to formulate, implement, and modify strategy faster and more adeptly than others.<br /><br /><b>See also:</b><br /><a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html" target="_blank">Experimentation and Causal Inference: The Knowledge Problem</a><br /><a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-part.html">Experimentation and Causal Inference: A Behavioral Economics Perspective</a><br /><a href="https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html">Statistics is a Way of Thinking, Not a Box of Tools</a></div>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-10919146926983585862020-04-21T22:49:00.009-04:002022-12-18T12:34:40.837-05:00Experimentation and Causal Inference: A Behavioral Economic PerspectiveIn my <a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html">previous post</a> I discussed the value proposition of experimentation and causal inference from a mainline economic perspective. In this post I want to view this from a behavioral economic perspective. From this point of view experimentation and causal inference can prove to be invaluable with respect to challenges related to overconfidence and decision making under uncertainty.<br />
<div>
<br /></div>
<div>
<b>Heuristic Data Driven Decision Making and Data Story Telling</b></div>
<div>
<br /></div>
<div>In a fast-paced environment, decisions are often made quickly, based on gut instinct. Progressive companies have tried as much as possible to leverage big data and analytics to become data-driven organizations. Ideally, leveraging data would help override the biases, gut instincts, and ulterior motives that may stand behind a scientific hypothesis or business question. One of the many things we have learned from behavioral economics is that <a href="https://www.scientificamerican.com/article/patternicity-finding-meaningful-patterns/">humans tend to over-interpret data into unreliable patterns that lead to incorrect conclusions.</a> Francis Bacon recognized this over 400 years ago:</div>
<div>
<br /></div>
<div>
<i>"the human understanding is of its own nature prone to suppose the existence of more order and regularity in the world than it finds" </i></div>
<div>
<br /></div>
<div>Anyone can tell a story with data. And with lots of data a good data storyteller can tell a story to support any decision they want, good or bad. Decision makers can be easily duped by big data, ML, AI, and various BI tools into thinking that their data is speaking to them. As <a href="https://hbr.org/2014/12/the-discipline-of-business-experimentation">Jim Manzi and Stefan Thomke state in Harvard Business Review</a>, in the absence of experimentation and causal inference</div><div><br />
<i>"executives end up misinterpreting statistical noise as causation—and making bad decisions"</i><br />
<br />
Data seldom speaks, and when it does it is often lying. This is the impetus behind the introduction of what became the scientific method. The true art and science of data science is teasing out the truth, or what version of truth can be found in the story being told. I think this is where experimentation and causal inference are most powerful and create the greatest value in the data science space. John List and Uri Gneezy discuss this in their book <i><a href="https://www.amazon.com/Why-Axis-Undiscovered-Economics-Everyday-ebook/dp/B00BVTSBVO" target="_blank">'The Why Axis.' </a></i></div><div><br /></div><div><i>"Big data is important, but it also suffers from big problems. The underlying approach relies heavily on correlations, not causality. As David Brooks has noted, 'A zillion things can correlate with each other depending on how you structure the data and what you compare.'....because our work focuses on field experiments to infer causal relationships, and because we think hard about these causal relationships of interest before generating the data, we go well beyond what big data could ever deliver."</i></div>
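<div><br /></div><div>To see how easily noise can masquerade as a pattern, here is a small simulation sketch (hypothetical data, generated with numpy): given a few hundred candidate features of pure noise, some will "correlate" with an outcome by chance alone.</div><div><br /></div>

```python
import numpy as np

rng = np.random.default_rng(42)

# 1,000 observations of an outcome and 200 unrelated "big data" features,
# all drawn independently -- none of them truly relate to the outcome.
n_obs, n_features = 1000, 200
outcome = rng.normal(size=n_obs)
features = rng.normal(size=(n_obs, n_features))

# Correlate every feature with the outcome.
corrs = np.array([np.corrcoef(features[:, j], outcome)[0, 1]
                  for j in range(n_features)])

# Roughly 5% of pure-noise features will exceed the two-sided cutoff
# 1.96/sqrt(n) by chance, handing a data storyteller a pile of 'findings'.
cutoff = 1.96 / np.sqrt(n_obs)
n_spurious = int(np.sum(np.abs(corrs) > cutoff))
print(f"noise features 'correlated' with the outcome: {n_spurious}")
```

<div><br /></div><div>None of these correlations reflects anything causal, which is the point: without a design that connects cause and effect, the data will happily tell a story anyway.</div>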
<div>
<br /></div>
<div>
<b>Decision Making Under Uncertainty, Risk Aversion, and The Dunning-Kruger Effect</b></div>
<div>
<br /></div>
<div>
Kahneman (in Thinking Fast and Slow) makes an interesting observation in relation to managerial decision making. Very often managers reward peddlers of even dangerously misleading information (data charlatans) while disregarding or even punishing merchants of truth. Confidence in a decision is often based more on the coherence of a story than the quality of information that supports it. Those that take risks based on bad information, when it works out, are often rewarded. To quote Kahneman:</div>
<div>
<br /></div>
<div>
<i>"a few lucky gambles can crown a reckless leader with a Halo of prescience and boldness"</i></div>
<div>
<br /></div>
<div>The essence of good decision science is understanding and seriously recognizing risk and uncertainty. As Kahneman discusses in Thinking Fast and Slow, those who take the biggest risks are not necessarily less risk averse; they are often simply less aware of the risks they are actually taking. This leads to overconfidence, a lack of appreciation for uncertainty, and a culture where a solution based on pretended knowledge is often preferred and even rewarded. It's easy to see how the <a href="https://thedecisionlab.com/biases/dunning-kruger-effect/">Dunning-Kruger </a>effect would dominate. This feeds a vicious cycle that leads to collective blindness toward risk and uncertainty. It leads to taking risks that should be avoided in many cases, and prevents others from considering smarter calculated risks. <a href="https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html">Thinking through an experimental design (engaging Kahneman's system 2) provides a structured way of thinking about business problems and all the ways our biases and the data can fool us.</a> In this way experimentation and causal inference can ensure a better-informed risk appetite to support decision making.</div>
<div>
<br /></div>
<div>
Just as rapid cycles of experiments in a business setting can aid in the <a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html">struggle with the knowledge problem</a>, experimentation and causal inference can aid us in our struggles with biased decision making and biased data. Data alone doesn't make good decisions because good decisions require something outside the data. Good decision science leverages experimentation and causal inference to bring theory and subject matter expertise together with data so we can make better-informed business decisions in the face of our own biases and the biases in data.</div><div><br /></div><div>A business culture that supports risk taking coupled with experimentation and causal inference will come to value proven solutions over pretended knowledge. That's valuable. </div><div><br /></div><div>Read <a href="http://econometricsense.blogspot.com/2020/05/the-value-of-business-experiments-part.html" target="_blank">Part 3: Strategy and Tactics</a></div>
<div>
<br /></div>
<div>
<b>See also:</b></div>
<div>
<br /></div>
<div>
<a href="https://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-and.html" target="_blank">The Value of Business Experiments: The Knowledge Problem</a></div>
<div>
<a href="https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html">Statistics is a Way of Thinking, Not a Box of Tools</a></div>
<div>
<br /></div>
<div>
<br /></div>
Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-21152250425535578142020-04-20T21:13:00.017-04:002022-12-18T11:15:31.751-05:00Experimentation and Causal Inference Meet the Knowledge ProblemWhy should firms leverage experimentation and causal inference? With recent advancements in computing power and machine learning, why can't they simply base all of their decisions on predictions or historical patterns discovered in the data using AI? Statisticians, econometricians, and others have a simple answer: the kinds of learning that help us understand the connections between decisions and the value we create require understanding causality. This requires something that may not be in the data to begin with. Experimentation and causal inference may be the best (if not the only) way of answering these questions. In this series of posts I want to focus on a number of fundamental reasons that experimentation and causal inference are necessary in business settings from the perspective of both mainline and behavioral economics:<br />
<br />
Part 1: The Knowledge Problem<br /><a href="http://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-part.html" target="_blank">Part 2: Behavioral Biases</a><br /><a href="http://econometricsense.blogspot.com/2020/05/the-value-of-business-experiments-part.html" target="_blank">Part 3: Strategy and Tactics</a><br />
<br />
In this post I want to discuss the value of experimentation and causal inference from a basic economic perspective. The fundamental problem of economics, society, and business is the knowledge problem. In his famous 1945 American Economic Review article <a href="https://www.econlib.org/library/Essays/hykKnw.html">The Use of Knowledge in Society</a>, Hayek argues:<br />
<br />
<i>"the economic problem of society is not merely a problem of how to allocate 'given resources'....it is a problem of the utilization of knowledge which is not given to anyone in its totality."</i><br />
<br />
A really good parable explaining the knowledge problem is the essay <a href="https://fee.org/resources/i-pencil/">I, Pencil </a>by Leonard E. Read. The fact that no one person possesses the necessary information to make something as seemingly simple as a basic number 2 pencil captures the essence of the knowledge problem.<br />
<br />
If you remember your principles of economics, you know that the knowledge problem is solved by prices, which reflect tradeoffs based on the disaggregated, incomplete, and imperfect knowledge and preferences of millions (billions) of individuals. Prices provide both information and the incentives to act on that information. It is through this information creation and coordinating process that prices help solve the knowledge problem. Prices solve the problem of calculation that Hayek alluded to in his essay, and they are what coordinate all of the activities discussed in I, Pencil. <br />
<br />
In <a href="https://www.independent.org/store/book.asp?id=98">Living Economics: Yesterday, Today, and Tomorrow</a>, Peter J. Boettke discusses the knowledge problem in the context of firms and the work of economist Murray Rothbard:<br />
<br />
<i>"firms cannot vertically integrate without facing a calculation problem....vertical integration eliminates the external market for producer goods."</i><br />
<br /> <a href="https://en.wikipedia.org/wiki/Theory_of_the_firm">Coase</a> also recognized that as firms integrate to eliminate transaction costs they also eliminate the markets that generate the prices that solve the knowledge problem! This tradeoff has to be managed well or firms go out of business. In a way firms could be viewed as little islands with socially planned economies in a sea of market competition. As Luke Froeb masterfully illustrates in his text <a href="https://www.cengage.com/c/managerial-economics-5e-froeb/9781337106665/">Managerial Economics: A Problem Solving Approach</a> (3rd Ed), decisions within firms in effect create regulations, taxes, and subsidies that destroy wealth-creating transactions. Managers should make decisions that consummate the most wealth-creating transactions (or do their best not to destroy, discourage, or prohibit wealth-creating transactions).<div><br /></div><div>
So how do we solve the knowledge problems in firms without the information creating and coordinating role of prices? Whenever mistakes are made, Luke Froeb provides this problem solving algorithm that asks:<br />
<br />
1) Who is making the bad decision?<br />
2) Do they have enough information to make a good decision?<br />
3) Do they have the incentive to make a good decision?<br />
<br />
In essence, in the absence of prices, we must try to answer the same questions that market processes often resolve. And we could leverage experimentation and causal inference to address each of the questions above:</div><div><br /></div><div>How do we know a decision was good or bad to begin with? </div><div>How do we get the information to make a good decision? </div><div>What incentives or nudges work best to motivate good decision making? </div><div><br /></div><div>What does failure to solve the knowledge problem in firms look like in practical terms? Failure to consummate wealth-creating transactions implies money left on the table - but experimentation and causal inference can help us figure out how to reclaim some of these losses. List and Gneezy address this in <a href="https://www.amazon.com/Why-Axis-Undiscovered-Economics-Everyday-ebook/dp/B00BVTSBVO" target="_blank">The Why Axis</a>:<div><br /></div><div><i>"We think that businesses that don't experiment and fail to show, through hard data, that their ideas can actually work before the company takes action - are wasting their money....every day they set suboptimal prices, place ads that do not work, or use ineffective incentive schemes for their work force, they effectively leave millions of dollars on the table."</i></div><div><br /></div>
Going back to I, Pencil and Hayek's essay, the knowledge problem is solved through the spontaneous coordination of multitudes of individual plans via markets. Through a trial-and-error process where feedback is given through prices, the plans that do the best job coordinating people's choices are adopted. Within firms there are often only a few plans compared to the market, and these take the form of various strategies and tactics. But as discussed in Jim Manzi's book <a href="https://www.manhattan-institute.org/uncontrolled">Uncontrolled</a>, firms can mimic this trial-and-error feedback process through iterative experimentation.<br />
<br />
While experimentation and causal inference cannot perfectly emulate the same kind of evolutionary feedback mechanisms prices deliver in market competition, an iterative test and learn culture within a business may provide the best strategy for dealing with the knowledge problem. And that is one of many ways that experimentation and causal inference can create value.</div><div><br /></div><div>Read <a href="http://econometricsense.blogspot.com/2020/04/the-value-of-business-experiments-part.html" target="_blank">Part 2: Behavioral Biases</a><br />
<br />
<b>See also:</b><br />
<b><br /></b>
<a href="https://econometricsense.blogspot.com/2020/04/statistics-is-way-of-thinking-not-just.html">Statistics is a Way of Thinking, Not a Box of Tools</a></div>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-430417976425678192020-04-06T18:02:00.003-04:002020-08-22T16:07:50.933-04:00Statistics is a Way of Thinking, Not Just a Box of ToolsIf you have taken very many statistics courses, you may have gotten the impression that it's mostly a mixed bag of computations and rules for conducting hypothesis tests or making predictions or creating forecasts. While this isn't necessarily wrong, it could leave you with the opinion that statistics is mostly just a box of tools for solving problems. Absolutely, statistics provides us with important tools for understanding the world, but to think of statistics as 'just tools' can have some pitfalls (besides the most common pitfall of having a hammer and viewing every problem as a nail).<br />
<br />
For one, there is a huge <a href="http://econometricsense.blogspot.com/2014/11/applied-econometrics.html">gap between the theoretical 'tools' and real-world application</a>. This gap is filled with critical thinking, judgment calls, and various social norms, practices, and expectations that differ from field to field, business to business, and stakeholder to stakeholder. The art and science of statistics is often about filling this gap. That's much more than 'just tools.'<br />
<br />
The proliferation of open-source programming languages (like R and Python) and point-and-click automated machine learning solutions (like DataRobot and H2O.ai) might give the impression that after you have done your homework in framing the business problem and in data and feature engineering, all that is left is hyper-parameter tuning and plugging and playing with a number of algorithms until the 'best' one is found. Modeling might seem to reduce to a mechanical (sometimes time-consuming, if not using automated tools) exercise. The fact that a lot of this work can in fact be automated probably contributes to the 'toolbox' mentality when thinking about the much broader field of statistics as a whole. In <a href="http://bayes.cs.ucla.edu/WHY/">The Book of Why</a>, Judea Pearl provides an example explaining why statistical inference (particularly causal inference) problems can't be reduced to easily automated mechanical exercises:<br />
<i><br /></i>
<i>"path analysis doesn't lend itself to canned programs......path analysis requires scientific thinking as does every exercise in causal inference. Statistics, as frequently practiced, discourages it and encourages "canned" procedures instead. Scientists will always prefer routine calculations on data to methods that challenge their scientific knowledge."</i><br />
<br />
Indeed, a routine practice that takes a plug-and-play approach with 'tools' can be problematic in many cases of statistical inference. A good example is simply <a href="http://econometricsense.blogspot.com/2016/03/whats-difference-between-difference-in.html">plugging GLM models into a difference-in-differences context.</a> Or combining <a href="http://econometricsense.blogspot.com/2019/02/was-it-meant-to-be-or-sometimes-playing.html">matching with difference-in-differences. </a>While we can get these approaches to 'play well together' under the correct circumstances, it's not as simple as calling the packages and running the code. Viewing methods of statistical inference and experimental design as just a box of tools to be applied to data could leave one open to the plug-and-play fallacy. There are times you might get by with using a flathead screwdriver to tighten up a Phillips-head screw, but we need to understand that inferential methods are not so easily substituted even if it looks like a snug enough fit on the surface.<br />
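For context, in the canonical two-period, two-group setting the difference-in-differences estimate is just the coefficient on a group-by-period interaction in a linear model. A minimal simulated sketch (all effect sizes hypothetical, using numpy only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated two-period, two-group data (all parameters hypothetical).
n = 5000
treated = rng.integers(0, 2, size=n)   # group indicator
post = rng.integers(0, 2, size=n)      # period indicator
true_effect = 2.0
y = (1.0                               # baseline outcome
     + 0.5 * treated                   # fixed group difference
     + 1.5 * post                      # common time trend
     + true_effect * treated * post    # the effect DiD should recover
     + rng.normal(size=n))             # noise

# Canonical DiD regression: y ~ treated + post + treated:post.
# The interaction coefficient (beta[3]) is the DiD estimate.
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"DiD estimate: {beta[3]:.2f}")
```

The simplicity is deceptive: move to a nonlinear GLM (a logit, say) and the interaction coefficient no longer carries this direct interpretation, which is exactly why plugging models together without thinking can go wrong.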
<br />
Understanding the business problem and data storytelling are in fact two other areas of data science that would be difficult to automate. But don't let that fool you into thinking that the remainder of data science, including statistical inference, is simply a mechanical exercise that allows one to apply the 'best' algorithm to 'big data'. You might get by with that for a minority of use cases that require a purely predictive or pattern-finding solution, but <a href="http://econometricsense.blogspot.com/2019/12/when-wicked-problems-meet-biased-data.html">the remainder of the world's problems are not so tractable.</a> Statistics is about more than data or the patterns we find in it. It's a way of thinking about the data.<br />
<br />
<i>"Causal Analysis is emphatically not just about data; in causal analysis we must incorporate some understanding of the process that produces the data and then we get something that was not in the data to begin with." - </i>Judea Pearl, The Book of Why<br />
<br />
<b>Statistics is A Way of Thinking</b><br />
<br />
In their well-known advanced textbook <i>Principles and Procedures of Statistics, A Biometrical Approach</i>, Steel and Torrie push back on the attitude that statistics is just about computational tools:<br />
<br />
<i>"computations are required in statistics, but that is arithmetic, not mathematics nor statistics...statistics implies for many students a new way of thinking; thinking in terms of uncertainties of probabilities.....this fact is sometimes overlooked and users are tempted to forget that they have to think, that statistics cannot think for them. Statistics can however help research workers design experiments and objectively evaluate the resulting numerical data."</i><br />
<br />
At the end of the day we are talking about leveraging data-driven decision making to override the biases, gut instincts, and ulterior motives that may stand behind a scientific hypothesis or business question. Objectively evaluating numerical data, as Steel and Torrie put it above. But what do we actually mean by data-driven decision making? Mastering (if possible) statistics, inference, and experimental design is part of a lifelong process of understanding and interpreting data to solve applied problems in business and the sciences. It's not just about conducting your own analysis and being your own worst critic, but also about interpreting, criticizing, translating, and applying the work of others. Biologist and geneticist Kevin Folta <a href="http://econometricsense.blogspot.com/2017/07/the-value-of-graduate-educationand.html">put this well once in a Talking Biotech podcast:</a><br />
<br />
<i>"I've trained for 30 years to be able to understand statistics and experimental design and interpretation...I'll decide based on the quality of the data and the experimental design....that's what we do."</i><br />
<br />
In <a href="https://www.manhattan-institute.org/uncontrolled">'Uncontrolled' </a>Jim Manzi states:<br />
<br />
<i>"observing a naturally occurring event always leaves open the possibility of confounded causes...though in reality no experimenter can be absolutely certain that all causes have been held constant the conscious and rigorous attempt to do so is the crucial distinction between an experiment and an observation."</i><br />
<br />
Statistical inference and experimental design provide us with a structured way to think about real-world problems and the data we have to solve them, while avoiding as much as possible the gut-based data storytelling that, intentional or not, can be confounded and misleading. As Francis Bacon once stated:<br />
<br />
<i>"what is in observation loose and vague is in information deceptive and treacherous"</i><br />
<br />
Statistics provides a rigorous way of thinking that moves us from mere observation to useful information.<br />
<br />
*UPDATE: Kevin Gray wrote a very good article that really gets at the spirit of a lot of what I wanted to convey in this post.<br />
<br />
<a href="https://www.linkedin.com/pulse/statistical-thinking-nutshell-kevin-gray/">https://www.linkedin.com/pulse/statistical-thinking-nutshell-kevin-gray/</a><br />
<br />
<b>See also:</b><br />
<br />
<a href="http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html">To Explain or Predict</a><br />
<br />
<a href="http://econometricsense.blogspot.com/2014/11/applied-econometrics.html">Applied Econometrics</a>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-29701727581896568912020-02-12T21:29:00.003-05:002020-08-22T16:09:52.804-04:00Randomized Encouragement: When noncompliance may be a feature and not a bugMany times in a randomized controlled trial (RCT) issues related to non-compliance arise. Subjects assigned to the treatment fail to comply, while in other cases subjects that were supposed to be in the control group actually receive treatment. Other times we may have a new intervention (maybe it is a mobile app or some kind of product, service, or employer or government benefit) that law, contract, or nature implies that it can be accessed by everyone in our population of interest. We know that if we let nature take its course, users, adopters, or engagers are very likely going to be a self selected group that is different from others in a number of important ways. In a situation like this it could be very hard to know if observed outcomes from the new intervention are related to the treatment itself, or explained by other factors related to characteristics of those who choose to engage.<br />
<br />
In a 2008 article in the American Journal of Public Health, alternatives to randomized controlled trials are discussed, and for situations like this the authors discuss randomized encouragement:<br />
<br />
<i> "participants may be randomly assigned to an opportunity or an encouragement to receive a specific treatment, but allowed to choose whether to receive the treatment."</i><br />
<br />
In this scenario, less than full compliance is the norm, a feature and not a bug. The idea is to roll out access in conjunction with randomized encouragement. A randomized nudge.<br />
<br />
For example, in Developing a Digital Marketplace for Family Planning: Pilot Randomized Encouragement Trial (Green et al., 2018) randomized encouragement was used to study the impact of a digital health intervention related to family planning:<br />
<br />
<i>“women with an unmet need for family planning in Western Kenya were randomized to receive an encouragement to try an automated investigational digital health intervention that promoted the uptake of family planning”</i><br />
<br />
If you have a user base or population already using a mobile app you could randomize encouragement to utilize new features through the app. In other instances, you could randomize encouragement to use a new product, feature, or treatment through text messaging. Traditional ways this has been done is through mailers or phone calls.<br />
<br />
While treatment assignment or encouragement is random, non-compliance or the choice to engage or not engage is not! How exactly do we analyze results from a randomized encouragement trial in a way that allows us to infer causal effects? While common approaches include intent-to-treat (ITT) or maybe even per-protocol analysis, treatment effects for a randomized encouragement trial can also be estimated based on complier average causal effects or CACE.<br />
<br />
CACEs compare outcomes for individuals in the treatment group who complied with treatment (engaged as a result of encouragement) with individuals in the control group <i>who would have complied if given the opportunity to do so.</i> This is key. If you think this sounds a lot like<a href="http://econometricsense.blogspot.com/2019/04/intent-to-treat-instrumental-variables.html"> local average treatment effects in an instrumental variables framework</a>, this is exactly what we are talking about.<br />
<br />
Angrist and Pischke (2015) discuss how instrumental variables can be used in the context of a randomized controlled trial (RCT) with non-compliance issues:<br />
<br />
<i> "Instrumental variable methods allow us to capture the causal effect of treatment on the treated in spite of the nonrandom compliance decisions made by participants in experiments....Use of randomly assigned intent to treat as an instrumental variable for treatment delivered eliminates this source of selection bias." </i><br />
<br />
Instrumental variable analysis gives us an estimate of local average treatment effects (LATE), which are the same as CACEs. In simplest terms, LATE is the average treatment effect for the sub-population of compliers in an RCT. Or, the compliers or engagers in a randomized encouragement design.<br />
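A small simulation sketch makes the Wald/IV logic concrete (all parameters hypothetical, using numpy only): scale the ITT effect by the share of compliers moved by the encouragement to recover the CACE/LATE.<br />

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated randomized encouragement trial (all parameters hypothetical).
n = 20000
encouraged = rng.integers(0, 2, size=n)   # randomized nudge (the instrument)
complier = rng.random(n) < 0.4            # 40% engage only if encouraged
engaged = (encouraged == 1) & complier    # treatment actually received

# Compliers have a higher baseline outcome (self-selection), so the naive
# engaged-vs-not comparison is biased; randomization differences it out.
true_cace = 3.0
y = 10 + 1.5 * complier + true_cace * engaged + rng.normal(size=n)

# Naive comparison of engagers vs. non-engagers (biased upward here).
naive = y[engaged].mean() - y[~engaged].mean()

# Intent-to-treat effect and first stage (engagement moved by the nudge).
itt = y[encouraged == 1].mean() - y[encouraged == 0].mean()
first_stage = engaged[encouraged == 1].mean() - engaged[encouraged == 0].mean()

# Wald/IV estimator: ITT scaled by compliance = CACE (a.k.a. LATE).
cace = itt / first_stage
print(f"naive: {naive:.2f}, ITT: {itt:.2f}, CACE: {cace:.2f}")
```

In this sketch the naive comparison overstates the effect because engagers are self-selected, while the ITT understates it by averaging over never-takers; dividing the ITT by the first stage recovers the effect for compliers.<br />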
<br />
There are obviously some assumptions involved and more technical details. Please see the references and other links below to read more about the mechanics, assumptions, and details involved as well as some toy examples.<br />
<br />
<b>References:</b><br />
<br />
Mastering 'Metrics: The Path from Cause to Effect Joshua D. Angrist and Jörn-Steffen Pischke. 2015.<br />
<br />
Connell A. M. (2009). Employing complier average causal effect analytic methods to examine effects of randomized encouragement trials. The American journal of drug and alcohol abuse, 35(4), 253–259. doi:10.1080/00952990903005882<br />
<br />
Green EP, Augustine A, Naanyu V, Hess AK, Kiwinda L<br />
Developing a Digital Marketplace for Family Planning: Pilot Randomized Encouragement Trial<br />
J Med Internet Res 2018;20(7):e10756<br />
<br />
Stephen G. West, Naihua Duan, Willo Pequegnat, Paul Gaist, Don C. Des Jarlais, David Holtgrave, José Szapocznik, Martin Fishbein, Bruce Rapkin, Michael Clatts, and Patricia Dolan Mullen, 2008:<br />
Alternatives to the Randomized Controlled Trial<br />
American Journal of Public Health 98, 1359_1366, https://doi.org/10.2105/AJPH.2007.124446<br />
<br />
<b>See also: </b><br />
<br />
<a href="http://econometricsense.blogspot.com/2019/04/intent-to-treat-instrumental-variables.html">Intent to Treat, Instrumental Variables and LATE Made Simple(er) </a><br />
<br />
<a href="http://econometricsense.blogspot.com/2017/07/instrumental-variables-and-late.html">Instrumental Variables and LATE </a><br />
<br />
<a href="http://econometricsense.blogspot.com/2017/06/instrumental-variables-vs-intent-to.html">Instrumental Variables vs. Intent to Treat </a><br />
<br />
<a href="http://econometricsense.blogspot.com/2015/11/instrumental-explanations-of.html">Instrumental Explanations of Instrumental Variables</a><br />
<br />
<a href="http://econometricsense.blogspot.com/2013/06/an-toy-instrumental-variable-application.html">A Toy Instrumental Variable Application</a><br />
<br />
<a href="http://econometricsense.blogspot.com/search/label/instrumental%20variables">Other posts on instrumental variables...</a><br />
<div>
<br /></div>
Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-63504057497871996382019-12-16T18:20:00.000-05:002019-12-17T09:24:18.565-05:00Some Recommended Podcasts and Episodes on AI and Machine Learning<span style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">Something I have been interested in for some time now is the convergence of <a href="https://www.linkedin.com/pulse/monoculture-vs-convergence-big-data-genomics-matt-bogard/">big data and genomics </a>and the convergence of <a href="https://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html">causal inference and machine learning.</a> </span><br />
<span style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;"><br /></span>
<span style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">I am a big fan of the <a href="http://www.talkingbiotechpodcast.com/">Talking Biotech Podcast </a>which allows me to keep up with some of the latest issues and research in biotechnology and medicine. A recent episode related to <a href="http://www.talkingbiotechpodcast.com/214-artificial-intelligence-and-machine-learning/">AI and machine learning </a>covered a lot of topics that resonated with me. </span><br />
<span style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;"><br /></span>
<span style="font-family: -webkit-standard;">There was excellent discussion on the human element involved in this work, and the importance of data prep/feature engineering </span>(the 80% of work that has to happen before the ML/AI can do its job) <span style="font-family: -webkit-standard;">and the challenges of non-standard 'omics' data, as well as the potential biases that researchers and developers can inadvertently introduce in this process. The episode covered much more, including applications of </span><span style="font-family: -webkit-standard;">machine learning and AI in this space and the best ways to stay up to speed on fast-changing technologies without having to be a heads-down programmer. </span><br />
<span style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;"><br /></span>
<span style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">I've been in a data science role since 2008 and have transitioned from SAS to R to Python. I've been able to keep up within the domain of causal inference to the extent possible, but </span>I follow broader trends via podcasts like Talking Biotech. Below is a curated list of my favorite data science podcasts, with a few favorite episodes highlighted.<br />
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br />
<br />
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<b><a href="http://casualinfer.libsyn.com/">1) Casual Inference </a></b>- This is my new favorite podcast by two biostatisticians covering epidemiology/biostatistics/causal inference - and keeping it casual.<br />
<br />
<i>Fairness in Machine Learning with Sherri Rose | Episode 03 - <a href="http://casualinfer.libsyn.com/fairness-in-machine-learning-with-sherri-rose-episode-03">http://casualinfer.libsyn.com/fairness-in-machine-learning-with-sherri-rose-episode-03</a></i><br />
<br />
This episode was the inspiration for my post: <a href="https://econometricsense.blogspot.com/2019/12/when-wicked-problems-meet-biased-data.html">When Wicked Problems Meet Biased Data.</a></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br /></div>
</div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<b><a href="https://dataskeptic.com/podcast?limit=10&offset=0">2) Data Skeptic </a></b></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br /></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<b><a href="https://www.superdatascience.com/podcast">3) Super Data Science </a></b></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br /></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>#131 - The One Purpose of Data Science <a href="https://www.superdatascience.com/podcast/podcast-one-purpose-data-science-truth-analytics">https://www.superdatascience.com/podcast/podcast-one-purpose-data-science-truth-analytics</a> </i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i><br /></i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>#051 Understanding the Newest Big Data Technology Buzz Terms <a href="https://www.superdatascience.com/podcast/sds-051-understanding-newest-big-data-technology-buzz-terms">https://www.superdatascience.com/podcast/sds-051-understanding-newest-big-data-technology-buzz-terms</a></i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i><br /></i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>#093 Evolutionary Programming - </i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<a href="https://www.superdatascience.com/podcast/podcast-evolutionary-programming"><i>https://www.superdatascience.com/podcast/podcast-evolutionary-programming</i></a></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br /></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<b><a href="https://twimlai.com/shows/">4) This Week in Machine Learning </a></b></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br /></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>#266 - Can we trust scientific discoveries made using machine learning</i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i><a href="https://twimlai.com/twiml-talk-266-can-we-trust-scientific-discoveries-made-using-machine-learning-with-genevera-allen/">https://twimlai.com/twiml-talk-266-can-we-trust-scientific-discoveries-made-using-machine-learning-with-genevera-allen/</a></i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i><br /></i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>#288 Automated ML for RNA Design <a href="https://twimlai.com/twiml-talk-288-automated-ml-for-rna-design-with-danny-stoll/">https://twimlai.com/twiml-talk-288-automated-ml-for-rna-design-with-danny-stoll/</a> </i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br /></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<b><a href="https://www.oreilly.com/topics/oreilly-data-show-podcast">5) O'Reilly Data Show </a></b></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br /></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>How social science research can inform the design of AI systems <a href="https://www.oreilly.com/radar/podcast/how-social-science-research-can-inform-the-design-of-ai-systems/">https://www.oreilly.com/radar/podcast/how-social-science-research-can-inform-the-design-of-ai-systems/</a> </i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br /></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<b><a href="https://talkpython.fm/">6) Talk Python to Me </a></b></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br /></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<b><a href="https://bioinformatics.chat/">7) Bioinformatics Chat </a></b> </div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br /></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>#37 Causality and potential outcomes with Irineo Cabreros - <a href="https://bioinformatics.chat/potential-outcomes">https://bioinformatics.chat/potential-outcomes </a></i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br /></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<b><a href="https://www.econtalk.org/">8) EconTalk </a></b></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br /></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>Jim Manzi 'Uncontrolled' - <a href="https://www.econtalk.org/manzi-on-knowledge-policy-and-uncontrolled/">https://www.econtalk.org/manzi-on-knowledge-policy-and-uncontrolled/</a></i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>Jim Manzi - Oregon Medicaid Experiment - <a href="https://www.econtalk.org/jim-manzi-on-the-oregon-medicaid-study-experimental-evidence-and-causality/">https://www.econtalk.org/jim-manzi-on-the-oregon-medicaid-study-experimental-evidence-and-causality/</a> </i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>Susan Athey - Big Data and Causality <a href="https://www.econtalk.org/susan-athey-on-machine-learning-big-data-and-causation/">https://www.econtalk.org/susan-athey-on-machine-learning-big-data-and-causation/</a> </i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>Josh Angrist - Econometrics and Causation <a href="https://www.econtalk.org/joshua-angrist-on-econometrics-and-causation/">https://www.econtalk.org/joshua-angrist-on-econometrics-and-causation/</a></i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>Andrew Gelman - Social Science, Small Samples, and the Garden of Forking Paths <a href="https://www.econtalk.org/andrew-gelman-on-social-science-small-samples-and-the-garden-of-the-forking-paths/">https://www.econtalk.org/andrew-gelman-on-social-science-small-samples-and-the-garden-of-the-forking-paths/</a> </i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<i>James Heckman - Facts, Evidence, and the State of Econometrics <a href="https://www.econtalk.org/james-heckman-on-facts-evidence-and-the-state-of-econometrics/">https://www.econtalk.org/james-heckman-on-facts-evidence-and-the-state-of-econometrics/</a></i></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<br />
<br /></div>
<div style="caret-color: rgb(0, 0, 0); font-family: -webkit-standard; text-size-adjust: auto;">
<div style="font-family: Times;">
<b>See also:</b></div>
<div style="font-family: Times;">
<br /></div>
<div style="font-family: Times;">
</div>
<div style="font-family: Times;">
<a href="http://econometricsense.blogspot.com/2015/08/data-cleaning.html">Data Cleaning</a></div>
<div style="font-family: Times;">
<a href="http://econometricsense.blogspot.com/2015/06/got-data-not-like-you-find-in-your.html">Got Data? Probably not like your econometrics textbook!</a></div>
<div style="font-family: Times;">
<a href="http://econometricsense.blogspot.com/2015/01/in-god-we-trust-all-others-show-me-your.html">In God we trust, all others show me your code.</a></div>
<div style="font-family: Times;">
<a href="http://econometricsense.blogspot.com/2015/01/data-science-is-success-is-10.html">Data Science, 10% inspiration, 90% perspiration</a></div>
<div style="font-family: Times; margin-bottom: 0in;">
<span style="font-family: inherit; font-size: small;"><a href="http://econometricsense.blogspot.com/2015/01/the-internet-of-things-big-data-and.html"><span style="color: blue;">The Internet of Things, Big Data, and John Deere</span></a></span></div>
<div style="font-family: Times; margin-bottom: 0in;">
<span style="font-family: inherit; font-size: small;"> Big Ag Meets Big Data (<a href="http://www.econometricsense.blogspot.com/2013/03/big-ag-meets-big-data-part-1.html">Part 1 </a>& <a href="http://www.econometricsense.blogspot.com/2013/03/previously-i-discussed-role-of-social.html">Part 2</a>)</span></div>
<div style="font-family: Times; margin-bottom: 0in;">
<span style="font-family: inherit; font-size: small;"><u><a href="http://econometricsense.blogspot.com/2014/05/big-data-think-global-act-local-when-it.html" style="line-height: 20px; text-decoration-line: none;">Big Data- Causality and Local Expertise are Key in Agronomic Applications</a></u></span><br />
<span style="font-size: small;"><a href="http://econometricsense.blogspot.com/2012/10/bmc-proceedings-comparison-of-random.html">BMC </a></span><a href="http://econometricsense.blogspot.com/2012/10/bmc-proceedings-comparison-of-random.html">Proceedings: A comparison of random forests, boosting and support vector machines for genomic selection</a></div>
</div>
Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-53805830414740307692019-12-11T19:45:00.000-05:002020-03-03T08:13:29.497-05:00When Wicked Problems Meet Biased DataIn <i>"Dissecting racial bias in an algorithm used to manage the health of populations" </i>(Science, Vol 366 25 Oct. 2019) the authors discuss inherent racial bias in widely adopted algorithms in healthcare. In a nutshell these algorithms use predicted cost as a proxy for health status. Unfortunately, in healthcare, costs can proxy for other things as well:<br />
<br />
<i>"Black patients generate lesser medical expenses, conditional on health, even when we account for specific comorbidities. As a result, accurate prediction of costs necessarily means being racially biased on health."</i><br />
<br />
<b>So what happened? How can it be mitigated? What can be done going forward?</b><br />
<br />
In data science, there are some popular frameworks for solving problems. One widely known approach is the <a href="https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining">CRISP-DM </a>framework. Alternatively, in <a href="https://www.analyticslifecycletoolkit.com/">The Analytics Lifecycle Toolkit </a>a similar process is proposed:<br />
<br />
(1) - Problem Framing<br />
(2) - Data Sense Making<br />
(3) - Analytics Product Development<br />
(4) - Results Activation<br />
<br />
The wrong turn in Albuquerque here may have been at the corner of problem framing and data understanding or data sense making.<br />
<br />
The authors state:<br />
<br />
<i>"Identifying patients who will derive the greatest benefit from these programs is a challenging causal inference problem that requires estimation of individual treatment effects. To solve this problem health systems make a key assumption: Those with the greatest care needs will benefit the most from the program. Under this assumption, the targeting problem becomes a pure prediction public policy problem."</i><br />
<br />
The distinction between 'predicting' and 'explaining' has been drawn in the literature by multiple authors over the last two decades, and this substitution has important implications. To quote <a href="https://www.galitshmueli.com/content/explain-or-predict">Galit Shmueli:</a><br />
<br />
<i>"My thesis is that statistical modeling, from the early stages of study design and data collection to data usage and reporting, takes a different path and leads to different results, depending on whether the goal is predictive or explanatory."</i><br />
<br />
Almost a decade before, Leo Breiman encouraged us to think outside the box when solving problems by considering multiple approaches:<br />
<br />
<i>"Approaching problems by looking for a data model imposes an a priori straight jacket that restricts the ability of statisticians to deal with a wide range of statistical problems. The best available solution to a data problem might be a data model; then again it might be an algorithmic model. The data and the problem guide the solution. To solve a wider range of data problems, a larger set of tools is needed."</i><br />
<br />
A number of data analysts today may not be cognizant of the differences between predictive and explanatory modeling and statistical inference, or of how those differences impact their work. This could be related to background, training, or the kinds of problems they have worked on. It is also important that we don't compartmentalize so much that we miss opportunities to approach a problem from a number of different angles (Leo Breiman's 'straight jacket'). This is perhaps what happened in the Science article: once the problem was framed as a predictive modeling problem, other modes of thinking may have shut down, even if the developers were aware of these distinctions.<br />
<br />
The take away is that we think differently when doing statistical inference/explaining vs. predicting or machine learning. Substituting one for the other changes the way we approach the problem (what we care about, what we consider vs. discount, etc.), and this impacts data preparation, modeling, and interpretation.<br />
<br />
For instance, in the Science article, after framing the problem as a predictive modeling problem, a pivotal focus became the 'labels' or target for prediction. <br />
<br />
<i>"The dilemma of which label to choose relates to a growing literature on 'problem formulation' in data science: the task of turning an often amorphous concept we wish to predict into a concrete variable that can be predicted in a given dataset."</i><br />
<br />
As noted in the paper 'labels are often measured with errors that reflect structural inequalities.'<br />
<br />
Addressing the issue with label choice can come with a number of challenges briefly alluded to in the article:<br />
<br />
1) deep understanding of the domain - i.e., subject matter expertise<br />
2) identification and extraction of relevant data - i.e., data engineering<br />
3) capacity to iterate and experiment - i.e., statistical programming, simulation, and interdisciplinary collaboration<br />
<br />
Data science problems in healthcare are wicked problems defined by interacting complexities with social, economic, and biological dimensions that transcend simply fitting a model to data. Expertise in a number of disciplines is required.<br />
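To make the label-choice problem above concrete, here is a toy simulation (all numbers are invented for illustration and are not taken from the Science paper): two groups have the same distribution of latent health need, but one generates lower costs conditional on need, so a rule trained on the cost label under-selects that group's high-need patients even when the cost predictions themselves are perfect.

```python
import numpy as np

# Toy illustration of label choice: predicting cost instead of health need.
# All numbers here are made up for illustration.
rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)              # two patient groups, 0 and 1
need = rng.normal(0, 1, n)                 # latent health need, same distribution in both groups
# Cost tracks need, but group 1 generates less cost conditional on need
cost = need + rng.normal(0, 0.5, n) - 0.8 * group

# Target the top decile by the cost label (using cost itself as a perfect
# cost predictor -- the bias comes from the label, not from a bad fit)
selected = cost >= np.quantile(cost, 0.9)

# Among truly high-need patients, how often does each group get selected?
high_need = need >= np.quantile(need, 0.9)
rates = [selected[high_need & (group == g)].mean() for g in (0, 1)]
print(f"selection rate among high-need patients: group 0 = {rates[0]:.2f}, group 1 = {rates[1]:.2f}")
```

Even though the "model" predicts its label without error, the lower-cost group is selected far less often among the genuinely high-need, which is the mechanism the article describes.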
<br />
<b>Bias in Risk Adjustment</b><br />
<br />
In the Science article, the specific example was in relation to predictive models targeting patients for disease management programs. However, there are a number of other predictive modeling applications where these same issues can be prevalent in the healthcare space.<br />
<br />
In <i>Fair Regression for Health Care Spending,</i> Sherri Rose and Anna Zink discuss these challenges in relation to popular regression-based risk adjustment applications. Aligning with the analytics lifecycle discussed above, they point out several places where bias can be addressed, including the pre-processing, model fitting, and post-processing stages of analysis. In the article they focus largely on the modeling stage, leveraging constrained and penalized regression algorithms designed to optimize fairness. This work looks promising, but the authors note remaining challenges related to scalability and to optimizing fairness across multiple metrics or groups.<br />
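As a rough sketch of the penalized-regression idea (a simplification invented here for illustration, not the estimator from the paper): add a penalty on a protected group's mean residual, so the fit cannot systematically under-predict that group's spending. The `fair_ls` function and the synthetic demo below are hypothetical.

```python
import numpy as np

def fair_ls(X, y, group_mask, lam=10.0, lr=0.01, steps=2000):
    """Least squares with a penalty on the protected group's mean residual:
    minimize MSE + lam * (mean residual in group)^2, via gradient descent."""
    n, p = X.shape
    w = np.zeros(p)
    xg_mean = X[group_mask].mean(axis=0)
    for _ in range(steps):
        resid = y - X @ w
        grad_mse = -2 * X.T @ resid / n             # gradient of mean squared error
        g_resid = resid[group_mask].mean()          # group-level under/over-prediction
        grad_fair = -2 * lam * g_resid * xg_mean    # d/dw of lam * g_resid^2
        w -= lr * (grad_mse + grad_fair)
    return w

# Synthetic demo: the group's true spending runs 0.5 higher, and no feature
# encodes group membership, so plain OLS under-predicts spending for the group.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
g = rng.random(n) < 0.3                              # protected group indicator
y = X @ np.array([1.0, 2.0, -1.0]) + 0.5 * g + rng.normal(0, 0.5, n)

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
w_fair = fair_ls(X, y, g)
r_ols = (y - X @ w_ols)[g].mean()
r_fair = (y - X @ w_fair)[g].mean()
print(f"group mean residual: OLS = {r_ols:.3f}, fair-penalized = {r_fair:.3f}")
```

The penalized fit trades a small amount of overall accuracy for a much smaller systematic under-prediction for the group, which is the flavor of the fairness/fit trade-offs the authors study.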
<br />
<b>Toward Causal AI and ML</b><br />
<br />
Previously I referenced Galit Shmueli's work that discussed how differently we approach and think about predictive vs explanatory modeling. In the Book of Why, Judea Pearl discusses causal inferential thinking:<br />
<br />
<i>"Causal Analysis is emphatically not just about data; in causal analysis we must incorporate some understanding of the process that produces the data and then we get something that was not in the data to begin with." </i><br />
<br />
There is currently a lot of work fusing machine learning and causal inference that could create more robust learning algorithms: for example, Susan Athey's work with <a href="https://arxiv.org/abs/1902.07409">causal forests</a>, Leon Bottou's work related to <a href="https://leon.bottou.org/talks/invariances">causal invariance</a>, and Elias Bareinboim's work on the <a href="https://www.pnas.org/content/113/27/7345">data fusion problem.</a> This work, along with the fair regression work mentioned above, will help inform the next generation of predictive modeling, machine learning, and causal inference models in the healthcare space, which hopefully will represent a marked improvement over what is possible today.<br />
<br />
However, we can't wait half a decade or more while the theory is developed and adopted by practitioners. In the Science article, the authors found alternative metrics for targeting disease management programs besides total costs that calibrate much more fairly across groups. Bridging the gap in other areas will require a combination of awareness of these issues and creativity throughout the analytics product lifecycle. As the authors conclude:<br />
<br />
<i>"careful choice can allow us to enjoy the benefits of algorithmic predictions while minimizing the risks."</i><br />
<br />
<b>References and Additional Reading:</b><br />
<b><br /></b>
<a href="https://www.stitcher.com/podcast/amjepi/casual-inference">This paper was recently discussed on the Casual Inference podcast.</a><br />
<b><br /></b>
Measures of Racism, Sexism, Heterosexism, and Gender Binarism for Health Equity Research: From Structural Injustice to Embodied Harm—an Ecosocial Analysis. Nancy Krieger<br />
<br />
Annual Review of Public Health 2020 41:1<br />
<b><br /></b>
Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1, 206–215 (2019) doi:10.1038/s42256-019-0048-x<br />
<br />
Breiman, Leo. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statist. Sci. 16 (2001), no. 3, 199--231. doi:10.1214/ss/1009213726. https://projecteuclid.org/euclid.ss/1009213726<br />
<br />
Shmueli, G., "To Explain or To Predict?", Statistical Science, vol. 25, issue 3, pp. 289-310, 2010.<br />
<br />
Fair Regression for Health Care Spending. Anna Zink, Sherri Rose. arXiv:1901.10566v2 [stat.AP]<br />
<br />
<br />
<br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-50904689475980498472019-09-30T21:12:00.001-04:002020-07-27T16:37:36.747-04:00Wicked Problems and The Role of Expertise and AI in Data ScienceIn 2018, <a href="https://science.sciencemag.org/content/sci/360/6390/728.full.pdf">an article in Science </a>characterized the challenge of pesticide resistance as a wicked problem:<br />
<br />
<i>“If we are to address this recalcitrant issue of pesticide resistance, we must treat it as a “wicked problem,” in the sense that there are social, economic, and biological uncertainties and complexities interacting in ways that decrease incentives for actions aimed at mitigation.”</i><br />
<div>
<br /></div>
<div>
In graduate school, I worked on this same problem, attempting to model the social and economic systems with game theory and behavioral economics, and to capture the biological complexities using population genetics. </div>
<div>
<br /></div>
<div>
<b>Wicked vs. Kind Environments</b></div>
<div>
<br /></div>
<div>
In data science, we also have 'wicked' learning environments in which we try to train our models. In the EconTalk podcast with Russ Roberts, Mastery, Specialization, and Range, David Epstein discusses wicked and kind learning environments:</div>
<div>
<br /></div>
<div>
<i>"The way that chess works makes it what's called a kind learning environment. So, these are terms used by psychologist Robin Hogarth. And what a kind learning environment is, is one where patterns recur; ideally a situation is constrained--so, a chessboard with very rigid rules and a literal board is very constrained; and, importantly, every time you do something you get feedback that is totally obvious...you see the consequences. The consequences are completely immediate and accurate. And you adjust accordingly. And in these kinds of kind learning environments, if you are cognitively engaged you get better just by doing the activity."</i></div>
<div>
<i><br /></i></div>
<div>
<i>"On the opposite end of the spectrum are wicked learning environments. And this is a spectrum, from kind to wicked. Wicked learning environments: often some information is hidden. Even when it isn't, feedback may be delayed. It may be infrequent. It may be nonexistent. And it maybe be partly accurate, or inaccurate in many of the cases. So, the most wicked learning environments will reinforce the wrong types of behavior."</i></div>
<div>
<br /></div>
<div>
As discussed in the podcast, many problems fall somewhere on a spectrum ranging from very kind environments like chess to more wicked ones like self-driving cars or medical diagnosis. What do experts have to offer where AI/ML falls short? The type of environment determines to a great extent the scope of disruption we might expect from AI applications.</div>
<div>
<br /></div>
<div>
<b>The Role of Human Expertise</b></div>
<div>
<br /></div>
<div>
In Thinking Fast and Slow, Kahneman discusses two conditions for acquiring skill:</div>
<div>
<br /></div>
<div>
1) an environment that is sufficiently regular to be predictable</div>
<div>
2) an opportunity to learn these regularities through prolonged practice</div>
<div>
<br /></div>
<div>
This sounds a lot like the 'kind' environments discussed above. Drawing on research by Robin Hogarth, Kahneman makes similar distinctions, describing 'wicked' environments as those in which experts are likely to learn the wrong lessons from experience. The problem is that in wicked environments, experts often default to heuristics, which can lead to wrong conclusions. Even when they are aware of these biases, social norms often nudge experts in the wrong direction. Kahneman gives an example involving physicians:</div>
<div>
<br /></div>
<div>
<i>"Generally it is considered a weakness and a sign of vulnerability for clinicians to appear unsure. Confidence is valued over uncertainty and there is a prevailing censure against disclosing uncertainty to patients...acting on pretended knowledge is often the preferred solution."</i></div>
<div>
<br /></div>
<div>
This likely explains many of the mistakes and much of the low-value care that plague healthcare delivery, as well as dissatisfaction with both the quality and cost of healthcare. How many of us want our physicians to pretend to know what they are talking about? On the other hand, how many people are willing to accept an answer from their physician along the lines of "let me look this up and get back to you later"? </div>
<div>
<br /></div>
<div>
One advantage AI may have over experts in kind environments is, as Kahneman puts it, the opportunity to learn through prolonged practice. Machine learning can handle many more training examples than a human can, so to speak.<br />
<br />
Even in kind environments, an expert may swing and miss when dealing with cases where the correct decision is like a pitch straight over the plate. One reason Kahneman discusses in Thinking Fast and Slow is 'ego' depletion: the idea that mental energy can become exhausted after significant exertion. As self-control breaks down, it's easy to default to heuristics and biases that can lead to decisions that look like careless mistakes. This would certainly apply to physicians, given the number of stories we hear about burnout in the profession. </div>
<div>
<br /></div>
<div>
The solution seems to be what polymath economist Tyler Cowen suggested several years ago in his EconTalk discussion with Russ Roberts about his book <b>Average is Over</b>:</div>
<div>
<br /></div>
<div>
<i>"I would stress much more that humans can always complement robots. I'm not saying every human will be good at this. That's a big part of the problem. But a large number of humans will work very effectively with robots and become far more productive, and this will be one of the driving forces behind that inequality."</i></div>
<div>
<br />
Imagine a clinical situation where a physician's 'ego' is substantially depleted by a difficult case. They could then lean on AI to prevent mistakes in the more routine decisions that follow. Or perhaps, leveraging AI tools, a clinician could conserve mental energy throughout the day so that they are less likely to default to heuristics when they encounter more complex issues. How this synergy materializes is uncertain, but it will certainly continue to involve substantial expertise on the part of many professionals. Together, human expertise and AI may have the best chance of tackling the most wicked problems.</div>
<div>
<br /></div>
<b>References:</b><br />
<br />
Wicked evolution: Can we address the sociobiological dilemma of pesticide resistance? | Science https://science.sciencemag.org/content/360/6390/728.full<br />
<br />
Thinking Fast and Slow. Daniel Kahneman. 2011<br />
<br />
EconTalk: David Epstein on Mastery, Specialization, and Range<br />
https://www.econtalk.org/david-epstein-on-mastery-specialization-and-range/<br />
<br />
EconTalk: Tyler Cowen on Inequality, the Future, and Average is Over<br />
https://www.econtalk.org/tyler-cowen-on-inequality-the-future-and-average-is-over/<br />
<div>
<br /></div>
Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-84832189097768528022019-05-15T21:25:00.001-04:002019-05-15T21:45:47.745-04:00Causal Invariance and Machine LearningIn an <a href="http://www.econtalk.org/cathy-oneil-on-weapons-of-math-destruction/#audio-highlights">EconTalk podcast </a>with Cathy O'Neil, Russ Roberts discusses her book Weapons of Math Destruction and some of the unintentional negative consequences of certain machine learning applications in society. One of the problems with these algorithms and the features they leverage is that they are based on correlational relationships that may not be causal. As Russ states:<br />
<br />
<i>"Because there could be a correlation that's not causal. And I think that's the distinction that machine learning is unable to make--even though "it fit the data really well," it's really good for predicting what happened in the past, it may not be good for predicting what happens in the future because those correlations may not be sustained."</i><br />
<br />
This echoes a theme in a recent <a href="https://p-hunermund.com/2019/04/04/beyond-curve-fitting/">blog post by Paul Hunermund:</a><br />
<br />
<i>“All of the cutting-edge machine learning tools—you know, the ones you’ve heard about, like neural nets, random forests, support vector machines, and so on—remain purely correlational, and can therefore not discern whether the rooster’s crow causes the sunrise, or the other way round”</i><br />
<br />
I've made similar analogies <a href="http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html">before </a>myself and still think this makes a lot of sense.<br />
<br />
However, <a href="https://iclr.cc/Conferences/2019/Schedule?showEvent=1142">a talk at the International Conference on Learning Representations</a> definitely made me stop and think about the kind of progress that has been made in the last decade and the direction research is headed. The talk was titled: <i>'Learning Representations Using Causal Invariance'</i> (you can actually see it here: <a href="https://www.facebook.com/iclr.cc/videos/534780673594799/">https://www.facebook.com/iclr.cc/videos/534780673594799/</a>):<br />
<br />
Abstract:<br />
<br />
<i>"Learning algorithms often capture spurious correlations present in the training data distribution instead of addressing the task of interest. Such spurious correlations occur because the data collection process is subject to uncontrolled confounding biases. Suppose however that we have access to multiple datasets exemplifying the same concept but whose distributions exhibit different biases. Can we learn something that is common across all these distributions, while ignoring the spurious ways in which they differ? This can be achieved by projecting the data into a representation space that satisfy a causal invariance criterion. This idea differs in important ways from previous work on statistical robustness or adversarial objectives. Similar to recent work on invariant feature selection, this is about discovering the actual mechanism underlying the data instead of modeling its superficial statistics."</i><br />
<br />
This is pretty advanced machine learning and I am not an expert in this area by any means. The way I want to interpret this is that this represents ways of learning from multiple environments that prevent overfitting in any single environment such that predictions are robust to any spurious correlation you might find in any given environment. It has a flavor of causality because the presenter argues that invariance is a common thread underpinning both the works of <a href="http://econometricsense.blogspot.com/2013/05/selection-bias-and-rubin-causal-model.html">Rubin </a>and <a href="http://econometricsense.blogspot.com/2014/04/how-is-it-that-structural-equation.html">Pearl.</a> It potentially offers powerful predictions/extrapolations while avoiding some of the pitfalls/biases of non-causal machine learning methods.<br />
<br />
Going back to Paul Hunermund's post I might draw a dangerous parallel (because I'm still trying to fully grasp the talk) but here goes. If we used invariant learning to predict when or if the sun will rise, the algorithm would leverage those environments where the sun rises even if the rooster does not crow, as well as instances where the rooster crows, but the sun fails to rise. As a result, the biases that are merely correlational (like the sun rising when the rooster crows) will drop out and only the more causal variables will enter the model – which will be invariant to the environment. If this analogy is on track this is a very exciting advancement!<br />
<br />
Putting this into the context of predictive modeling/machine learning and causal inference however, these methods create value by giving better answers (less biased/robustness to confounding) to questions or solving problems that sit on the first rung of Judea Pearl’s ladder of causation (see the intro of <a href="http://bayes.cs.ucla.edu/WHY/">The Book of Why</a>). Invariant regression is still machine learning and as such does not appear to offer any means to make statistical inferences. However at the same time <a href="http://econometricsense.blogspot.com/2016/03/machine-learning-and-economics.html">Susan Athey</a> is doing really cool stuff in this area .<br />
<br />
While invariant regression seems to share the invariance properties associated with causal mechanisms emphasized in Rosenbaum and Rubin’s potential outcomes framework and Pearl’s DAGs and ‘do’ operator, it still doesn’t appear to allow us to reach the 3rd rung in Pearl’s ladder of causation which allows us to answer counterfactual questions. And it sounds dangerously close to the idea he criticises in his book that <i>"the data themselves will guide us to the right answers whenever causal questions come up"</i> and allow us to skip the <i>"hard step of constructing or acquiring a causal model."</i><br />
<br />
I’m not sure that is the intention of the method or the talk. Still, its an exciting advancement to be able to build a model with feature selection mechanisms that have more of a causal vs. merely correlational flavor<br />
<br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-86635922852132811652019-04-23T23:43:00.000-04:002019-04-25T06:14:15.961-04:00Synthetic Controls - An Example with Toy DataAbadie Diamond and Hainmueller introduce the method of synthetic controls as an alternative to difference-in-differences for evaluating the effectiveness of a tobacco control program in California (2010).<br />
<br />
<a href="https://diff.healthpolicydatascience.org/#synth">A very good summarization </a>for how this method works is given by Bret Zeldow and Laura Hatfield at the healthpolicydatascience.org website:<br />
<br />
<i>"The idea behind synthetic control is that a weighted combination of control units can form a closer match to the treated group than than any one (or several) control unit (Abadie, Diamond, and Hainmueller (2010)). The weights are chosen to minimize the distance between treated and control on a set of matching variables, which can include covariates and pre-treatment outcomes. The post-period outcomes for the synthetic control are calculated by taking a weighted average of the control groups’ outcomes. Many authors have extended synthetic control work recently (Kreif et al. 2016; Xu 2017; Ferman, Pinto, and Possebom 2017; Kaul et al. 2015)"</i><br />
<br />
Bouttell and Lewsey (2018) provide a nice survey and introduction to the method related to public health interventions.<br />
<br />
For a very nice tour of the math and example R code see <a href="https://thesamuelsoncondition.com/2016/04/29/more-public-policy-analysis-synthetic-control-in-under-an-hour/comment-page-1/">this post</a> at The Samuelson Condition blog.<br />
<br />
<br />
<b>A Toy Example:</b><br />
<b><br /></b>
See below for some completely made up data for this oversimplified example. But let's assume that we have some intervention in Kentucky in 1995 that impacts some outcome 'Y' in years 1996,1997, and 1998, maybe we are trying to improve the percentages of restaurants with smoke free policies.<br />
<br />
Perhaps we want to consider comparing KY to a synthetic control based on the pool of states including TN, IN, CA and values of covariates and predictors measured prior to the intervention (X1,X2,X3) as well as pre-period values of Y.<br />
<br />
Using the package Synth in R and the data below the weights used for constructing synthetic controls using states TN, IN and CA with KY as the treatment group are:<br />
<br />
<style type="text/css"><!--td {border: 1px solid #ccc;}br {mso-data-placement:same-cell;}--></style><br />
<table border="1" cellpadding="0" cellspacing="0" dir="ltr" style="border-collapse: collapse; border: none; font-family: arial,sans,sans-serif; font-size: 10pt; table-layout: fixed; width: 0px;" xmlns="http://www.w3.org/1999/xhtml"><colgroup><col width="100"></col><col width="100"></col></colgroup><tbody>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"w.weights"}" style="border-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">w.weights</td><td data-sheets-value="{"1":2,"2":"unit.names"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); border-top-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">unit.names</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":0.021}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.021</td><td data-sheets-value="{"1":2,"2":"TN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">TN</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":0.044}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.044</td><td data-sheets-value="{"1":2,"2":"CA"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">CA</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":0.936}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.936</td><td data-sheets-value="{"1":2,"2":"IN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">IN</td></tr>
</tbody></table>
<br />
We could think of the synthetic control heuristically being approximately 2.1% of TN, 4.4% CA, and 93.6% IN. If you look at the data, you can see that these wieghts make intuitive sense. I created the toy data so that IN looked a lot more like KY than the other states.<br />
<br />
As an additional smell test, if I constructed a synthetic control using only CA and TN, changing one line of R code to reflect only these two states:<br />
<br />
<span class="pl-v" style="background-color: white; box-sizing: border-box; color: #e36209; font-family: , "consolas" , "liberation mono" , "menlo" , "courier" , monospace; font-size: 12px; white-space: pre;">controls.identifier</span><span style="background-color: white; color: #24292e; font-family: , "consolas" , "liberation mono" , "menlo" , "courier" , monospace; font-size: 12px; white-space: pre;"> </span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; font-family: , "consolas" , "liberation mono" , "menlo" , "courier" , monospace; font-size: 12px; white-space: pre;">=</span><span style="background-color: white; color: #24292e; font-family: , "consolas" , "liberation mono" , "menlo" , "courier" , monospace; font-size: 12px; white-space: pre;"> c(</span><span class="pl-c1" style="background-color: white; box-sizing: border-box; color: #005cc5; font-family: , "consolas" , "liberation mono" , "menlo" , "courier" , monospace; font-size: 12px; white-space: pre;">2</span><span style="background-color: white; color: #24292e; font-family: , "consolas" , "liberation mono" , "menlo" , "courier" , monospace; font-size: 12px; white-space: pre;">,</span><span class="pl-c1" style="background-color: white; box-sizing: border-box; color: #005cc5; font-family: , "consolas" , "liberation mono" , "menlo" , "courier" , monospace; font-size: 12px; white-space: pre;">3</span><span style="background-color: white; color: #24292e; font-family: , "consolas" , "liberation mono" , "menlo" , "courier" , monospace; font-size: 12px; white-space: pre;">), </span><span class="pl-c" style="background-color: white; box-sizing: border-box; color: #6a737d; font-family: , "consolas" , "liberation mono" , "menlo" , "courier" , monospace; font-size: 12px; white-space: pre;"><span class="pl-c" style="box-sizing: border-box;">#</span> these states are part of our control pool which will be weighted </span><br />
<br />
I get the following different set of weights:<br />
<br />
<style type="text/css"><!--td {border: 1px solid #ccc;}br {mso-data-placement:same-cell;}--></style><br />
<table border="1" cellpadding="0" cellspacing="0" dir="ltr" style="border-collapse: collapse; border: none; font-family: arial,sans,sans-serif; font-size: 10pt; table-layout: fixed; width: 0px;" xmlns="http://www.w3.org/1999/xhtml"><colgroup><col width="100"></col><col width="100"></col><col width="100"></col></colgroup><tbody>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"w.weights"}" style="border-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">w.weights</td><td data-sheets-value="{"1":2,"2":"unit.names"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); border-top-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">unit.names</td><td data-sheets-value="{"1":2,"2":"unit.numbers"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); border-top-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">unit.numbers</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":0.998}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.998</td><td data-sheets-value="{"1":2,"2":"TN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">TN</td><td data-sheets-value="{"1":3,"3":2}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">2</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":0.002}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.002</td><td data-sheets-value="{"1":2,"2":"CA"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">CA</td><td data-sheets-value="{"1":3,"3":3}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">3</td></tr>
</tbody></table>
<br />
<br />
This makes sense because I made up data for CA that really is quite a bit different from KY. It should contribute very little as a control unit used to calculate a synthetic KY. (in fact maybe it should not be used at all)<br />
<br />
The package allows us to plot the trend in outcome Y for the pre and post period. But we could roughly calculate the synthetic (counterfactual) values for KY by hand in excel and get the same plot using this small toy data set.<br />
<br />
For instance, the value for KY in 1998 is .51 but the counter factual value created by the synthetic control is the weighted combination of outcomes for TN, CA, & IN or .021*.41 + .044*.95+ .936*.46 = .48097.<br />
<br />
Using this small data set with only 3 states being part of the donor pool these results are not perfectly ideal, but we can see roughly that the synthetic control tracks KY's trend in the pre-period and we get a very noticeable divergence in the post period.<br />
<br />
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAlgAAAFzCAYAAADi5Xe0AAAgAElEQVR4Xuydd3wVRdfHf+kFAmmEEAIkgDQVUeSxYMGCgICgdBDpHQSk9y4d6UgHkQ6KFBF4kKLio6JiQZAioYQSSIGQQhKS9zOTNzEJCXd37969e27O/CMmM2dmvufM7C+zszNOGRkZGeDEBJgAE2ACTIAJMAEmoBsBJxZYurFkQ0yACTABJsAEmAATkARYYHEgMAEmwASYABNgAkxAZwIssHQGyuaYABNgAkyACTABJsACi2OACTABJsAEmAATYAI6E2CBpTNQNscEmAATYAJMgAkwARZYHANMgAkwASbABJgAE9CZAAssnYGyOSbABJgAE2ACTIAJsMDiGGACTIAJMAEmwASYgM4EWGDpDJTNMQEmwASYABNgAkyABRbHABNgAkyACTABJsAEdCbAAktnoGyOCTABJsAEmAATYAIssDgGmAATYAJMgAkwASagMwEWWDoDZXNMgAkwASbABJgAE2CBxTHABJgAE2ACTIAJMAGdCbDA0hkom2MCTIAJMAEmwASYAAssjgEmwASYABNgAkyACehMgAWWzkDZHBNgAkyACTABJsAEWGBxDDABJsAEmAATYAJMQGcCLLB0BsrmmAATYAJMgAkwASbAAotjgAkwASbABJgAE2ACOhNggaUzUDbHBJgAE2ACTIAJMAEWWBwDTIAJMAEmwASYABPQmQALLJ2BsjkmwASYABNgAkyACbDA4hhgAkyACTABJsAEmIDOBFhg6QyUzTEBJsAEmAATYAJMgAUWxwATYAJMgAkwASbABHQmwAJLZ6BsjgkwASbABJgAE2ACLLA4BpgAE2ACTIAJMAEmoDMBFlg6A2VzTIAJMAEmwASYABNggcUxwASYABNgAkyACTABnQmwwNIZKJtjAkyACTABJsAEmAALLI4BJsAEmAATYAJMgAnoTIAFls5A2RwTYAJMgAkwASbABFhgcQwwASbABJgAE2ACTEBnAiywdAbK5pgAE2ACTIAJMAEmwAKLY4AJMAEmwASYABNgAjoTYIGlM1A2xwSYABNgAkyACTABFlgcA0yACTABJsAEmAAT0JkACyydgbI5JsAEmAATYAJMgAmwwOIYYAJMgAkwASbABJiAzgRYYOkM1J7m/v77b1SpUgUtW7bE5s2bZVM++eQTdOjQAT4+Pti6dSvq168vf37mzBk88sgj8t8DBw7E3Llz5X/nzJljzy5w3UyACTABJsAEHIIACyyHcGNmJ7IEVosWLbBlyxZ89tlnaNasmRRXhw4dQs2aNfHRRx/hgw8+QN26dbFv3z6cOHECTz31FIKCgqToKl68uAMR4a4wASbABJgAE7APARZY9uFuk1pzCqyuXbuiXr16ucSVqDQtLU0Krd9//x3r1q3DokWL8L///Q+ff/45mjZtmqtdb7zxBqKjo3Hs2DF4eHhgz549UpyNGDFC2p42bZoUcd7e3nj99dfl6pfIV1ASq2fXrl3D999/L8ts27YNo0aNwoQJEyBE4YcffoiFCxfK4p07d8bEiRPh5uaGgwcPYubMmVIQCiE4evRo9OvXDz/++CPat28P0U4hKEWbxIodJybABJgAE2AC9ibAAsveHtCx/iyBFRoaiitXrkjLI0eOxJQpU3LVcvz4cdSqVSv7Z02aNMGOHTseaIkQP0L07N69Gw0bNpSvHsVrxnPnzmHFihVSYI0bNw5nz57Fhg0bMGPGDAwZMqTAHglhJspk2WvevDm2b9+OCxcuSBElRKFYTQsICMCBAwekqOrRowcqVqwobbZt21bWExUVhZiYGPz222945ZVX5O/Kly+P1q1bP9BXHfGyKSbABJgAE2ACigmwwFKMyvwZswRW3pZGRkYiJCQk14/FSpR4XSjS5cuXIURZ3nTq1ClUq1ZN7uGaN28efH198eKLL+Lo0aPo2LEj1q5diy5d
uqBRo0aoUKGCzOvi4lIgqF9//VUKqJ49e8rVLrGK9eyzz8oVrRdeeAHfffcd/vnnHymwRHvKlCkjRdQff/whV7KSkpIwfPhwfP311/LnQmQJgfXqq69KgZaeng5nZ2fzO4pbyASYABNgAg5PgAWWA7k4p8DatGkTfvrpJ8yePVuu/Kxfvz5XT4VoqV69ulz12bhxY4EU/vOf/0g7K1eulGJq1apV6NSpE06ePInnnnsO8fHxsqx4dSfqEK8KH5YqV64sXxOKOoUwW7BgAfr27YuSJUvKlam8SYimyZMny9WsrLpEHrEKJ/5fCKz8VukcyK3cFSbABJgAEyBIgAUWQacV1OQsgdW4cWPs3LlT7p8KDw+XQkSs8IiVnqwkBNJjjz2Wr/jKaX/58uXo3r273Msl7MTGxsqVrBs3bshN9bdu3cKuXbuwZs0a1K5dG99+++1DiYpXhOJVYaVKleSm+qzVMyG2xB4vUV7UJWyHhYUhNTVV2hV9EmLs448/lq8Zf/75Z9y5c0cKLCHAxOtMTkyACTABJsAEzEKABZZZPKFDO/J+RShMzpo1S+6LEoJGrFq5u7vLmpQKLCGo/P39ZZmcK2FZr/TEXqwiRYqgTZs2csO52GT+xBNPSGGX376u8+fPZ++pynrdKGxnHRMhXkfWqFFDHhkhNr6L1a2XX35Z7gHr1q2bFHtipeubb76RG/aFwJo6dap8dciJCTABJsAEmIBZCLDAMosndGhHfgIrMTFRih0hSsTrQrH3KqfAevfdd+XXhA9LQjyJV4579+7NPkdL1CX2Uh0+fFgWFStM4jWeeAXo5OQk91qJVab8UtZrxyVLlkgbIonVKPHloNj0LpIQX+LLQPHqUWyuz/q5OHZC/Ft8bShW4OrUqcMCS4fYYRNMgAkwASagLwEWWPryLHTWhIATSWxYV5KE0BPHRIivHMVrRiGgciaxkV28FixWrFiun8fFxcmVMrHZnRMTYAJMgAkwAbMTYIFldg85UPvEuVbiWAeRxKs/saeKExNgAkyACTABRyTAAssRvWrSPomjFcRrRrF5Xbzq49UokzqKm8UEmAATYAJWE2CBZTVCNsAEmAATYAJMgAkwgdwEWGBxRDABJsAEmAATYAJMQGcCLLB0BsrmmAATYAJMgAkwASbAAotjgAkwASbABJgAE2ACOhNggaUzUDbHBJgAE2ACTIAJMAEWWBwDTIAJMAEmwASYABPQmQALLJ2BsjkmwASYABNgAkyACbDA4hhgAkyACTABJsAEmIDOBFhg6QyUzTEBJsAEmAATYAJMgAUWxwATYAJMgAkwASbABHQmwAJLZ6BsjgkwASbABJgAE2ACLLA4BpgAE2ACTIAJMAEmoDMBFlg6A2VzTIAJMAEmwASYABNggaVjDGRkZMDJyUmzRWvLa66YCzIBJsAEmAATYAK6EmCBpQPOdevWYfv27UhKSkLdunUxcOBAuLi4PGC5Y8eOiI6Ozv55vXr10LdvX/z+++9YvXo1/vzzT1SoUAFt2rTByy+/rEPL2AQTYAJMgAkwASZgDwIssKykfuTIEUyfPh2zZs2Cq6srhg8fjpYtW6J169a5LMfHx+Odd97B1KlT4evrK39XrFgxBAQEoEWLFnjrrbfQqlUrfPvtt5gxYwa2bt2anc/KJnJxJsAEmAATYAJMwGACLLCsBC4ElZ+fH4YNGyYtrVmzBseOHcOyZctyWf7ll18wevRobNmyBRcuXEDFihXh5eWF2NhY7N69W4ord3d3JCYmokmTJpgwYQKef/55K1vHxZkAE2ACTIAJMAF7EGCBZSV18drvlVdeQYcOHaSlvXv3YtWqVXIFKmfasGGDfA3o7OyM9PR0eHp6YuHChShXrlyufJs2bZL5Nm/ezCtYVvqGizMBJsAEmAATsBcBFlhWkm/fvj0aNWokV6BEOnToEObMmYNdu3blsvzZZ59BrGKJFa87d+7IFa8nnngCgwcPzs4nVr7GjRuHPn36oGnT
pqpb9vPPP6suwwWYABNgAkzAXARq1qxprgZxazQRYIGlCdu/hYQYqlWrFsRKlkjidZ9YvVq7du1DLQvBtWLFCnz55Zcy3+HDhzFlyhS89957EKJNSxICi+LAvHXrFgIDA7V02e5lmLmxLmDexvIWtTFzY5lT4X3mzBk0aNBAxkfWvmLxHOvSpQtGjBiBr7/+Wm55Ec9DNzc3CTElJUXuUS5VqhSWLFliLFg71MYCy0rokydPRlpaGsaPHy8tLVq0CBcvXpQb1XMm8epPBGH9+vXlj9evXy/F2MaNG+XGdrHnSoi0du3aaW4RlYGZt4MssDS7XHNBqsw5xjW7XHNBZq4ZnaaCVHjnFVhCUHXr1g2TJk1C27ZtERERgddee01+KS++rBdp5syZEF/dHzx4ECVKlNDEh1IhFlhWeuuHH37A2LFjZVB5e3tL5d67d2+p7MUrP/GzGjVqYNu2bXJvlli1unfvnnxVKPZuCUHVvHlzPPnkk9mrYKJJISEh8itDNYnKwGSBpcartsnLAss2XAuySpW36A/PK8bGChXeOQXWTz/9hJ49e2Lu3Llo3LhxNjAhpsTig3hjI/Yei2fd4sWLIY4oKgyJBZaVXhaHg4pXe+K4BpFq166dvZolXh+KpVDx9WBycjImTpwoJysRaJUqVZJHNogl1QULFjzQCrE3S20QUhmYLLCsDDodilN94HOM6+B8lSaYuUpgVma3Je/DJ+KwbNdVxCfdx6CWZVCnRuaRQVpSlsASzzWxyNCjRw8MHTo0l6n79+/L1Szxlkf8Wzz38r7d0VI3lTIssHTyVEJCgvxCUBy98LAkhFZqaip8fHx0qvlfM7YcmLo3NodBqg97/uvellGRv22OcWaulADVeUVLjD/dQ9kHThkZQNZlIzn/bYnp8aUPbrrPEliirDgY+/jx49i/fz+Cg4Nzmct6VSh+Lr6yV/tmxlLbzPx7Flhm9o7KtmkZmCqrsEl2qhMhCyybhMNDjXKMM3OlBKjOK1piXJPAAqD0YreHCSyxr0p8SS9eDYo3NuKYobxXxom9WZUrV8711bxSP1LOxwKLsvfytF3LwDRD96lOhCywjI8ejnFmrpQA1XnFljEuXhEuFa8IE+9jcCt9XhFmfUX422+/ydtKxIdf4rq3nEkIrKpVq+KDDz5Q6j6HyMcCyyHcmNkJWw5MW2KiOhEyc1tGRf62OcaZuVICVOcVKjGe3zENYn/V0qVL5RENOQ/RZoGlNGo5n2kJUBmYeQFSnQhZYBk/FDjGmblSAlTnFSoxnp/ASkpKkl/QBwUFydtIsl4Vig3wVapUyT6uQakPqefjFSzqHszRfioDkwWW/YOOHz7G+oAqb/4jwtg4oczbeFLmr5EFlvl9pLiFLLAUo9ItIzPXDaUiQ8xbESZdMzFzXXFaNEaVt8WOFcIMLLAcyOlUByb/dW98EFJlzjFufKwwc2OZU+VtLCUatbHAouEnRa2kOjCpPuwpL+dTZc4xrmgq0DUTM9cVp0VjVHlb7FghzMACy4GcTnVgUn3Ys8AyfvBwjDNzpQSozitUY1ypXwpTPhZYDuRtqgOT6kTIAsv4wcMxzsyVEqA6r1CNcaV+KUz5WGA5kLepDkyqEyELLOMHD8c4M1dKgOq8QjXGlfqlMOVjgeVA3qY6MKlOhCywjB88HOPMXCkBqvMK1RhX6pfClI8FlgN5m+rApDoRssAyfvBwjDNzpQSozitUY1ypXwpTPhZYDuRtqgOT6kTIAsv4wcMxzsyVEqA6r1CNcaV+yZkvLS0Nt2/fRkBAgJbipi/DAsv0LlLeQKoDk+pEyAJLeWzqlZNjXC+Syu0wc+Ws9MhJhXd+V+UcPnwYXbp0wYgRI+R9hImJidi6dSvc3NwkmpSUFLRs2RKlSpXChAkTcOfOHXk59LRp06TIysqnB0cz2GCBZQYv6NQGKgMzb3dZYOkUACrMUGXOMa7CyTplZeY6gVRohgrvvAJLCCpxqfOkSZPQtm1b
RERE4LXXXkPfvn2z7yCcOXMm1q1bh4MHD+Lbb7/Fxx9/jHPnzuHVV1/F6NGjc10QrRCXqbOxwDK1e9Q1jsrAZIGlzq+2yM0CyxZUC7ZJlTev0hobJ5R45xRYP/30E3r27Im5c+eicePG2dCEmBo/fjw+++wzpKeno3nz5li8eDHq1auH+Ph4+f9lypTBY489hgEDBhgP28Y1ssCyMWAjzbPAMpJ2Zl3M3FjmzNtY3hzjjsU74egexKycgfT42wgcMAVFXmqouYNZAmvixIkYO3YsevTogaFDh+ayd//+fbmaJfZaiX9XqlQJM2bMkHnEz8Qq1lNPPYVTp07hmWee0dwWsxZkgWVWz2hoFz98NECzsggztxKgyuLMWyUwHbIzcx0gqjChhff555VuEs8A4JTZmowMwOn//22hfRWORT+QI0tgiV+8/PLLOH78OPbv34/g4OBcebNeFYqf7927F8WKFVNBg3ZWFli0/Zer9VoGphm6z69PjPcCVeYc48bHCjM3lrkW3vYUWGJfVaNGjeSrQbF5ffXq1XDKI9zE3qzKlStj8ODBxsK0c20ssOzsAD2r1zIw9axfqy2qD3vRX2au1evayjFvbdysKcXMraGnvqwtectXhCumI/2ueEX4oS6vCEV7fX198dtvv+Gdd96RXwW2adMmV8eFwKpatSo++OAD9UAIl2CBRdh5eZtuy4FpS0wssGxJN3/bVJlzjBsfK8zcGObxifcxe8tlHPw5Gs9U88WsXhWMqVhjLfkd0yD2Vy1dulQe0VCuXLlsyyywNELmYuYhwBOh8b5g5sYyZ97G8uZVWuN47zoWjQlrI7IrFAKrTg1f4xqgsqb8BFZSUhIaNGiAoKAgbN68OftVodgAX6VKlezjGlRWRTY7r2CRdd2DDRerEpyYABNgAkyAFoHTl1Mw8dMoxNy9T0Zg0SJsn9aywLIPd5vUyn/d2wTrQ40yc2OZM29jefMKlm15i9eCc7ddwRff3ZIf9ZUKcEfc3RQ8U9UXs3ub+xWhbck4hnUWWI7hR9kLfvgY70xmbixz5m0sb55XbMdbvBKct/0K4u6moaSfG0a3D8NzjxYjO4/bjhRdyyyw6PrugZbzw8d4ZzJzY5kzb2N5s8DSn3fE9WRMXBuB3/9JkMY71g9Gt0Yh8HDLPJOKaozrT4q+RRZY9H2Y3QOqA5PqF22UJ0OqzDnGjZ+wmLk+zO+lZmD57qtY89V1abB6+SIY2yEMYcGeuSqgylsfSo5lhQWWA/mT6sCk+rBngWX84OEYZ+ZKCZhpXvn+5B1MXheBG7Gp8C3qigHNQ9HoufxPX6ca40r9UpjyscByIG9THZhmmgjVhgMzV0vMuvzM2zp+Wkozcy3UMsvcjEvF9I2XcPhEnPz/pi8Eon+zUPh4uxRolCpv7ZQctyQLLAfyLdWByQLL+CCkypxj3PhYYebamH964AaW7rqKpHvpCC/liXEdwvBYeBGLxqjyttixQpiBBZYDOZ3qwKT6sOdXhMYPHo5xZq6UgL3mlT8vJMgDQy9cS4anuzN6NA5B+zdKKm02b3JXTMr8GVlgmd9HilvIDx/FqHTLyMx1Q6nIEPNWhEnXTMxcGU5xptX8z67g828yD3x++QlfDG9bFiV83ZQZ+P9cVHmr6mQhycwCy4EcTXVg2usvTT1cz8z1oKjcBvNWzkqvnMzcMsk9/4vGR1sfPNPKcskHc1DlraWvjl6GBZYDeZjqwGSBZXwQUmXOMW58rDDzgplfuXkPY1ddyD7T6r03SqLHW6Wzz7TS4i2qvLX01dHLsMByIA9THZhUH/YidJi5sQOIeRvLm2O8YN6Ld0Ri1d7MM60eDfPGhE7hD5xppcVbVGNcS18dvQwLLAfyMNWByQLL+CCkypxj3PhYYea5mf90Oh4TP4nAtegUFC/ign7vhMrjF/RKVHnr1X9HssMCy4G8SXVgUn3Y81/3xg8ejnFmrpSA3vPKrdupmLnpMg7+Eiub0Pj5AHmm
lTg4VM9ENcb1ZOAotlhgOYon+XWVXTxJdTLU++FjFHzmbRTpf+th5sDGg1FY8kUkEu+lo1xJD4ztEI4nKlg+00qLt6jy1tJXRy/DAsuBPEx1YFJ92PMKlvGDh2OcmSsloMe8cvpSIsavicC5yCR5plXXhqXk5cy2TFRj3JZMqNpmgUXVc/m0m+rA1GMitJcbmbmx5Jm3sbwL6x8Rd5PuY+Hnkdh25KYE/my1YhjdvhyC/d1t7gCqMW5zMAQrYIFF0GkFNZnqwGSBZXwQUmXOMW58rBQ25nt/iMFHWy8jJj4NQb5uGNK6LF550tcw8FR5GwaIUEUssAg5y1JTqQ5Mqg/7wvrXvaU4tOXvOcZtSTd/24WFuTjTSlxx8+vZuxJEu9dLoleTEPlq0MhElbeRjKjUxQKLiqcUtJPqwGSBpcC5OmehypxjXOdAUGCuMDD/eOdVrNhzTdIQZ1qN6xiO8qU8FdDRPwtV3vqToG+RBRZ9H2b3gOrApPqw5xUs4wcPxzgzV0pAybyS80wrH+/MM63eeVG/M62UtjVnPqoxrqWvjl6GBZYDeZjqwFQyEZrVTczcWM8wb2N5O+ofEeJMq9lbLuPA8cwzrd58xh8ftCyj+5lWWrxFNca19NXRy7DAMsDDGRkZcHJysnlNVAcmCyybh8YDFVBlzjFufKw4GvNNX2eeaZWQnI7QEh7yihtbnWmlxVtUeWvpq6OXYYFlQw9fvnwZM2bMwOnTpxEYGIgxY8agWrVqD9T43//+F/Pmzcv18zVr1iAgIEBV66gOTKoPe0f9615V0BmcmWPcYOAOdIBxzjOtPNyc0PnNUujyZinjgVqokWqMmw6kCRrEAstGTkhPT0f79u3xxBNPoGvXrti8eTN27dqFHTt2wN0991kqs2fPxo0bN9C9e/fs1oSHh8PFxUVV66gOTBZYqtysS2aqzDnGdXG/KiPUmYvT1xd+dgVbDmeeafV0ZR+M7RCGkADbn2mlCvT/Z6bKW0tfHb0MCywbefjChQtSMG3ZsgV+fn5ITU3Fm2++ialTp+Lpp5/OVavIV79+fTz++OPw8vJCaGioplZRHZhUH/a8gqUpTK0qxDFuFT5NhSky//tyEr759TqKFi2C1XuvIfpOGgKLu2FwqzJ4vaafJg5GFaLI2yg21OphgWUjj/3www8YPXo0Dhw4kF3DO++8g549e+KNN97I/pnYnyXElUhi1UskIcQGDhyoumVUByYLLNWutroAVeYc41a7XrUBasw3HIzCnC2XZT8zAIjdry3qlEDft0NRxNPYM61Uwyb8SlZLXx29DAssG3n4u+++w8SJE7Fv377sGtq0aYNWrVqhadOm2T+Lj4/HsGHD0LJlS7zwwgv44osv8PHHH2evfKlpnpgIOTEBJsAECjOBkRuA6MyzQmXqVx94rCwtIjVr1qTVYG5tvgRYYNkoMP7880+5CpVzBUsIq6FDh+L5558vsFaxitWoUSN06dIFzZo1U9U6an9pZnWO6mqKaD8zVxWiVmdm3lYjVG2ACnNxptWH6y/iUtQ9uWolkrg7cPfUx1X32Z4FqPC2JyMqdbPAspGnoqKi0K5dO6xbtw7BwcFITExEkyZN5OpUhQoVsmuNjIzE6tWrMWjQILn/Ki0tDQ0bNsSIESNQp04dVa2jOjBZYKlysy6ZqTLnGNfF/aqMmJ15bHyaPNPqqx9jZL/q1fJH0xcCceLMLTSsXdq0m9kLcoLZeasKnkKemQWWDQOgY8eOCAkJkeJp5cqV+Omnn7Bx40a4urrKrwnFZveSJUtC7M0Sq1bvvfee/Npw/fr12LlzpxRcahLVgUn1Yc8rWGqiU5+8HOP6cFRjxczMtx6+iUU7InE36b4800p8HfjUI0Vl96jOK2bmrSZuOC/AAsuGUXDy5EkMGTJEfkHo4+ODsWPHokaNGrLGunXrYvDgwahXrx4OHTqEZcuWyQnB2dlZCrKcG+GVNpHqwKQ6EbLAUhqZ+uXjGNePpVJL
ZmR+9kqSvJhZnG0lUvfGIejeKPeZVlTnFTPyVhornC83ARZYNo4I8ZVgTEyMokNDo6Oj5ZEOQmRpSVQHJtWJkAWWlii1rgzHuHX8tJQ2E3NxptXiHZEQp7GL9OQjRTGuQ5hcvcqbqM4rZuKtJV64zL8EWGA5UDRQHZhUJ0IWWMYPHo7xwsv8vz/HYtbmyxD3CAYUc8XAFmVQ/z/+BQKhOq9QjXHjI9P8NbLAMr+PFLeQ6sCkOhGywFIcmrpl5BjXDaViQ/ZmfjU6BRPXRuD43/GyzeJMqz5NS6Oo18NvuqA6r9ibt+LA4IwWCbDAsoiITgaqA5PqRMgCy/ixwTFeuJiv2HMNq768hpS0DFQs7YXxHcNQpay3IghU5xWqMa7IKYUsEwssB3I41YFJdSJkgWX84OEYLxzMfzl7V65aXbl5T56+3qtJabR+NUhV56nOK1RjXJVzCklmFlgO5GiqA5PqRMgCy/jBwzHu2MzFmVZztl7G3h8yz7QS9waK+wPFPYJqE9V5hWqMq/VPYcjPAsuBvEx1YFKdCFlgGT94OMYdl/n2ozex8PNIxCfeR6kAd4x9Lwy1qvho7jDVeYVqjGt2lAMXZIHlQM6lOjCpToQssIwfPBzjjsf8n2vJGLvqQvaZVl0blkLPt0Ks7ijVeYVqjFvtMAc0wALLgZxKdWBSnQhZYBk/eDjGHYd5ckrmmVYbDlo+00pLr6nOK1RjXIuPHL0MCywH8jDVgUl1ImSBZfzg4Rh3DOZf/xqHWZsuISouFf4+mWdaNXim4DOttPSa6rxCNca1+MjRy7DAciAPUx2YVCdCFljGDx6OcdrMr8ekYPK6i/jfX3dkR5q9VAL93rF8ppWWXlOdV6jGuBYfOXoZFlgO5GGqA5PqRMgCy/jBwzFOl/nqvdexYs9V3EtVf6aVll5TnVeoxrgWHzl6GRZYDuRhqgOT6kTIAsv4wcMxTo/5b+cTMHHtBVy8cQ/eHs7o2aQ02r6m7kwrLb2mOq9QjWuNDi8AACAASURBVHEtPnL0MiywHMjDVAcm1YmQBZbxg4djnA7zuLtpmLvtCnZ/Hy0b/eqTvhjapqymM6209JrqvEI1xrX4yNHLsMByIA9THZhUJ0IWWMYPHo5xGsx3fHsL87dfwZ3E+yjp54bxHcOtOtNKS6+pzitUY1yLjxy9DAssB/Iw1YFJdSJkgWX84OEYNzdzcabVhDUXcDIiUTa0U4NgdG0YAg83J8MbTnVeoRrjhjuYQIUssAg4SWkTqQ5MqhMhCyylkalfPo5x/VgqtaSEuTjT6uOdV/HpgRvSbPXyRTC2QxjCgj2VVqN7PqrzihLeusNigzYhwALLJljtY5TqwKQ6EbLAMj7OOcbNx/zwiTjM3HQJN2JT4VvUFQOah6LRcwHGNzRPjVTnFaoxbneHm7ABLLBM6BStTaI6MKlOhCywtEaq9nIc49rZaS1ZEPO8Z1o1fSEQ/ZuFwsfbRWtVupajOq9QjXFdnecgxlhgOYgj+WFvH0dSnQz54WNsvFDlXdC8snbfdSzffQ3i1WB4KU+M6xCGx8KLGAvVQm1UmVOdU0zlfJM0hgWWSRyhRzOoDkyqEyGLWj2iVp0NjnF1vPTInZN5zjOtvDyc0aNxCN6tW1KPanS3QXVeoRrjujvQAQyywHIAJ2Z1gerApDoRssAyfvBwjBvLPO3aJfzxxx+o+EJ9fLT1MnYeyzzTqk4NXwxrUxYlfN2MbZCK2qjOK1RjXIVrCk1WFlgO5GqqA5PqRMgCy/jBwzFuHPMTowbB59AaWeH+oHewsmRPeabV6PZheO7RYsY1RGNNVOcVqjGu0U0OXYwFlgO5l+rApDoRssAyfvBwjBvH/M8XysArPfM8qwTnIvhm8Pd2O9NKS6+pzitUY1yLjxy9DAssB/KwmFA4MQEmwASsIZD0/UHELpyE8xmlUP3uL9LUyaI18NyyZXAqWtwa01xWIYHAwECFOTmbmQmwwDKzd1S2jepfPlT/
0uQVLJUBqkN2jnEdIBZgIi0yAmfH9IP76WMyx54SzVHE9T7SU5IReO8G/lPRE6UX77JdA3S2THVeoRrjOrvPIcyxwHIIN2Z2gurApDoRMnPjBw/HuP7MM5ITcWXxNKRsWySNX3cvhb/qjkKL/i3kwaG/HPsWJZaMQMr5v1C0bjOUnLBM/0bYwCLVeYVqjNvAheRNssAi78J/O0B1YFKdCFlgGT94OMb1ZX73wHZcmjUaHvFRSHbyxKEK7fHiuDF4osK/Z1oJ5jXCQnG5Qx3cj4lCQK+x8G3fX9+G2MAa1XmFaozbwIXkTbLAIu9CFlj2dCHVyZAfPsZGjdl4p/xzGhHj+sDl/AkJ4gffl4BOE9C2RfUHwGTFeMq5k7jS9Q1kpCSj1MwN8K5dz1iIKmszG3Olzac6pyjtX2HKxwLLgbxNdWBSnQh5Bcv4wcMxbh3z9Lt3cH3hRCTtXC0NXfUIxdHnRqLL4LcR7O+er/GczBO/24drQ9rCyd0TpZfshkfVJ61rkA1LU51XqMa4DV1J1jQLLLKue7DhVAcm1YmQBZbxg4djXDvzO198gmsLJsA1MQ6Jzt74qlxHPPnBB3i9pt9DjeZlHrduHqKXTIRzcX+UWXsErkEh2htlw5JU5xWqMW5DV5I1zQKLrOtYYJnBdVQnQ374GBs99uR97/QJXJnQF7h4Snb6G9/XcLf5CHRu9wS8PZwtgsgvxm+M6w6xf8utXCWUWfVfOHmZ6x5C0Sl7MrcI9SEZqM4p1vTZUcuywHIgz1IdmFQnQl7BMn7wcIwrZ54eF40b88ci8atNstBFj3AcqDkMnQc0wiOhXooNFcQ8sns9JP95HF616iBk3nbF9ozKSHVeoRrjRvmVUj0ssCh5y0JbqQ5MqhMhCyzjBw/HuDLmcZuW4OayaXBOvou7LkWxI7QzqvbojZZ1SigzkCNXQczT4+Pkl4Vp1y+jeIvuCBw4VbVtWxagOq9QjXFb+pKqbRZYVD2XT7upDkyqEyELLOMHD8f4w5kn//odrk39AOlXzsmMX/vVx9UGg9Gn/WMILK7tYuaHMU+9fB6XO72GjMR4lBg8E8Xe6Wx8UBRQI9V5hWqMm8bxJmoICywTOcPaplAdmFQnQhZY1kas+vIc4/kzS4u6ilvzRiHh0E6Z4ZxXZeyoNgid+7yBWlV81INWsIKVlUWIusg+b8n/DZm7DV7/ecWq+vQqTHVeoRrjevnNkeywwHIgb1IdmFQnQhZYxg8ejvEHmceunonotfPglJKE2y7FsbFUF5Rv1wm9mujzdZ8S5vG71yPqw/fh5OktN727hVU2Pjjy1Eh1XlHC2+5wuQGKCLDAUoSJRiaqA5PqRMgCy/hxwTH+L/OEb/Yiau5IpF+7JH+4z/8tnHq+L4Z1ewyhJTx0c45S5tGLxiNu/QK4lAhB2bWH4ewboFsbtBiiOq8o5a2FCZcxlgALLGN527Q2qgOT6kTIAsum4ZyvcY5xQFzKHDV9IJKOH5WMznhXxeZHBqFFh5fQ8Fn9RY0a5teGtEHid/vhUaWGPIjUyUP514p6RxPVeUUNb72ZsT19CbDA0penXa1RHZhUJ0IWWMaHe2GOcXEpc8yqmYj7dL4EH+vqj/XB3RDQuBX6NwuFj7eLTRyihnnGvSR5nY64GNq79hsoNXOjTdqkxCjVeUUNbyUcOI/9CLDAsh973WumOjCpToQssHQPYYsGC2uM392/DbcWjsP9W9clo92BzfBT9W4Y3qUaHgu37SGfapnfj76RfTG033sD4N9zjEW/2iID1XlFLW9bsGOb+hBggaUPR1NYoTowqU6ELLCMD/vCFuPiUmaxefzeXz9L2H8WqYFPw/qjcYtn8N4bJQ1xgBbmOS+GLjl+KYq+0dyQtuashOq8ooW34XC5QkUEWGApwkQjE9WBSXUiZIFl/LgoLDEuLmWOWToZt7evlJCj3QLxSamewHONMK5DGEr4ajvTSovHtDLPuhha1Fl60U54PllbS/Wa
y1CdV7Ty1gyKC9qMgKkE1rJly/D666+jfPnyNuuwIxumOjCpToQssIwfTYUhxu/sWIvopZORfjtGAv68RBt8W7kDBrR9BHVq+BoO3RrmcRsWInrhODh5+6DM6oNwK1PBsPZTnVes4W0YXK5IEQFTCaxZs2ZhyZIlWLBgAd58801FHeBM/xKgOjCpToQssIwffY4c4+JSZvE6ULxeE+lE0aexqnRf1G34JHq+FQJPd8sXM9vCI9Yyz7oY2jW4DMqI4xt8jBGJVOcVa3nbIgbYpjYCphJY6enpWLx4MT766CN06NABw4cPh7u7u7aeFcJSVAcm1YmQBZbxg8wRY1xcynxrwVjE7828lPmGezA+KdUbSdVfxbiO4ShfytN40Dlq1IN5ZO/GSD5xDJ6PPY3Sy/YZ0h+q84oevA0BzJVYJGAqgZXV2kOHDqF///7yVaEQXCEh+pxIbJEG8QxUBybViZAFlvEDxtFiXFzKHLNiurzLL8XJHTuCWmNfuQ7o905pNHtJ/cXMtvCIHszTE+JxpdOrSL3yD4rWbYaSE5bZoqm5bFKdV/TgbXO4XIEiAqYUWKLlUVFRaNmyJWJiYuS+rCeeeAKPPvoonnrqKTg722ep3BLRjIwMODk5WcpW4O+tLU91YFKdCFlgaQ51zQUdJcbF/X1RUwdIwSHSD8VqY22p3qj1QmUMalkGfj6umhnpXVAv5mnXLuFyxzpIj78N/+4j4ddxkN5NdQiB9dumVXiitXkuzbapkxzcuOkElnhNuG/fPrkP6++//0b79u2RnJyMX3/9FefOncOJEyfg42Pd5aV6+3TdunXYvn07kpKSULduXQwcOBAuLgUf+jd16lRERERg6dKlsinnz5+X//7tt99QuXJldO3aFdWrV1fdTL0mQtUVW1mABZaVADUUp8qceozLS5nnjkTC4V3Sa1c9QrEypB/iwp/F2A5heOqRohq8adsiejJP/uMnRPaoLxscPGU1irySeUm0LRK1GJcCtMPLEF+QOhcthjJrj8C1VFlboGGbBhEwlcA6duwYJkyYIIVUly5dpNAICgrKRhEfH4+iRYtatUqkN9cjR45g+vTpEBv0XV1d5b4xsfLWunXrfKsSfRw3bpx8/ZklsDp27IjHHnsMffr0webNm7Fz505s27ZN9UqdnhOh3pweZo/aRJizL9SYi0k8+uPJSL52Gf5N2sOnYVsjXW11XdR4iw7f3rwEMRuXwNXLGykRZwAnJ9xz9sS2oHexs0RL9Ggcgm6NSlnNxlYG9GYe/+VGRE3uCyd3D5ResgceVZ+0SdOpzSuxy6ciZvWsbBaB/SejeKteNmHDRo0hYCqBNWrUKHh7e0thVbKkMYfoWYtZCCo/Pz8MGzZMmlqzZg2EiBJHTuRNcXFxckWuVq1aiIyMzBZYTZs2RatWrdCmTRscPXoUkyZNwq5du+DpqW5zq94TobVslJanNhFSFlhi9eT2lsyVU5HC9/0DZ5/iSl1l13xXo1Pw6a4/0OS1aqhcxn533KmBIF6HXaiX49iZjAwc860jz7QKq1YOEzqFIyTA3B/y2GJeifl4EmI/mQvn4v4os+qgTVZqKM0rsWtmI2bpFCm+s1KZNYfhXulxNeHGeU1GwFQC6+TJk3KfFaUkVp9eeeUV+dWjSHv37sWqVauwdevWB7oxePBg+QpQvOIUG/mzVrD279+PmTNn4pFHHsHZs2fRokULdO/eXTUGW0yEqhuhoQCliTBv96gx/1/vLihxYkd2N2LL1sKpBmOR6B+mwXPGFbl1Ow2ff3Mzu8LWrwbZ7O49vXpV7MZpVP5yPIpfzzx2QaQffWpj9eOT0L95qE0uZtar7Tnt2CrGrw9/DwlH98AtrDJCl++DcxF9t35QmFfEl5VyL97l83Dy9EbRF+sh+vZdhNZ9i9zqsi1ij7pNUwmsoUOHonbt2mjSpEm+XP/8809UqlTJVEc3iBWpRo0ayRUokYRwmjNnjlyBypnEa7+NGzdC7NcS
r/9yCizRb7H/6rnnnsOPP/4o91+JVSw3N3WnNYuJkBMTeBiBbQuPosk/y1AiNQpHfOuiasLvCEq9gQN+DbE5uBPiXYuZEmAGgJyfj2Rk5Ppj31Rt9k+7hTbXVuCluK9x0rs6LnpVQJ3YfYhyD8aWKoPRvn1FeJl70coQnk6p91B01vtwuXoBqRUeR8KA2YbUa4ZKnBPuwPOzpXD/8YBsTuqTLyGpeW+kF/OX/1+zZk0zNJPbYCUBUwmsHTt2YNCgQVi/fj2effbZ7K6JLwqFaBGrQmbb5C72TYlXfmIlS6Tdu3fLdq5duza7/eLVoFiVEhvgq1atiu+//16uVIlVqmrVqsmyYlN/lSpVEB0dLfdviY3wTz/9tCr32uovTVWN0JCZwl+aBXWLEvPklHTMbjsCnwe0lN0p6umMMU6fIuzHNZmTvHtRXHy+GyKeM98XTBRWsFxSEhH2/UqU/WEtXO7fk0wjnumEMSltkZImJCLkNTeNnw/QMErsV8SWMZ7zYmixHzBo1ALdOmrWeeXOF58gevEEpMfHQRy+GjRyPryefim737bkrRtcNqSIgKkElmixEBrLly+XqzzlypWTe5pmzJghX62NHTs2l/BS1EMbZ5o8eTLS0tIwfvx4WdOiRYtw8eJF2easdPXqVSkcs9Lt27eRmpqKChUqoEePHhArWOLLyazjJxo3biw3+Yu9WWoS1YFp1olQCXtKzGdP2I6m+7rjzyJPIqLbUrR4rYzc/5N2IxLRC8bg7tdfyC67BocioNc4FK37jhIEhuURe7DW7foDTU24B+vOrk/lHpr7MVGZ4vXVJgjoNwmuJUsjPvE+dn1zGSElfe1y1Y21DrJ1jOe8GFrPjd1mm1dSLvwtN/ffO/WLdIk4pkIcV5E32Zq3tfHA5ZUTMJ3AEmdBjR49GgcOHJCvyMSXg2PGjMHbb78tv9IzW/rhhx+k8BOv9MQG/REjRqB3795o0KCB3OwuflajRo1czd60aVP2K0JxLIV4JdquXTu5yiWEljjJXgjM4sXVbT6mOjDNNhGqiTEqzFfvvQ7X+b3w/O2jSG05FIHvdkFgYGCurt479StuzhwMcWWLSB5VaqDEkFk2+8pLDeesvGbjLc+zmjEYqRfPPJQZx/jDvZ3zYuiQudvg9Z9XtIRHrjJmYZ6RnIiYFdMQt2GRbJ9njefkSp1b6fB8+2i2GLfaEYXYgKkEljjvSnw5J1aEevbsKUXIt99+i1KlzPsJsxCEU6ZMgTiuQSSxhyxrNUu8PhRtF4IxZ8opsMTPRR9XrFghvyz09/eXYuutt9SfD0N1YJplItQyD1Bg/t2ftzH2o5+w7FTmPsHwr84hJuX+AwIrq/93D2xH9JKJSLt+Rf4o52qMFkZ6ljEL79TIC/I8q8Tv9svuWVr14xi3HAVxGxfLlVSx2Tt02Vdwr2jdB09mYC428d+aMxzi/DNn3wAE9psEnwaZ47CgZJYYt+wxzmGJgKkElnhVdvjwYTzzzDOoWLEi5s+fj5dfflkeeWDG1auccBMSEuQrPi8v7Z+Pi71avr7aL0KlOjDNMBFaGihUJ8OI68lo/+EpNLy0Bi2iPoVP/ZYIGrsESpiLz+hjP/kIGYl3Zfd92/aFf+fBcPLW92svNeztHeNi30zMsqm4vX2FbLZg4d9hIHzb939oN5TwVsPByLxGMo+a0g/xezbAxT9IXgztEqD9uB57Mhev3aOmDUDSD19LVxVr8h4Ceo9XdCSKkbyNjKPCWJepBJY43fz48eP4/fff5X/FSe4ilSlTRooucV2OOMTT7GLLXoFEdWDacyK01ldmZn436T7aTf4LkbdSsOrvViiSEovQFfvhUa2mIoEl2KTfjkH00sm4syPzow1xbpF/1+Eo3qyLteg0lbcn77gNCyHOKxInbcuHZtOOCOgxSjKxlDjGLRH69/dZF0O7V6gm49XJQ9sfrfZiHrduHmJWzUTGvSR5BEXJ0QvkmFOa7BnjStvI+ZQRMJXA
ytvkxMREnDp1Cn/99Zf8elAEnjj+wGxX5ShDbftcVAemvSZCPTxiZua9PjqDn07H483Ub9Dh9CR5aKE4vFAktcxTL53DzY9GZP9F7lauEgL7ToB37Tf0wKjYhj14J3z9BW4tnoC0qxdlO72ffQ2BAz6EW9mKitutlrdiwwZkNJp5zouhRXyVmrlRUy+NZp78x48QK3BirIjXnOIPEd+2fVS33WjeqhvIBRQTMLXAUtwLzigJUB2YRk+EeoaLWZnP234F6/bfgIebE9YmjIDT38cRNHohfN5so0lgZTGTm7qnDZQHI4rkWeN5BA2dJf9SNyIZyTvlzB+ImjEI9/7KPF9OiErRV88na6vuKse4OmQ5L4YWIiWg70R1BjT8EaG6gv8vIFZ5by0aj/jd6+VPirzYACWGzIRLoLa9w0bGuNY+czllBFhgKeNEIhfVgckPH33Da//xWIxc/o80Oq9pBoIn1YOzjy/C92WKIpGsZX5nxxrELJ+K+7G3pD2fRu8ioNcYuPjl/jJR354Z80fE/VvXcGvheNzdv00238WvBPy7jUCxppm3NWhJ1vLWUqdeZew1r+S8GFqcFeXTqJ2qLhnBPH73pzJW0u/EwjUoBCVGzIP3M6+qamfezPbibVWjuXC+BFhgOVBgUB2YRkyEtnKz2ZifvpSId6eckt3t3jgEb/8xDeKMJrEJO6DXWN0EljCUkRgv75OL27QEGSn34ORVBH7v9oNvu35wcld3j6ZS/9iSd0ZSQmZ/Ni5GRkqyvIzYt3Vv+HUYKPtmTeIY10ZPiNwb43vIwqUX7VS1emhL5qkXzyLqw34QIlAk+QFI9xG6xL0tY1ybF7iUVgIssLSSM2E5qgPTlhOhrd1kJuax8WloM+kv3LqdKg+0nPFeEC7UzbxjsNznv8tDL7OSnszlQaWLx+Pugc+keZcSIQjoOdri5+hafGMr3nd2rkPMsg//PSi0bjME9pug+TVP3r7pyVsLN2vK2Iq50jaJA1xj186RX2yWWX0QbmUqKCpqC+ZCeMcsn4a49ZknzntUfUq+encP1+8Vub15K4LLmRQRYIGlCBONTFQHpi0mQqM8ZibmHaaewsmIRFQI8cTaEVWRvG0JoheORZGX3kTwtHW5kNiCed6DSt0feQxBQ2fD41F1Vz49zHd6837goNBqNWWbxQcBeiZb8NazfUYy19Lua0PayDPHxNUyZVYdlGdKWUp6M0/84WvcnNo/80wrH18E9BmPYm+1t9QM1b/XO8ZVN4AL6EaABZZuKO1viOrA1HsiNNITZmE+fk0Edn8fjWLeLtgwphqC/d1xsenj8mEQMv8zeD39ss0FVlYF4sqd6EXjITYqi1TkpYYQV6C4liprtWv04i2+9Lo1bxQSv/+vbJNrSDkE9h6HIq/mf9G8tQ3nGLeOoDjyILJXI3nLgLhhIHTVQYsG9WIu9uTdnDUUCUe/lHX6NGiNwPcnKTqew2Ij88mgV4xrqZvL6EuABZa+PO1qjerA1GsitAd8MzDffCgKMzddlt1fMaQyalQsKv/aF3/1u5YOQ7mtmV/B5UxGMBevUeS5UQnxsurirXrKT9edi2g/qNRa3vJcr+VTceezVbJNzkWLwa/jYE2f06uJNyN4q2mPmrzWMldT18PypsdF41KHOrh/8yqK1GmM4A8zLykvKOnBXJx9FrNiOsR1N+JYDnHFjefj/9GrS/naMQtvm3aykBhngeVAjqY6MPWYCO3lRnszP3HuLrrOzDyQd3CrMmj9apD897WBLSBeaQR+MA3Fm3ezi8ASlT4gaHx85WnwxVv10uQya3g/IPiadZUbk8XrHlsnjnF9CKdG/I3LnV+Xgse/yzD4dRlqE4Eljua4MbkfRH3igw2/ToPg1+EDfTphwYo1MW5IA7kSxQRYYClGZf6MVAcmP3y0xdb1mBS0nfQX7iTeR6PnAjC+Y+aGdnFP3qUWT8sTsMP3npGHHuZNRjOXr+Tmj0bisQOyKW6h5RHQd7x8
fagmaYnxvK8svWvXQ+CAKQVetqumPUrzGs1babuU5NPCXIldrXmSfjyEqwOay+LBU1ajyCv539uqhXl6/G25b1F8eSuSV80XIY6I0OP1ttL+mo230nZzvgcJsMByoKigOjC1TIRmcZu9mCenpENsaj9/NRmPhnnLTe1ZSewtur35Y3mdTeCgGfmishfzvJvKxeuWEoNmKN5Uroa3PCh02gC5b0ck9/JVUGLoHHhWf8bw8LEXbz06qoa5HvUpsSHiW8S5SKWXfgXPx2s9UEwt8/gvN+LWwnEQryJdAoNRYsCHNtuT97A+mpG3Ep9wHhZYDh0DVAem2onQTE60F/PBS87j8Ik4BBZ3w8Yx1eDn4yqxiFcnFxpVk2dUld16vMBVGnszz3ssgk+DVpkHlVo4/VoJb3lQ6IJxuHtgu2QiLgwWdwaqPahSzzizN29r+qKEuTX2tZbNuhja2ae4vAIq7yqTUuZ5z7Qq3qK7jBcn76Jam2ZVObPytqpThbQwr2A5kOOpDkylE6EZXWUP5iv2XMPHO69KHJ+OqooqZf99BShOWL85YxC8nn4JIfM/LxCZGZjLgz3XzUPchkWZB3t6eMnN5n7vvl/gwZ4P4y3trZ3z78Gn0l5f+L3XX/OFwXrFnBl4a+2LPWJcaVuzLoYWr5xDV3+d6wMKS8xFzMWumikPlxXJo1L1zDOtKj6qtHqb5DMzb5t02IGNssByIOdSHZiWJkIzu8ho5t/9eRv9F5yTSD7sVh5vPO2XC8+lts/LjbnB0z556P4mMzGXK06LJuDuvq3ZK07+3UeiWON3H3B9Qbzv7Fj7/1f33JRlxKf0AX3GwcU/c9O/vZOZeKtlYXSMq2lfzouhxb2YpRfvyi7+MOa5zrQq4gP/nqNRvFlXNVXbLK+Zedus0w5qmAWWAzmW6sDkh4+yIIy4noz2H55C0r10vFu3JAY0D81VUOxviuzzlrw7L2zP6YcaNSNzJZcr543xfPd0DZsD9/L/7klTRte2uczIW2mPzT6v5LwY2qdhW3mUgkj5MZdnWs0ejoQju2Weoq+/jcCB02x+h6ZS1iKf2Xmr6Uthz8sCy4EigOrA5IeP5SC8m3Qf7Sb/hchbKahVxQdLBlZ6oND10Z2R8PUX8q9xv/cGkhNYWQ1OOLwLt8RBpZER8kfez9dF4PuT5TlEWTEuv0qcOxKJ/8s8cFLrV4mWyeuTg2NcH44FWcl5MXRA3wny1XBe5uLOTHFBuXiVLE6EF18HilfpZktU53GzcTRDe1hgmcELOrWB6sDkh4/lAOj10Rn8dDoepQPdsX50NRT1cslVSPxlHvHWY/Jn4XvPWjxlmgLz25uXIGblDKTfvSP7JfbGJKWkwcvNBSnn/5I/E2dY+XcZiuItMy8ENmuiwLsgdlTmlZwXQ4tX5EnVnkFgYCDunTyOqKn9kfJP5qquX+eh8O86zKyhwitYpvWM+oaxwFLPzLQlqEyEeQHyw+fhITVny2VsOBgFLw9nrBtZFWHBng8UEH+Zx66eJfceBY1ZZDFGqTBPvxMrRdbtrUsBOOXoVwaKt+wpxZURB4VaBGohAxXe+XWD0rwiTl2PXTkdTq5ucKn+LNy9iyDxu32yW15PvYASw+YovizaWp9rLU+Jt9Y+FpZyLLAcyNNUByY/fAoOwv3HYzFy+T8yw7x+FVH7seL5Zo5oWBn3Y28hdOUBeFR9ymJUU2Me9+k8RC+emN0vcdGub7t+FvtplgzUeOfkRm1eufzeS0g5dzKzCxkZcPEvgYC+EyGOAqGQqPGmwNRebWSBZS/yNqiX6sDkh0/+wXD6UiLenXJK/rJXkxB0ebNUvhmzXo0ovQRXGKHGXJywHdmnsXxwileFpRftgjj/iEqixpuywLo+pC0S/n/VSvSj9KLd8HzyOSqhwq8IyXjKckNZYFlmRCYHCyzjXWUr5rHxaWgz6S/cHBnaDQAAIABJREFUup2KOjV8MatXhQI7F9m9HpL/PI6gMYsV/5VO
9YFvK962jhyqvAUXaswTju7B9eHvSZcKMV7mk6O2dq+u9qnx1rXzDmaMBZYDOZTqwOSHz4NBKK7BORmRiAohnvIaHE9353wj9d6Z33Gl4ytyH1L4vvOKo5kqc45xxS7WLSNF5mLF8+Zfv6HkM+b7StCSYyjyttSnwvp7FlgO5Hnx0OREn8CHm27i6B+J8PFyxqK+pRDkm3kNTn4pcd4IpB78DB7Nu8PzvUH0O889YAJMQH79yIk+ARZY9H2Y3QOqf/lQXU0R4PVmvvFgFGZvuSx9umJIZdSoWPB9aOILuwv1K8q85T7/Ha4lSyuOZqrM9eatGJiVGanytkWMW4lScXGqzKnGuGLHFKKMLLAcyNlUBybViVDvh8+Pp+PR+6MzMiKHtSmLFnVKPDQ64z6dj+jFE+SVOOLcHzWJKnOOcTVe1icvM9eHo1IrVHkr7V9hyscCy4G8TXVgUn3Y6ymwrty8J78YFCe2N3ouAOM7hlmMzItNH0da1FV5qbPaE6mpMucYtxgWumdg5rojfahBqryNpUSjNhZYNPykqJVUBybVh71eAivxXjraT/kLF2/cw6Nh3nJTu6WU8M1eXB/2LlxLh6Hc1p8tZX/g91SZc4yrdrXVBZi51QhVGaDKW1UnC0lmFlgO5GiqA5Pqw14vgfX+/LM4dvIOAou7YeOYavDzKXhTe1a4Xu3fDEk/HUbgoBko3qyL6iimypxjXLWrrS7AzK1GqMoAVd6qOllIMrPAciBHUx2YVB/2egispbuuYvnuazIKPx1VFVXKeluMyNTIC7jU4mk4eXghfO8ZOHlaLpPXKFXmHOMWw0P3DMxcd6QPNUiVt7GUaNTGAouGnxS1kurApPqwt1ZgHT4Rh8FLMs+u+rBbebzxtJ8iP9/6aARub12G4s26InDQdEVlWGBpwqRbocIa47oB1GCIKnOq87gGFzl8ERZYDuRiqgOT6kRojcA6F5kEcZjovdQMvPdGSbzfLFRRJGYkJ+JCg0rIuJeEsluPw610uKJyLLA0YdKtUGGMcd3gaTRElTnVeVyjmxy6GAssB3Iv1YFJdSLUKrBuJ9xH20kncSM2FbWq+GDJwEqKo/DOZ6twc9YQeD39MkLmf6a4HAsszah0KVjYYlwXaFYaocqc6jxupbscsjgLLAdyK9WBSXUi1Cqwus78GyfO3UXpQHesH10NRb1cFEfhxRY1kRYZgeBp61DkpTcVl2OBpRmVLgULW4zrAs1KI1SZU53HrXSXQxZngeVAbqU6MKlOhFoE1qzNl7Hp6yh4eThj3ciqCAv2VByBST9/g6v9msI1KATldvyhuFx+Gaky5xi3yu2aCjNzTdg0F6LKW3OHHbggCywHci7VgUn1Ya9WYO3+Phrj10TIiJvXryJqP1ZcVfRdH9EBCUd2I6DXWPi276+qLK9gWYXL6sKFJcatBqWjAarMqc7jOrrOYUyxwHIYV+p/L55RaKhOhGoE1p8XEtBx2mmJtO/bpdGxfrAqvGk3InHx7eqyTPhX5+BcTNkXhwVVQpU51YcPVd5qYlxVQBuQmSpzqjFugEvJVcECi5zLCm4w1YFJdSJU+vC5dTsVrSf+hbi7aahTwxezelVQHXUxH09G7CcfwefNNggavVB1eV7BshqZVQYcPcatgmOjwlSZU53HbeRG0mZZYJF2X+7GUx2YVCdCpQKr3eS/8PflJFQI8ZTX4Hi6O6uOugv1KiA9Pg6haw7Bo1LmSpY1iSpzjnFrvK6tLDPXxk1rKaq8tfbXkcuxwHIg71IdmFQf9koE1ojl/+DA8VgU83bBhjHVEOzvrjri4vduRtSk3vCoVhOhK/arLp9fAarMOcZ1cb8qI8xcFS6rM1PlbXXHHdAACywHcirVgUn1YW9JYH164AbmbrsiI2zFkMqoUbGopmi70qUu7p36BUFjl8CnfktNNvIWosqcY1wX96sywsxV4bI6M1XeVnfcAQ2wwHIgp1IdmFQf9g8TWD+ejkfvj87I6BrR
riyavVRCU6TdO/M7rnR8Bc4+vgjfl3mtjh6JKnOOcT28r84GM1fHy9rcVHlb229HLM8Cy4G8SnVgUn3YFySwrty8h3ennMLdpPto9FwAxncM0xxlUZP6IH7vJvh1+AD+PUZptsMrWLqh02TI0WJcEwSDC1FlTnUeN9i9JKpjgUXCTcoaSXVgUp0I8xNYiffS0X7KX7h44558JSheDWpN6XdicaF+RVk8bPcpuPgHaTX1QDmqzDnGdQsBxYaYuWJUumSkyluXzjuYERZYDuRQqgOT6sM+P4H1/vyzOHbyDkr6uWHDmEdRvIjya3DyhmLsJ3MR8/EkFKnTGMEfrtE1Uqky5xjXNQwUGWPmijDplokqb90AOJAhFlgO5EyqA5Pqwz6vwFryxVWs/PIaPNyc5HEMFUt7WRVdF5s+jrSoqwhZsANeNV+0ylbewlSZc4zrGgaKjDFzRZh0y0SVt24AHMgQCywdnZmRkQEnJyfNFq0tT3VgUn3Y5xRYh0/EYfCSzE3o4iBRcaCoNSnh6B5cH/4eXEuHodzWn60xlW9Zqsw5xnUPBYsGmblFRLpmoMpbVwgOYowFlg6OXLduHbZv346kpCTUrVsXAwcOhItLwa+Gpk6dioiICCxdulTWHhUVhQkTJuDMmTMICgpCx44dpR21ieLAvDV3JO7sXg+30mEoOWoh3Cs9rrbbds0vmBcProYOU0/hXmqGvAJHXIVjbbr6/ttIOn4UJQbPRLF3Oltr7oHyLLB0R/pQg1R55/wjwlhi1tdGlTnFedx6bzmmBRZYVvr1yJEjmD59OmbNmgVXV1cMHz4cLVu2ROvWrfO1fOzYMYwbNw7ly5fPFlg9e/ZEcHAwhg0bhh9//BGTJ0/Gpk2bEBAQoKp11AZm2rVLuNjsyew++jRojaAxi1T12d6Zjx77BdN3uuJGbCqef7QY5r//iNVNSo28gEstnoaThxfC956Bk6e31TbzGuCHj+5IWWAZi9RibRzjFhFxBhsTYIFlJWAhqPz8/KQ4EmnNmjUQImrZsmUPWI6Li0P79u1Rq1YtREZGSoF15coVdOrUCZ999hnS09Ph4eGBlJQUFC1aFM7O6q5UoSSwUv45hagp78sDNLOSZ/VnUPrjL630iHHFxWvBGRvO40YcEBbsgXWjqsHbQ53P8mvtrdlDcXv7ShRv0R2BA6fapEP88LEJ1gKNUuUtOkRpXsnpAKrMqfI2dkTRqI0FlpV+Eq/zXnnlFXTo0EFa2rt3L1atWoWtW7c+YHnw4MGoXLkyfHx8cOjQISmwjh8/jlGjRuGJJ57Ar7/+Ksv06NEDzZs3V90yCgMzPSEeMUun4Pa25bJ/rgFBuJ+aAsTfRkZ6OoKnrpVfzZk9DVp8Hkd+i8vsg4sTtox/FGWDPKxutuAT0bgaMpITUXbLT3ALLW+1zXxF3K1bCAwMtIltWxqlEOOOxJsFli2jOX/bVGPceFLmr5EFlpU+EitSjRo1QqtWraQlIZzmzJmDXbt25bK8c+dObNy4EWK/1rZt27IF1v79+zFz5kwpsPr06YOvvvoKO3bskOXd3dXdWycGppmT+4//hefny+B8Nw4Znt5IbtwJ915qIpvs9uf/UGTpWGS4uiNh4Bykla1k2q7EJwGD1/3bPB8vYFZ7fZrrcfQLeG1dhNQqNZHQxzarV/q0lK0wASZgKwI1a9a0lWm2ayABFlhWwhaiSLzyEytZIu3evVuuXq1duzbbsng12KJFC7lxvWrVqvj+++9x9uxZdO/eXb4KHDt2LJYvX46wsDCkpaWhYcOGGDlyJF5++WVVrTPrXz4pF/7GzWn9kfzHT7I/4j69gH6T4OKXuYKStZR/e/MS3Jo3Gs7F/VFm7RG4BoWo6r8RmbccvonFOyIRn3gfWR+MWntae852X2xRE2mRESg1Yz28X6hvsy7x6xOboc3XMFXeojNmnVcseZAqc6q8LfmjMP6eBZaVXhcb0oUoGj9+vLS0aNEiXLx4ETNmzMi2
fPXqVQwaNCj7/2/fvo3U1FRUqFABQ4YMgdjkvn79evkFodiH1aBBA/nz119/XVXrzDYwMxLvInr5VNze/LHsh1tYZQSNmAvPx/+Tq185J8KoKf0Qv2cD3MpVQuiK/XAu4qOKga0yn72ShAlrI3D6UqKsovnLJeRJ7WfPXUDHpjXg4639QNGsNicdP4Kr778jhWW5HX/Yqiu5RK1NK7GBcbPFuNIuUn3Ys8BS6mH98lGNcf0IOI4lFlhW+vKHH36QK1CTJk2Ct7c3RowYgd69e0uRJDa7i5/VqFEjVy3iC8GsPVjiF23atMGjjz6KoUOHQrxKFHuzsgSXmuaZaWDe3bcVtxaMxf2YKDh5FYF/l2Hwbdsn3+7kffhc7d8MST8dhmeN51F6ce5XrWp46JFXXH0jVqw2fR0lzYnDQ8XdglXKZn7Zpyfz68PbI+HolwjoPQ6+776vR/MLtEH1ga8nb5sCzmOcKm+9Y5yZWyZANcYt96zw5WCBZaXPxeGgU6ZMgTiuQaTatWtnr2aJ14elSpXC6NGjHyqwTp8+LV8JxsfHyy8HBwwYIAWa2mSGgZn3dWDRV5sgcMAUuASWUvywz0hKwOXOryP14hn4NGyLoFEL1KLQJf+B47GYveUybt1OlV8H9mpSGm1ey30foF7M025E4uLb1WW7w786B+difrr0oSAjVB/4evG2Kdx8jFPlzQLL6EjR948241vPNeYkwAJLp3hISEiQ4sjLS/v1KNHR0fLIB7XHM2R1wZ4PH/E6MGbFNMRtWiKbI04gDxr2Ebyefski4fwePuKKmCudX5MrYAG9xsK3fX+LdvTKcDU6BRPXRuD43/HS5GtP+WFI6zIILO72QBV6MY9eMhFx6+YZJiipPvD14q1XrCi1Q5U3CyylHtYvH9UY14+A41higeU4vtT1dZUaLHf3b8t8HRh9Qx6O6dfxA/h1+ECxiYIePinnTuJK1zeQkZKM4CmrUeSVtxTb1JpxxZ5rWPXlNaSkZaBUgDvGvheGWlUK3gem12R4oV4FpMfHIXTNIXhUylzJsmWi+sDXi7ct2eZnmypvFlhGRwqvYBlP3HY1ssCyHVvDLRv98Em9eBZR0wci+cT3sq9FXmyAwA+mw7WkuqtiHvbwSfxuH64NaSvth678Lzyq/nvyu56Afzl7V65aXbl5T5rt3CAYvZta7ocezOO/3IioyX3h+djTKL1sn57dKtAW1Qe+HrwNAZynEqq8WWAZHy1UY9x4UuavkQWW+X2kuIVGDUz5OnDldMRtXCzb5hpcBiWGzYH3M68qbmvOjJYePqKe6AVjMo9vWHUQrqXKaqonv0Kx8WmYs/Uy9v4QI39dvXwRTOwcjtASyg4N1YP55Y51kHLmD5QcvxRF31B/wKwWGJaYa7FpRBk9eBvRzrx1UOXNAsv4aKEa48aTMn+NLLDM7yPFLTRiYN498BluLRiD+7euy3b5dRoC/27DFbcxv4xKHj7ZxzeElkfo6q91Ob5h25GbWPh5JO4m3YdvUVcMbBGKhs8ae//jvZPHcaVbPTj7+CJ833mrOKoprIS5GntG5TUixm3RF6q8WWDZIhoebpNqjBtPyvw1ssAyv48Ut9CWAzPv60Cvmi8iaOR8XVaTlD589Dq+Ie+ZVm+/GIj33wnVdJaVtcxvTOgJcaSFX8dB8O8+UrGvrc2olLm19ehd3lreerdHqT2qvFlgKfWwfvmoxrh+BBzHEgssx/GlTTa5izvxYlZMR9yGhZKUOAQzcMCHut4XqPThk/P4hqJ1m6HkhAcv1H6YO8WZVku+iMTGg5lnWoWX8sS4DmF4LLyI5iiwZjJMvxOLC/UryrrDdp+Ci3/uIyA0N0pBQaXMFZgyNIs1vA1taJ7KqPJmgWV81FCNceNJmb9GFljm95HiFuo9MO8e3CGvrrl/65psgzj80r/zEDh5Zh6yqVdS8/ARxzdc7vAy0m/HwL/HKMVfKx78JRazNl/GzbhUeHk4o0fjELxbt6TV
XbCGeeya2YhZ9qH8OlJ8JWlkUsPcyHZZqssa3pZs2/L3VHmzwLJlVORvm2qMG0/K/DWywDK/jxS3UK+BKV4H3pw1BEk/fyPr9qzxnDzTyq3cI4rboiaj2ofPvVO/IrJXI0XHN1yPScHkdRfxv7/uyCbVqeGLYW3KooTvg2daqWlzVl5rmEc0rIL7sTdRetFOeD5ZW0v1msuoZa65Ip0LWsNb56aoMkeVNwssVW7WJTPVGNel8w5mhAWWAznU2oEpXweumom4T+dLKi4BJRH4/iSI13G2TFoePjmPbyi99Ct4Pl7rgSau2nsdK/dcxb3UDJT0c8Po9mF47tFiunZFK/OEw7twfWRHeT9j2Q3HdG2TEmNamCuxa+s8Wnnbul2W7FPlzQLLkmf1/z3VGNefBH2LLLDo+zC7B9YMzISvv8Ct+aMhXsGJVLxVTwR0GwEn76I2J6T14SP2hUUvHAdnn+Ios+Zw9ob7384nYOLaC7h4I/NMq471g9GtUQg83Jx074tW5lf7NkHSL9+ixNDZKNa0o+7tsmRQK3NLdm39e628bd0uS/ap8maBZcmz+v+eaozrT4K+RRZY9H1olcBKjbyAm9MG/vs68PFaKDF8HtzDKxtGxpqHT87jG3zm78XcvXex53/Rsu3iTKuxHcIQFuxps75omQzFfY2X2z0PJ28fhO/+S/c9bUo6aw1zJfZtlUcLb1u1RY1dqrxZYKnxsj55qca4Pr13LCsssBzIn2oGprh+JnbVTMR+MlcScPYNQGCf8fIuPKOTtQ+fyN6NkXziGM4XrYqR4fPkmVbvNwvFW8+rO9NKS7/VMM+yf3PmYNz5fDWKt+whv8i0R7KWuT3azA97+1DXEuP2aWnuWjnGzeCFwt0GFlgO5H+lE6HY/3Nr3iik3YiUvS/2dicE9BwjX7XZI1kzEf5zLRnTVvyJdge7ITTlEi5UqIdnlq7TdKaVlr4rZZ5lOz0hHhGNqiLjXhLKbj0Ot9LhWqq1uow1zK2u3AoDanlbUZWuRanyZlGraxgoMkY1xhV1rpBlYoHlQA63NDDzvg4UlwqXGDEXHpWfsCsFLQ+f5JR0fLzzKj49cEO2vXrxOxhxohuc78bCv9sI+HUabEifLDHP24jbW5bi1tyR8HrmVYR8tNWQNuZXiRbmdmtsjorV8jZDm0UbqPJmgWV8BFGNceNJmb9GFljm95HiFhY0MOXrwNWzEbt2jrQlrmUJ6DXGLpur9XjYHz4Rh5mbLuFGbCo83Z3RrVEpdKgXDHF8w5Uur8sqxLlS4nwpWye1k+HFFjWRFhmBUjM3wLt2PVs3r0D7VB/4annbDXCeiqnyZoFlfARRjXHjSZm/RhZY5veR4hbmNzATju6RKyZp169IOz6N2sm9VuLiZLMkpQ+fvGdaPVutGEa3L4dgf/fsriQc2onrozrJ/y/o+AY9+61mMkz64WtcHdhCnoZfbscfejZDtS2lzFUbtnEBNbxt3BRV5qnyZoGlys26ZKYa47p03sGMsMByIIfmHJhp1y4hasYgiIe6SO4VqiFIvA6sVtN0PVby8Fnz1XWs2HMN4tWgONNqSOuy8tDQ/JI4xyt68YQHjm+wRcfVTIbXhrSFOL8roO8E+Lbta4vmKLaphLliYwZmVMPbwGZZrIoqbxZYFl2rewaqMa47CAcwyALLAZyY1QUxMJ96/FHErv0IsatnyR87F/GRe5LEF2tmTQ97+OQ900pcb9PzrRD5avBhKefxDaErD8jXorZISidD8UHBxberyyaEH4iQfrFnovrAV8rbnmzzq5sqbxZYxkcS1Rg3npT5a2SBZX4fWWxhevxtRPZ9Cyln/wRcXYHUVMDJCUXfaI7A/lPg4hdo0YY9M+T38Im7m4Z5269g17HMM60eDfPGuI7hKF9K+ZlWWcc3eD72NEov22eTLiqdDMWBqOJgVPGKNmhk5kn59kxUH/hKeduTLQssc9DnGDeHHwpzK1hgOYD34/dsgFixyUou/iUQ
PHkVPGs8T6J3eSfCL767hfnbr+B2wn0U83aRZ1o1fUG9SBRHIlzp9CpSr/wjr/spOWGZ7jyUPvD/eT0MGYnxCF1zCOLrTXsnfvgY6wGqvHkFy9g4oczbeFLmr5EFlvl9ZLGFYiP79eHvZecLnvYJirzU0GI5s2TIevhEXE/GuNUXcDIiUTat0XMBGNA8VB4cqjWJvWiXO7+G9Nsx8O8yDH5dhmo1lW85JQIrfvd6RH34Pjwf/w9KL92ra/1ajVF94CvhrZWJLctR5U35gU+VOdUYt+X4oWqbBRZVz+Vod3zifRwZ0B9OESeREloNb69aRKZXS3ddxRff3kRqGhBzNw3itsByJT0wtkM4nqhQRJd+5Dy+oeT4pfLVqV5JyWR4qe3zSI34W66g2fribKX94oePUlL65KPKmwWWPv5XY0XJnKLGHue1HwEWWPZjr1vNYp/ShLURutkzylAGIAVVVsrIAPq+XRqdGgTr3gRbHd9gaTJM/uMnRPaoLzfZh+87r3u/tBqk+sC3xFsrD1uXo8qbBZatI+NB+1Rj3HhS5q+RBZb5fWSxhY4isN54qig+aGa7+wPvbVuG5E9mw6loMRSdtRXOIWEW2VqbIXHWIKQe3Q2PVn3g2e59a81xeSbABAoBgcBA9XtOCwEWcl1kgUXOZQ82WLwinL3lMg7+HI3/VPXF7N4VyPQq6xVh8aJumN27IkIC/j001BaduDGuO+4e2A7X4DIos/aw1cc3POyvzfsxUfLeQZHCdp+Ci3+QLbqkySbVFRWqf91T5c0rWJqGl1WFqMa4VZ120MIssBzIsVQHptEPHz2Pb3gY89jVMxGzfBqKvNpEftVppmQ0c736zjGuF0nldpi5clZ65KTKW4++O5oNFlgO5FGqA9Poh33O4xuK1GmM4A/XaI6ChzGPaFgF92NvovTiXaY7MsNo5poB5ynIMa4XSeV2mLlyVnrkpMpbj747mg0WWA7kUaoD0x4P+5zHN/h1GixPu9eSCmJ+9+svcGN0Z7iFVUbZDce0mLZpGXsw16NDHON6UFRng5mr42Vtbqq8re23I5ZngeVAXqU6MO31sNfj+IaCmEf2boTkE9+jxNA5KNa0g+mizF7MrQXBMW4tQfXlmbl6ZtaUoMrbmj47alkWWA7kWaoD054P+1zHNyzaCc8na6uKiPyYp1z4G5fbPQ8nbx+E7/4LTp7eqmwakdmezK3pH8e4NfS0lWXm2rhpLUWVt9b+OnI5FlgO5F2qA9PeD/vYtXMQs3SKFERlVh+EWxnlX2Hmx/zm9IG488UnKN6qFwL7TzZlhNmbuVYoHONayWkvx8y1s9NSkipvLX119DIssBzIw1QHphke9rmOb1h1EM6+ys7jystcbKC/UDfzfK2yW4/DrXS4KSPMDMy1gOEY10LNujLM3Dp+aktT5a22n4UhPwssB/Iy1YFplod91vENHlVqIHTVQUWRkZd53KYliJ4/Gt7PvIpSH21VZMMemczCXG3fOcbVErM+PzO3nqEaC1R5q+ljYcnLAsuBPE11YJrlYa/l+Ia8zC+2qIm0yAiUmrkR3rXfMG10mYW5WkAc42qJWZ+fmVvPUI0FqrzV9LGw5GWB5UCepjowzfSwl8c3dKyD9Pjb8OvwAfx7jHpohORknvi/g7j2QUu4BoWg3I4/TB1ZZmKuBhTHuBpa+uRl5vpwVGqFKm+l/StM+VhgOZC3qQ5Msz3ssy5oFqFRcvxSFH2jeYFRkpP5tcGtkXjsAAL6ToRv2z6mjiyzMVcKi2NcKSn98jFz/VgqsUSVt5K+FbY8LLAcyONUB6YZH/ZKj2/IYp52IxIX364uoyn8QASci/iYOrLMyFwJMI5xJZT0zcPM9eVpyRpV3pb6VRh/zwLLgbxOdWCa9WEfu3oWYpZPfejxDVnMoxeMQdzGxSjW+F2UGDHP9FFlVuaWwHGMWyKk/++Zuf5MH2aRKm9jKdGojQUWDT8paiXVgWnmh72l4xuymP/zehgy
EuNRZv0xuIdXVuQve2YyM3NHfPhQ5S18wfOKsSOVKm9jKdGojQUWDT8paiXVgWn2h09k93pI/vM48ju+QTB/JPIv3Jw2AJ7Vn0Hpj79U5Ct7ZzI784L4cIwbHznM3FjmVHkbS4lGbSywaPhJUSupDkyzP+zT4+NwpUtdpF75B0XqNEbwh2uy/SGYl5jdD6kRf6PkhOUoWvcdRb6ydyazM2eBZe8I+bd+nleM9QVV3sZSolEbCywaflLUSqoDk8LDPufxDb7t+yOg11jpk9+2rEXRuR/Axa8EwvacVuQnM2SiwDw/ThzjxkcPMzeWOVXexlKiURsLLBp+UtRKqgOTysM+5/ENQSPnw6dRO5zq8zbcfz0Kv85D4d91mCI/mSETFeZ5WXGMGx89zNxY5lR5G0uJRm0ssGj4SVErqQ5MSg/7u/u34cb4HtIfJSetxI0xXeS/w3afgot/kCI/mSETJeY5eXGMGx89zNxY5lR5G0uJRm0ssGj4SVErqQ5Mag/72FUzELNiGpCR6RaPSo8jdO0RRT4ySyZqzLO4cYwbH0HM3FjmVHkbS4lGbSywaPhJUSupDkyKD/sLr4UiPSlJ+kWsXIkVLEqJInPBl2Pc+Chj5sYyp8rbWEo0amOBpaOfMjIy4OTkpKNFdaaoDkyKD/vL772ElHMnpYPcKz6KMp8cVecsO+emyJwFln2ChucVY7lT5W0sJRq1scDSwU/r1q3D9u3bkZSUhLp162LgwIFwcXEp0PLUqVMRERGBpUuX5soTGRmJzp07Y9q0aXjyySdVt4zqwKT4sE/65VvcXr8Ad65dQZnB0+H11Auq/WXPAhSZs8CyT8TwvGIsd6q8jaVEozYWWFb66ciRI5g+fTpmzZoFV1dXDB8wzE1gAAAQY0lEQVQ+HC1btkTr1q3ztXzs2DGMGzcO5cuXzyWw0tPT0bVrV1y+fBkzZsxggWWlX4wqTnUyZIFlVIRk1kOVN4taY+OEMm/jSZm/RhZYVvpICCo/Pz8MG5b5if6aNWsgRNSyZcsesBwXF4f27dujVq1aEKtVOVewVq5cifPnz8s9JryCZaVTDCzOAstA2LwHy1jY/18bx7ix2KnyNpYSjdpYYFnpp44dO+KVV15Bhw4dpKW9e/di1apV2Lp16wOWBw8ejMqVK8PHxweHDh3KFlinTp3CoEGDsGHDBrRq1YoFlpU+MbI41cmQ6ooK8zYyujPrYubGMqfK21hKNGpjgWWln8SKVKNGjaQwEkkIpzlz5mDXrl25LO/cuRMbN26E2K+1bdu2bIGVnJwsV7V69OiB119/HfXq1bNKYFnZHS7OBJgAE2ACdiZQs2ZNO7eAq9eDAAssKyn26dNHvvITK1ki7d69W65erV27NtuyeDXYokULuQG+atWq+P7773H27Fl0794dZ86cwVdffSX/LdL8+fPx1ltv4f/au/tYH+s/juNvIXcNhaY0yk0r95llolVuctNS6h/kD1EmY9FQUW4SoUkJWYwmTdLcNYQhN62kNRYmMzd/hDFaRSa2fnu923V2ziG/c77fz3XO9T3X8/rH79f5ns/3cz2uz7mu1/X5fK7P9dRTT1n9+vWzrB2/jgACCCCAAAKlIUDAylL97bfftqtXr9rEiRO9pLlz59qJEyd8onq0nTx50ocAo+3333+3K1euWKNGjaxZs2Y+ZyvaNHSjIcRhw4ZZp06dsqwdv44AAggggAACpSFAwMpSfffu3TZ+/HibPHmyVa1a1V5//XUbOnSo9ejRw4OT/lvr1q0LfMvnn39eYA5W/h9mM0SY5a7w6wgggAACCCAQSICAlSWkFhedMmWKabkGbR06dMjrzdLw4R133GFvvPEGAStLZ34dAQQQQACBXBIgYAU6WhcvXrSbbrrJqlSpEqhEikEAAQQQQACBXBUgYOXqkaPeCCCAAAIIIJBYAQJWYg8NFUMAAQQQQACBXBUgYOXIkdOTinoVT/7tRi+XzvRnOcIRezX16iK9uLvw
y7szdS3tF4HHDhbgC0K28QDVKfNF0MZL/hCHbOOcU0r++BX3GwlYxRUrhc9rZXhNoo/W1tLipMuXL7c1a9b4cg89e/a0IUOGeBg4c+aMLVmyxDZv3my1a9f29bm0/pa26D2Hhw4d8p+9+eab1rRp01LYo2R/5blz59xNT4Q+9NBDXtn9+/fb0qVLfVVrLa+hVfkbN27sP9u0aZPpyVC9/kgLBI4dO9ZuueUW0wXso48+si1btpiOmZbd0Psma9asmWyAUqhdqDYeVT3bF6eXAkGJfmWoNq5Ka+FknZv+/vtvXxNQrw3TUjNsBQVCtXG9Uk2vWdu3b5+/GUTnlJYtW8KdQAECVgIPSlQlhSUtPKqlIO688868gKW1trQ46cyZM61atWp+sVeIGjhwoAcDLQ2hdblOnz7t629pTa5WrVr5ivH6V3+QCmhabX716tV28803J1ihZKu2fv16D0UKRJMmTfKApYVitVL/E0884b7r1q3zFfn1aiOFVQVVhbE2bdr4S7/Pnz9v8+bN80VndVL94IMP8pbwaNGihQ0fPrxkdyrB3xayjT/wwAO+pyFenJ5gsqyrFrKN79271wOVzkV33XWX31w0adLERo4cmXU9y0oBodu4zkHNmzc3PaWu87jeEqKQq4es2JIlQMBK1vEoUBvdpRw7dsxPWDt27MgLWApSumMZMWKEf14vmNZF/7PPPvMQMGHCBOvYsaP/TH+EOvH16dPHV4v/4osv/OXUUc/XO++8Y23btk2wQslVTRdmvfZIru+//74vr6GApYCr/71y5Uq/M1c3v9Y504Xll19+se+//94Dl7aff/7ZXnnlFT/xqderUqVK1q5dO/+ZgvEPP/xQYJX/ktu7ZH5TyDaukKstxIvTk6mVfa1Ct3G9FkznEwWq3377zW/4tFWuXDn7ypaREkK38aefftpv+Pr27evXBa3BqJtlzJPXYAhYyTsmeTVSl7t6l9TLtGrVqrwLs0KRVoOfNm2af1arye/cudM2btzor+RRmHr22Wf9Z88884wPZen/KyRo6DDa9DMNLT7++OMJVijZqil4VqxY0Yddo4B19uxZ69evn3388cd2zz33+DDsc8895z2BurgojKm3SneQUc/i/PnzfSgx2rSMh45L9+7dPfSy/SsQso2rpzbUi9PL8vEJ2cZ1LqpRo4bfTCi83X777fbhhx/abbfdVpYJi7Vvodu4piS8++67fuOtV67pnB+9aq1YFePDsQsQsGInzv4LCgesPXv2eFe8QoDuWvRzbQpYGt7S3Yzubo4cOeKryWvoRO82fOutt/wz0abP6E5Id0RsBQXyByz9RGHq0qVLbrVhwwafz6aTmnoMZauhV/UEqofwzz//tPwBS8ON6tVSUNNcFQ3hshUUCNHG1b5DvTg9DccnRBt/7bXXfAhdvYd16tTxc0w0lJ4Gw+LsY4g2rpuIMWPG+Pyr9u3be4+4RjPUi6UbQ7ZkCRCwknU8rlubwn+YenpEF2pd6LVSvC7uW7du9WGqCxcueI+K/gA130e9Kppw3aVLF+/Gz9+DpbCgP9ZoIncOUJRYFQtffI4fP+7zTPSuSM1308T1l156yYdi1U2vYany5ct7j6HmXOkBBAUphbKXX37Zf09DBboIsV0rEKKN6wLDi9OL3rpCtHH9DegcFPWm60EQnZc0XYHtxjcRmZzH1VulOVjqJbzvvvtMDyuoZ5ypHslsbQSsZB6XArUqfPFRz5S64++9917/nHpLjh496pPZdUdz9913e1d91POi9xs+8sgjPqylEFa3bl3766+/vOel8FBWDnCUSBXzX3zUA6V5VnoNki7iCk29evVyu+rVq7t9NM/qxx9/tHHjxnlPoYYGNKE9mvROuPrvQxeijeuhDl6cXvQ/jxBtXDdo+huIXgemBz80nWHFihVFr0hKPhmijWu6h8x1fokmtT/55JM2aNAgRiIS2I4IWAk8KIWrVPgPU0+NaGK7erE0sVTzqDRkqN4U
vXhak7D1BJwuNlOnTvVQpcClOx89jagnC9XjoqHGZcuWXbO+Vg6QxF7F/BcfhVn1TCmg6t85c+bYTz/9ZIsXL/alGfTQgZ4arFevnk+Q19IX+nfBggU+ZKhjEc1J0Zy6hg0bxl7/XPuCUG08/37z4vQbt4IQbVw959OnT/e/CQUtDYVryEoPgLDduAcrk/O4ltfRjbHORerNUtCaNWuWP0WouXBsyRIgYCXreFy3NoUvPpqkqpCkCb26i1FvSjRxWk8d6m5SE7H1RI96UDp37uzlHjhwwEaPHu1PEOppOIWx1q1b54BAyVdRFx8tv6B5Dto0JKihV/VmKazqwQJNeNempwN1slQQ0+PT6q7X3DiVIev8m9z1NCLbjS8+mbZxAlbRW1aINq42rycJo7mdunnQBZ95htceh1Dn8V27dtnChQv95k43bgpbugawJU+AgJW8Y1LkGmkytV4uXXiFdxWgiafXW9BS4/4asqpVq1aRv4cP/isgOz29eT1XDQcqFESPqWMWRiCTNh7mm9NZSqZtXDcely9fphclg2aTaRv/r3N8BlXgV2ISIGDFBEuxCCCAAAIIIJBeAQJWeo89e44AAggggAACMQkQsGKCpVgEEEAAAQQQSK8AASu9x549RwABBBBAAIGYBAhYMcFSLAIIIIAAAgikV4CAld5jz54jgAACCCCAQEwCBKyYYCkWAQQQQAABBNIrQMBK77FnzxFAAAEEEEAgJgECVkywFIsAAggggAAC6RUgYKX32LPnCCCAAAIIIBCTAAErJliKRQABBBBAAIH0ChCw0nvs2XMEEEAAAQQQiEmAgBUTLMUigAACCCCAQHoFCFjpPfbsOQIIIIAAAgjEJEDAigmWYhFAAAEEEEAgvQIErPQee/YcAQQQQAABBGISIGDFBEuxCCCAAAIIIJBeAQJWeo89e44AAggggAACMQkQsGKCpVgEEEAAAQQQSK8AASu9x549RwABBBBAAIGYBAhYMcFSLAIIIIAAAgikV4CAld5jz54jgAACCCCAQEwCBKyYYCkWAQQQQAABBNIrQMBK77FnzxFAAAEEEEAgJgECVkywFIsAAkUTOHHihA0ePNjWrVtnFSpUsKtXr9qAAQPs+eeft1atWtmMGTPsm2++sfbt29uQIUPs/vvv94K/++47W7Roke3evdsee+yxvJ8dPXrUxo8fbw8++KAtXLjQli1bZs2aNStaZfgUAgggEEiAgBUIkmIQQCAzgUuXLlnz5s1tx44dVq9ePVuzZo2Hqk2bNtmLL75otWrVsmHDhtnGjRvt22+/teXLl9uZM2esS5cuNnLkSHv44Yc9SJ08edKWLFlia9eu9f/eqVMn69Wrl/Xs2dPKly+fWeX4LQQQQCBDAQJWhnD8GgIIhBPo0KGDvffee9amTRvr2rWrjRgxwmrUqGEvvPCCbdu2zW699Va7cOGCdezY0XustB0/ftyaNm3qYevLL7+0Xbt22erVq2369Om2fft2++qrrwhW4Q4RJSGAQDEFCFjFBOPjCCAQXkBBqnv37l7wggULbP369TZ79mybM2fONV+mwPXrr7/aqFGj7PTp09a4cWM7deqUB7OZM2dav379vHdr4MCB4StKiQgggEARBQhYRYTiYwggEJ/AtGnTvLdJPVCTJk3ygPTqq69alSpVbOLEif7Fhw8ftj/++MPatm3r86s0R0vBrGLFivboo496oOrfv781adLEy2nRokV8FaZkBBBA4P8IELBoIgggUOoCmlc1duxYD0WrVq2ycuXK2aeffmrz58+3lStX+vBg3759bcyYMda7d2+f8K4g1q1bN/vkk09sypQpPuG9QYMG1rlzZzt48KBVqlSp1PeLCiCAQHoFCFjpPfbsOQKJEdC8Kg3tLV261MOTtsuXL/vk9q1bt/pEd/VODR8+3MOXPjdhwgSrVq2a9ejRwzZs2GBz5841TZjXsKImurMhgAACpSlAwCpNfb4bAQRcQE8NqtdJvVGF
Nw0LKkgVfhJQYeqff/6xqlWroogAAggkToCAlbhDQoUQSI/AoUOHbNasWb6mlXqhtEwDGwIIIFAWBAhYZeEosg8I5KiA1q76+uuvfZJ6w4YNc3QvqDYCCCBwrQABi1aBAAIIIIAAAggEFiBgBQalOAQQQAABBBBAgIBFG0AAAQQQQAABBAILELACg1IcAggggAACCCBAwKINIIAAAggggAACgQUIWIFBKQ4BBBBAAAEEECBg0QYQQAABBBBAAIHAAgSswKAUhwACCCCAAAIIELBoAwgggAACCCCAQGABAlZgUIpDAAEEEEAAAQQIWLQBBBBAAAEEEEAgsAABKzAoxSGAAAIIIIAAAgQs2gACCCCAAAIIIBBYgIAVGJTiEEAAAQQQQAABAhZtAAEEEEAAAQQQCCxAwAoMSnEIIIAAAggggAABizaAAAIIIIAAAggEFiBgBQalOAQQQAABBBBAgIBFG0AAAQQQQAABBAIL/A9zE0EsE5BcKgAAAABJRU5ErkJggg==" /><br />
The difference between .51 and .48097, or the 'gap' between KY and its synthetic control, represents the estimated counterfactual impact of the program in KY. Placebo tests can be run and visualized by treating each state from the donor pool as a 'placebo treatment' and constructing a synthetic control from the remaining states. This produces a distribution of gaps that characterizes the uncertainty in our estimate of the treatment effect based on the KY vs KY* synthetic control comparison.<br />
<br />
The code excerpt below shows how we would designate CA as our placebo treatment and use the remaining states to create its synthetic control. This could be iterated across all of the remaining donor-pool states.<br />
<b><br /></b>
<br />
<pre style="background-color: #f6f8fa; font-family: SFMono-Regular, Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 12px; padding: 10px;">treatment.identifier = 3,        # indicates our 'placebo' treatment group
controls.identifier = c(1,2,4),  # these states are part of our control pool which will be weighted
</pre>
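The iteration over the donor pool can be sketched as below. This is a hedged sketch rather than the exact code from the gist: it assumes the toy panel above is loaded as a data frame named df with unit IDs 1 (KY, the treated state) through 4, that the pre-treatment window runs 1990-1994, and that the plot window runs through 2000 (the actual years depend on the full toy data set).

```r
# Sketch (not the exact gist code): loop the placebo exercise over each
# donor-pool state using the Synth package. 'df' and the year ranges
# below are assumptions based on the toy data shown in this post.
library(Synth)

placebo.gaps <- list()

for (id in 2:4) {  # each donor-pool state takes a turn as the 'placebo' treatment
  dp <- dataprep(
    foo = df,
    predictors = c("X1", "X2", "X3"),
    dependent = "Y",
    unit.variable = "ID",
    unit.names.variable = "state",
    time.variable = "year",
    treatment.identifier = id,               # current placebo 'treatment'
    controls.identifier = setdiff(1:4, id),  # all remaining states form the pool
    time.predictors.prior = 1990:1994,       # assumed pre-treatment window
    time.optimize.ssr = 1990:1994,
    time.plot = 1990:2000                    # assumed plot window
  )
  fit <- synth(dp)
  # gap = actual outcome minus synthetic control outcome, by year
  placebo.gaps[[as.character(id)]] <- dp$Y1plot - (dp$Y0plot %*% fit$solution.w)
}
```

Plotting each series in placebo.gaps alongside the KY gap gives the visual inference described above: if the KY gap sits in the tails of the placebo distribution, the estimated effect is less likely to be noise.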
<b><br /></b>
<b><br /></b>
<b>R Code: </b><a href="https://gist.github.com/BioSciEconomist/6eb824527c03e12372667fb8861299bd">https://gist.github.com/BioSciEconomist/6eb824527c03e12372667fb8861299bd</a><br />
<b><br /></b>
<b>References:</b><br />
<br />
Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105: 493–505. doi:10.1198/jasa.2009.ap08746.<br />
<br />
Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2011. “Synth: An R Package for Synthetic Control Methods in Comparative Case Studies.” Journal of Statistical Software 42(13).<br />
<br />
Bouttell, J., P. Craig, J. Lewsey, et al. 2018. “Synthetic Control Methodology as a Tool for Evaluating Population-Level Health Interventions.” J Epidemiol Community Health 72: 673–678.<br />
<br />
More public policy analysis: synthetic control in under an hour<br />
https://thesamuelsoncondition.com/2016/04/29/more-public-policy-analysis-synthetic-control-in-under-an-hour/comment-page-1/<br />
<br />
<br />
<b>Data:</b><br />
<br />
<style type="text/css"><!--td {border: 1px solid #ccc;}br {mso-data-placement:same-cell;}--></style><br />
<table border="1" cellpadding="0" cellspacing="0" dir="ltr" style="border-collapse: collapse; border: none; font-family: arial,sans,sans-serif; font-size: 10pt; table-layout: fixed; width: 0px;" xmlns="http://www.w3.org/1999/xhtml"><colgroup><col width="33"></col><col width="49"></col><col width="48"></col><col width="48"></col><col width="62"></col><col width="45"></col><col width="38"></col></colgroup><tbody>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"ID"}" style="border-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">ID</td><td data-sheets-value="{"1":2,"2":"year"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); border-top-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">year</td><td data-sheets-value="{"1":2,"2":"state"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); border-top-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">state</td><td data-sheets-value="{"1":2,"2":"Y"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); border-top-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">Y</td><td data-sheets-value="{"1":2,"2":"X1"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); border-top-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">X1</td><td data-sheets-value="{"1":2,"2":"X2"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); border-top-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">X2</td><td data-sheets-value="{"1":2,"2":"X3"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); border-top-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">X3</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":1}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1</td><td data-sheets-value="{"1":3,"3":1990}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1990</td><td data-sheets-value="{"1":2,"2":"KY"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">KY</td><td data-sheets-value="{"1":3,"3":0.45}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.45</td><td data-sheets-value="{"1":3,"3":50000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">50000</td><td data-sheets-value="{"1":3,"3":25}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">25</td><td data-sheets-value="{"1":3,"3":10}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">10</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":1}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1</td><td data-sheets-value="{"1":3,"3":1991}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1991</td><td data-sheets-value="{"1":2,"2":"KY"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">KY</td><td data-sheets-value="{"1":3,"3":0.45}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.45</td><td data-sheets-value="{"1":3,"3":51000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">51000</td><td data-sheets-value="{"1":3,"3":26}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">26</td><td data-sheets-value="{"1":3,"3":10}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">10</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":1}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1</td><td data-sheets-value="{"1":3,"3":1992}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1992</td><td data-sheets-value="{"1":2,"2":"KY"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">KY</td><td data-sheets-value="{"1":3,"3":0.46}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.46</td><td data-sheets-value="{"1":3,"3":52000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">52000</td><td data-sheets-value="{"1":3,"3":27}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">27</td><td data-sheets-value="{"1":3,"3":10}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">10</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":1}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1</td><td data-sheets-value="{"1":3,"3":1993}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1993</td><td data-sheets-value="{"1":2,"2":"KY"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">KY</td><td data-sheets-value="{"1":3,"3":0.48}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.48</td><td data-sheets-value="{"1":3,"3":52000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">52000</td><td data-sheets-value="{"1":3,"3":28}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">28</td><td data-sheets-value="{"1":3,"3":10}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">10</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":1}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1</td><td data-sheets-value="{"1":3,"3":1994}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1994</td><td data-sheets-value="{"1":2,"2":"KY"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">KY</td><td data-sheets-value="{"1":3,"3":0.48}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.48</td><td data-sheets-value="{"1":3,"3":52000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">52000</td><td data-sheets-value="{"1":3,"3":28}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">28</td><td data-sheets-value="{"1":3,"3":10}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">10</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":1}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1</td><td data-sheets-value="{"1":3,"3":1995}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1995</td><td data-sheets-value="{"1":2,"2":"KY"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">KY</td><td data-sheets-value="{"1":3,"3":0.48}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.48</td><td data-sheets-value="{"1":3,"3":53000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">53000</td><td data-sheets-value="{"1":3,"3":27}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">27</td><td data-sheets-value="{"1":3,"3":15}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">15</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":1}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1</td><td data-sheets-value="{"1":3,"3":1996}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1996</td><td data-sheets-value="{"1":2,"2":"KY"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">KY</td><td data-sheets-value="{"1":3,"3":0.49}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.49</td><td data-sheets-value="{"1":3,"3":53000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">53000</td><td data-sheets-value="{"1":3,"3":24}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">24</td><td data-sheets-value="{"1":3,"3":15}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">15</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":1}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1</td><td data-sheets-value="{"1":3,"3":1997}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1997</td><td data-sheets-value="{"1":2,"2":"KY"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">KY</td><td data-sheets-value="{"1":3,"3":0.5}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.5</td><td data-sheets-value="{"1":3,"3":54000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">54000</td><td data-sheets-value="{"1":3,"3":24}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">24</td><td data-sheets-value="{"1":3,"3":15}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">15</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":1}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1</td><td data-sheets-value="{"1":3,"3":1998}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1998</td><td data-sheets-value="{"1":2,"2":"KY"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">KY</td><td data-sheets-value="{"1":3,"3":0.51}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.51</td><td data-sheets-value="{"1":3,"3":55000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">55000</td><td data-sheets-value="{"1":3,"3":23}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">23</td><td data-sheets-value="{"1":3,"3":15}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">15</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":2}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">2</td><td data-sheets-value="{"1":3,"3":1990}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1990</td><td data-sheets-value="{"1":2,"2":"TN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">TN</td><td data-sheets-value="{"1":3,"3":0.45}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.45</td><td data-sheets-value="{"1":3,"3":52000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">52000</td><td data-sheets-value="{"1":3,"3":23}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">23</td><td data-sheets-value="{"1":3,"3":12}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":2}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">2</td><td data-sheets-value="{"1":3,"3":1991}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1991</td><td data-sheets-value="{"1":2,"2":"TN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">TN</td><td data-sheets-value="{"1":3,"3":0.45}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.45</td><td data-sheets-value="{"1":3,"3":51000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">51000</td><td data-sheets-value="{"1":3,"3":23}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">23</td><td data-sheets-value="{"1":3,"3":12}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":2}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">2</td><td data-sheets-value="{"1":3,"3":1992}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1992</td><td data-sheets-value="{"1":2,"2":"TN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">TN</td><td data-sheets-value="{"1":3,"3":0.44}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.44</td><td data-sheets-value="{"1":3,"3":53000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">53000</td><td data-sheets-value="{"1":3,"3":24}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">24</td><td data-sheets-value="{"1":3,"3":12}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":2}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">2</td><td data-sheets-value="{"1":3,"3":1993}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1993</td><td data-sheets-value="{"1":2,"2":"TN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">TN</td><td data-sheets-value="{"1":3,"3":0.45}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.45</td><td data-sheets-value="{"1":3,"3":51000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">51000</td><td data-sheets-value="{"1":3,"3":26}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">26</td><td data-sheets-value="{"1":3,"3":12}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":2}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">2</td><td data-sheets-value="{"1":3,"3":1994}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1994</td><td data-sheets-value="{"1":2,"2":"TN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">TN</td><td data-sheets-value="{"1":3,"3":0.44}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.44</td><td data-sheets-value="{"1":3,"3":52000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">52000</td><td data-sheets-value="{"1":3,"3":25}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">25</td><td data-sheets-value="{"1":3,"3":12}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":2}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">2</td><td data-sheets-value="{"1":3,"3":1995}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1995</td><td data-sheets-value="{"1":2,"2":"TN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">TN</td><td data-sheets-value="{"1":3,"3":0.43}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.43</td><td data-sheets-value="{"1":3,"3":54000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">54000</td><td data-sheets-value="{"1":3,"3":26}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">26</td><td data-sheets-value="{"1":3,"3":14}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">14</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":2}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">2</td><td data-sheets-value="{"1":3,"3":1996}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1996</td><td data-sheets-value="{"1":2,"2":"TN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">TN</td><td data-sheets-value="{"1":3,"3":0.42}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.42</td><td data-sheets-value="{"1":3,"3":54000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">54000</td><td data-sheets-value="{"1":3,"3":25}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">25</td><td data-sheets-value="{"1":3,"3":14}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">14</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":2}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">2</td><td data-sheets-value="{"1":3,"3":1997}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1997</td><td data-sheets-value="{"1":2,"2":"TN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">TN</td><td data-sheets-value="{"1":3,"3":0.4}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.4</td><td data-sheets-value="{"1":3,"3":55000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">55000</td><td data-sheets-value="{"1":3,"3":26}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">26</td><td data-sheets-value="{"1":3,"3":14}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">14</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":2}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">2</td><td data-sheets-value="{"1":3,"3":1998}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1998</td><td data-sheets-value="{"1":2,"2":"TN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">TN</td><td data-sheets-value="{"1":3,"3":0.41}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.41</td><td data-sheets-value="{"1":3,"3":56000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">56000</td><td data-sheets-value="{"1":3,"3":25}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">25</td><td data-sheets-value="{"1":3,"3":14}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">14</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":3}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">3</td><td data-sheets-value="{"1":3,"3":1990}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1990</td><td data-sheets-value="{"1":2,"2":"CA"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">CA</td><td data-sheets-value="{"1":3,"3":0.89}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.89</td><td data-sheets-value="{"1":3,"3":102000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">102000</td><td data-sheets-value="{"1":3,"3":10}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">10</td><td data-sheets-value="{"1":3,"3":20}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">20</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":3}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">3</td><td data-sheets-value="{"1":3,"3":1991}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1991</td><td data-sheets-value="{"1":2,"2":"CA"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">CA</td><td data-sheets-value="{"1":3,"3":0.9}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.9</td><td data-sheets-value="{"1":3,"3":102500}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">102500</td><td data-sheets-value="{"1":3,"3":11}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">11</td><td data-sheets-value="{"1":3,"3":20}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">20</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":3}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">3</td><td data-sheets-value="{"1":3,"3":1992}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1992</td><td data-sheets-value="{"1":2,"2":"CA"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">CA</td><td data-sheets-value="{"1":3,"3":0.9}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.9</td><td data-sheets-value="{"1":3,"3":103000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">103000</td><td data-sheets-value="{"1":3,"3":13}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">13</td><td data-sheets-value="{"1":3,"3":20}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">20</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":3}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">3</td><td data-sheets-value="{"1":3,"3":1993}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1993</td><td data-sheets-value="{"1":2,"2":"CA"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">CA</td><td data-sheets-value="{"1":3,"3":0.92}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.92</td><td data-sheets-value="{"1":3,"3":103500}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">103500</td><td data-sheets-value="{"1":3,"3":12}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td><td data-sheets-value="{"1":3,"3":20}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">20</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":3}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">3</td><td data-sheets-value="{"1":3,"3":1994}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1994</td><td data-sheets-value="{"1":2,"2":"CA"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">CA</td><td data-sheets-value="{"1":3,"3":0.93}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.93</td><td data-sheets-value="{"1":3,"3":104000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">104000</td><td data-sheets-value="{"1":3,"3":11}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">11</td><td data-sheets-value="{"1":3,"3":20}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">20</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":3}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">3</td><td data-sheets-value="{"1":3,"3":1995}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1995</td><td data-sheets-value="{"1":2,"2":"CA"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">CA</td><td data-sheets-value="{"1":3,"3":0.93}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.93</td><td data-sheets-value="{"1":3,"3":104000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">104000</td><td data-sheets-value="{"1":3,"3":12}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td><td data-sheets-value="{"1":3,"3":25}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">25</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":3}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">3</td><td data-sheets-value="{"1":3,"3":1996}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1996</td><td data-sheets-value="{"1":2,"2":"CA"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">CA</td><td data-sheets-value="{"1":3,"3":0.94}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.94</td><td data-sheets-value="{"1":3,"3":104500}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">104500</td><td data-sheets-value="{"1":3,"3":14}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">14</td><td data-sheets-value="{"1":3,"3":25}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">25</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":3}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">3</td><td data-sheets-value="{"1":3,"3":1997}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1997</td><td data-sheets-value="{"1":2,"2":"CA"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">CA</td><td data-sheets-value="{"1":3,"3":0.94}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.94</td><td data-sheets-value="{"1":3,"3":105000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">105000</td><td data-sheets-value="{"1":3,"3":12}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td><td data-sheets-value="{"1":3,"3":25}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">25</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":3}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">3</td><td data-sheets-value="{"1":3,"3":1998}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1998</td><td data-sheets-value="{"1":2,"2":"CA"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">CA</td><td data-sheets-value="{"1":3,"3":0.95}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.95</td><td data-sheets-value="{"1":3,"3":105000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">105000</td><td data-sheets-value="{"1":3,"3":10}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">10</td><td data-sheets-value="{"1":3,"3":25}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">25</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":4}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">4</td><td data-sheets-value="{"1":3,"3":1990}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1990</td><td data-sheets-value="{"1":2,"2":"IN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">IN</td><td data-sheets-value="{"1":3,"3":0.43}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.43</td><td data-sheets-value="{"1":3,"3":52000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">52000</td><td data-sheets-value="{"1":3,"3":25}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">25</td><td data-sheets-value="{"1":3,"3":10}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">10</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":4}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">4</td><td data-sheets-value="{"1":3,"3":1991}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1991</td><td data-sheets-value="{"1":2,"2":"IN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">IN</td><td data-sheets-value="{"1":3,"3":0.44}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.44</td><td data-sheets-value="{"1":3,"3":52000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">52000</td><td data-sheets-value="{"1":3,"3":26}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">26</td><td data-sheets-value="{"1":3,"3":10}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">10</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":4}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">4</td><td data-sheets-value="{"1":3,"3":1992}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1992</td><td data-sheets-value="{"1":2,"2":"IN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">IN</td><td data-sheets-value="{"1":3,"3":0.42}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.42</td><td data-sheets-value="{"1":3,"3":53000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">53000</td><td data-sheets-value="{"1":3,"3":26}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">26</td><td data-sheets-value="{"1":3,"3":10}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">10</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":4}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">4</td><td data-sheets-value="{"1":3,"3":1993}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1993</td><td data-sheets-value="{"1":2,"2":"IN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">IN</td><td data-sheets-value="{"1":3,"3":0.46}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.46</td><td data-sheets-value="{"1":3,"3":53500}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">53500</td><td data-sheets-value="{"1":3,"3":27}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">27</td><td data-sheets-value="{"1":3,"3":10}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">10</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":4}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">4</td><td data-sheets-value="{"1":3,"3":1994}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1994</td><td data-sheets-value="{"1":2,"2":"IN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">IN</td><td data-sheets-value="{"1":3,"3":0.45}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.45</td><td data-sheets-value="{"1":3,"3":53500}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">53500</td><td data-sheets-value="{"1":3,"3":28}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">28</td><td data-sheets-value="{"1":3,"3":10}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">10</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":4}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">4</td><td data-sheets-value="{"1":3,"3":1995}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1995</td><td data-sheets-value="{"1":2,"2":"IN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">IN</td><td data-sheets-value="{"1":3,"3":0.46}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.46</td><td data-sheets-value="{"1":3,"3":54000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">54000</td><td data-sheets-value="{"1":3,"3":26}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">26</td><td data-sheets-value="{"1":3,"3":12}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":4}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">4</td><td data-sheets-value="{"1":3,"3":1996}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1996</td><td data-sheets-value="{"1":2,"2":"IN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">IN</td><td data-sheets-value="{"1":3,"3":0.47}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.47</td><td data-sheets-value="{"1":3,"3":54000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">54000</td><td data-sheets-value="{"1":3,"3":26}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">26</td><td data-sheets-value="{"1":3,"3":12}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":4}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">4</td><td data-sheets-value="{"1":3,"3":1997}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1997</td><td data-sheets-value="{"1":2,"2":"IN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">IN</td><td data-sheets-value="{"1":3,"3":0.45}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.45</td><td data-sheets-value="{"1":3,"3":54500}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">54500</td><td data-sheets-value="{"1":3,"3":25}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">25</td><td data-sheets-value="{"1":3,"3":12}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":3,"3":4}" style="border-bottom-color: rgb(0, 0, 0); border-left-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">4</td><td data-sheets-value="{"1":3,"3":1998}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1998</td><td data-sheets-value="{"1":2,"2":"IN"}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">IN</td><td data-sheets-value="{"1":3,"3":0.46}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.46</td><td data-sheets-value="{"1":3,"3":55000}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">55000</td><td data-sheets-value="{"1":3,"3":24}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">24</td><td data-sheets-value="{"1":3,"3":12}" style="border-bottom-color: rgb(0, 0, 0); border-right-color: rgb(0, 0, 0); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td></tr>
</tbody></table>
<style type="text/css"><!--td {border: 1px solid #ccc;}br {mso-data-placement:same-cell;}--></style>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-90338157797379919572019-04-19T07:54:00.001-04:002020-08-22T16:10:14.997-04:00Intent to Treat, Instrumental Variables and LATE Made Simple(er)In randomized controlled trials (RCTs), issues related to non-compliance often arise: subjects assigned to the treatment fail to comply, while subjects assigned to the control group sometimes receive the treatment anyway. One way to deal with non-compliance is through an intent-to-treat (ITT) framework.<br />
<br />
Gupta (2011) describes ITT:<br />
<br />
<i>"ITT analysis includes every subject who is randomized according to randomized treatment assignment. It ignores noncompliance, protocol deviations, withdrawal, and anything that happens after randomization. ITT analysis is usually described as “once randomized, always analyzed."</i><br />
<br />
In <a href="http://masteringmetrics.com/">Mastering Metrics</a>, Angrist and Pischke describe intent-to-treat analysis:<br />
<br />
<i>"In randomized trials with imperfect compliance, when treatment assignment differs from treatment delivered, effects of random assignment...are called intention-to-treat (ITT) effects. An ITT analysis captures the causal effect of being assigned to treatment."</i><br />
<i><br /></i>
While treatment assignment is random, non-compliance is not! Therefore, if instead of using intent-to-treat comparisons we compared those actually treated to those untreated (sometimes termed an 'as treated' analysis), we would get biased results. When there is non-compliance, a relationship likely exists between potential outcomes and the actual treatment received. While the ITT approach gives an unbiased causal estimate of the effect of treatment assignment, it is often diluted by non-compliance and can underestimate the true effect of treatment (Angrist, 2006).<br />
<br />
Angrist and Pischke discuss how instrumental variables can be used in the context of a RCT with non-compliance issues:<br />
<br />
<i> "Instrumental variable methods allow us to capture the causal effect of treatment on the treated in spite of the nonrandom compliance decisions made by participants in experiments....Use of randomly assigned intent to treat as an instrumental variable for treatment delivered eliminates this source of selection bias." </i><br />
<br />
The purpose of this post is to build intuition related to how an instrumental variable (IV) approach differs from ITT, and how it is not biased by selection related to non-compliance issues in the same way that an 'as treated' analysis would be.<br />
<br />
My goal is to demonstrate with a rather simple data set how IVs tease out the biases from non-compliance and give us only the impact of treatment on the compliers also known as the <a href="http://econometricsense.blogspot.com/2017/07/instrumental-variables-and-late.html">local average treatment effect (LATE)</a>.<br />
<br />
A great example of IV and ITT applied to health care can be found in Finkelstein et. al. (2013 & 2014) - See <a href="http://econometricsense.blogspot.com/2014/01/the-oregon-medicaid-experiment-applied.html">The Oregon Medicaid Experiment, Applied Econometics, and Causal Inference</a>.<br />
<br />
For another post walking through the basic mechanics of instrumental variables (IV) estimation using a toy data set see: <a href="http://econometricsense.blogspot.com/2013/06/an-toy-instrumental-variable-application.html">A Toy IV Application.</a><br />
<br />
<b>Key Assumptions</b><br />
<br />
Depending on how you frame it, there are five key things (assumptions, if we want to call them that) we need to think about when leveraging instrumental variables - in humble language:<br />
<br />
<b>1) SUTVA </b>- the stable unit treatment value assumption. Basically it means no interactions or spillovers between the treatments and controls: my getting treated does not make a control case have a better or worse outcome as a result.<br />
<br />
<b>2) Random Assignment </b>- that is the whole context of the discussion above - the instrument (Z) or treatment assignment must be random<br />
<br />
<b>3) The Exclusion Restriction</b> - Treatment assignment impacts the outcome only through the treatment itself. There is nothing about being in the randomly assigned treatment group that would cause your outcome to be higher or lower in and of itself, other than actually receiving the treatment. In other words, assignment has no direct effect on the outcome. This is often represented as: Z -> D -> Y, where Z is the instrument or random assignment, D is an indicator for actually receiving the treatment, and Y is the outcome.<br />
<br />
<b>4) Non-zero causal effect of Z on D:</b> Being assigned to the treatment group is highly correlated with actually receiving the treatment, i.e. when Z = 1 then D is usually 1 as well. (If these were perfectly correlated, that would imply perfect compliance.)<br />
<br />
<b>5) Monotonicity </b>- We'll just call this an assumption of 'no defiers.' It means that there are no cases that always do the opposite of what their treatment assignment indicates, i.e. cases where Z = 1 implies D = 0 AND Z = 0 implies D = 1. Stated differently, we can't have people who always get the treatment when assigned to the control group and never receive treatment when assigned to the treatment group.<br />
<b><br /></b>
<b>Types of Non-Compliance</b><br />
<br />
Given these assumptions, with monotonicity we end up with three different groups of people in our study:<br />
<br />
<b>Never Takers:</b> those that refuse treatment regardless of treatment/control assignment.<br />
<br />
<b>Always Takers: </b>those that get the treatment even if they are assigned to the control group.<br />
<br />
<b>Compliers:</b> those that comply or receive treatment if assigned to a treatment group but do not receive treatment when assigned to control group.<br />
<br />
The compliers are characterized as participants that receive treatment only as a result of random assignment. The estimated treatment effect for these folks is often the one we care about, and in an IV framework it gives us an unbiased causal estimate of the treatment effect for this group. But how does this work?<br />
<br />
<b>Discussion</b><br />
<br />
I have to first recommend a great post over at egap.org titled '<a href="http://egap.org/methods-guides/10-things-you-need-know-about-local-average-treatment-effect">10 Things to Know About Local Average Treatment Effects.'</a> Most of my post is based on those well-thought-out examples.<br />
<br />
Just to level set, the context of this discussion going forward is a RCT with the outcome measured as Y, and treatment assignment being used as the instrument Z. (this can be extended to apply to other scenarios using other types of instruments). Actual receipt of treatment, or treatment status, is indicated by D with D=1 indicating a receipt of treatment. So an ITT analysis would simply be a comparison of outcomes for folks randomly assigned to treatment (Z = 1) vs those that were controls (Z = 0) regardless of compliance or non-compliance (determined by D). An 'as treated' analysis would be a comparison of everyone that received the treatment (D = 1) vs. those that did not (D=0) regardless of randomization. This is a biased analysis. The IV or local average treatment effect (LATE) estimate is the difference in outcomes for compliers.<br />
<br />
Going back to the original article by Angrist, Imbens, and Rubin (1996), the authors discuss IVs, LATEs and the types of noncompliance as they relate to the assumptions we previously discussed. They explain that the treatment status (D) of the always takers and never takers is invariant to (uncorrelated with) random assignment Z. No matter what Z is, they are going to do what they are going to do. But we also know that Z (by definition of compliance and assumption 4) is correlated with actual treatment received, D, for the compliers.<br />
<br />
Let's consider an RCT with one-sided non-compliance. In this case the controls are not able to receive the treatment by nature of the design, so there are no 'always takers' in this discussion. Below is a table summarizing a scenario like this with 100 people randomly assigned to treatment (Z = 1) and 100 controls (Z = 0). (This can be extended to include always takers; the post I mentioned before at <a href="http://egap.org/methods-guides/10-things-you-need-know-about-local-average-treatment-effect">egap.org</a> will walk through that scenario.)<br />
<br />
<br />
<table border="1" cellpadding="0" cellspacing="0" dir="ltr" style="border-collapse: collapse; border: none; font-family: arial,sans,sans-serif; font-size: 10pt; table-layout: fixed; width: 0px;" xmlns="http://www.w3.org/1999/xhtml"><colgroup><col width="100"></col><col width="100"></col></colgroup><tbody>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"Z = 1"}" style="font-weight: bold; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">Z = 1</td><td data-sheets-value="{"1":2,"2":"Z = 0"}" style="font-weight: bold; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">Z = 0</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"Treatment"}" style="border-bottom: 1px solid #000000; font-weight: bold; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">Treatment</td><td data-sheets-value="{"1":2,"2":"Control"}" style="border-bottom: 1px solid #000000; font-weight: bold; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">Control</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"Never Taker"}" style="border-left: 1px solid #000000; border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">Never Taker</td><td data-sheets-value="{"1":2,"2":"Never Taker"}" style="border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">Never Taker</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"N = 20"}" style="border-left: 1px solid #000000; border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">N = 20</td><td data-sheets-value="{"1":2,"2":"N = 20"}" style="border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">N = 20</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"D = 0"}" style="border-left: 1px solid #000000; border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">D = 0</td><td data-sheets-value="{"1":2,"2":"D = 0"}" style="border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">D = 0</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"Y = 5"}" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">Y = 5</td><td data-sheets-value="{"1":2,"2":"Y = 5"}" style="border-bottom: 1px solid #000000; border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">Y = 5</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"Complier"}" style="border-left: 1px solid #000000; border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">Complier</td><td data-sheets-value="{"1":2,"2":"Complier"}" style="border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">Complier</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"N =80"}" style="border-left: 1px solid #000000; border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">N =80</td><td data-sheets-value="{"1":2,"2":"N =80"}" style="border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">N =80</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"D = 1"}" style="border-left: 1px solid #000000; border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">D = 1</td><td data-sheets-value="{"1":2,"2":"D = 0"}" style="border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">D = 0</td></tr>
<tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"Y = 25"}" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">Y = 25</td><td data-sheets-value="{"1":2,"2":"Y = 20"}" style="border-bottom: 1px solid #000000; border-right: 1px solid #000000; overflow: hidden; padding: 2px 3px 2px 3px; vertical-align: bottom;">Y = 20</td></tr>
</tbody></table>
<br />
For storytelling purposes, let's assume the 'treatment' is a weight loss program. We've got some really unmotivated folks (never takers) in both the treatment and control group that just don't comply with the treatment. Let's say on average they all end up losing 5 pounds (Y = 5) regardless of the group they are in. On the other hand, we have more conscientious folks that will participate if randomly assigned to treatment. But they are motivated and healthy, so even in the absence of treatment their potential outcomes (weight loss) are pretty favorable: they are bound to lose 20 pounds regardless.<br />
<br />
As discussed before, we can see how when there is non-compliance, there is the likelihood that a relationship exists between potential outcomes and the actual treatment received.<br />
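The post's worked example is implemented in R (the gist linked below); purely as an illustrative cross-check, the toy data in the table above can be reconstructed in a few lines of Python (the variable names here are my own, not taken from the gist):

```python
import pandas as pd

# Reconstruct the toy data from the table above: 100 assigned to treatment
# (Z = 1) and 100 controls (Z = 0); each arm has 20 never takers (Y = 5)
# and 80 compliers, but only the compliers with Z = 1 are treated (D = 1).
df = pd.DataFrame({
    "Z": [1] * 100 + [0] * 100,
    "D": [0] * 20 + [1] * 80 + [0] * 100,
    "y": [5.0] * 20 + [25.0] * 80 + [5.0] * 20 + [20.0] * 80,
})

# Arm means: 17 for controls, 21 for the treatment-assigned group
print(df.groupby("Z")["y"].mean())
```

Note that treatment receipt (D) and outcomes (y) are clearly related here even though assignment (Z) is random.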
<br />
If we ignore treatment assignment and just compare the average weight lost (Y) for those that received treatment to all of those that did not, we could run the following regression:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">Y = β0 + β1 D + e </span><br />
<br />
with <span style="font-family: "courier new" , "courier" , monospace;">β1 </span>= 10 (see the <a href="https://gist.github.com/BioSciEconomist/a72fae6e01053fdb6d13c9a80d8e39f9">R code </a> that generates this data and these results)<br />
<br />
We could calculate this by hand as: 25 - [(2/3)*20 + (1/3)*5] = 25 - 15 = 10, where the untreated group pools the 80 untreated compliers (Y = 20) with the 40 never takers (Y = 5).<br />
<br />
We know that non-compliance biases this estimate.<br />
<br />
The ITT estimate can be estimated as:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">Y = β0 + β1 Z + e </span><br />
<br />
with <span style="font-family: "courier new" , "courier" , monospace;">β1 </span>= 4<br />
<br />
We can see from the data this is simply the difference in means between the treatment and control group: [.2*5 + .8*25] - [.2*5 + .8*20] = 21-17 = 4<br />
<br />
We know from the discussion above and can see from the data that this is greatly diluted by noncompliance. But because of randomization this is an unbiased estimate.<br />
<br />
Finally, the IV or local average treatment effect (LATE) estimate is the difference in outcomes for compliers.<br />
<br />
Because our example above is contrived, the outcomes for the compliers are made explicit in the table above. If you knew exactly who the compliers were, the math would be straightforward:<br />
<br />
LATE = 25 - 20 = 5<br />
<br />
You can also get LATEs by dividing the ITT effect by the share of compliers:<br />
<br />
4/.8 = 5<br />
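Putting the three estimates side by side, the arithmetic above can be checked with a short calculation (this is just the by-hand math; the post's full simulation is in the linked R gist):

```python
# Shares and group means read off the table above (each arm has 100 people)
p_comp = 0.8            # share of compliers in each arm
y_never = 5.0           # never takers lose 5 lbs regardless of assignment
y_comp_control = 20.0   # compliers' outcome without treatment
y_comp_treated = 25.0   # compliers' outcome with treatment

# 'As treated': the 80 treated vs. the 120 untreated
# (80 untreated compliers + 40 never takers) - a biased comparison
naive = y_comp_treated - ((2/3) * y_comp_control + (1/3) * y_never)

# ITT: difference in arm means, diluted by noncompliance
itt = (0.2 * y_never + p_comp * y_comp_treated) \
    - (0.2 * y_never + p_comp * y_comp_control)

# LATE: ITT scaled by the share of compliers (the Wald estimator)
late = itt / p_comp

print(naive, itt, late)  # roughly 10, 4, 5
```

The biased 'as treated' comparison doubles the true complier effect, while the unbiased ITT understates it.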
<br />
In a <a href="http://econometricsense.blogspot.com/2017/07/instrumental-variables-and-late.html">previous post,</a> I've described how an IV estimate teases out only that variation in our treatment D that is unrelated to selection bias and relates it to Y giving us an estimate for the treatment effect of D that is less biased.<br />
<br />
We can view this through the lens of a 2SLS modeling strategy:<br />
<br />
Stage 1: Regress D on Z to get D*<br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">D* = β0 + β1 Z + e</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">β1 </span>only picks up the variation in D that is explained by Z (i.e. <i>quasi-experimental variation</i>) and leaves all of the variation in D related to non-compliance and selection in the residual term. You can think of this as working like a filtering process.<br />
<br />
Stage 2: Regress Y on D*<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">Y = β0 +βIV D* + e </span><br />
<br />
The second stage relates the fitted values D* - the <i>quasi-experimental variation</i> in treatment driven by Z - to changes in our target Y.<br />
<br />
We can see (from the R code below) that our estimate <span style="font-family: "courier new" , "courier" , monospace;">βIV</span> = 5.<br />
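The two stages can be sketched with plain least squares. This is a minimal Python illustration of the same toy data (not the post's R code, which is in the linked gist):

```python
import numpy as np

# Toy data: 100 assigned to treatment (Z = 1), 100 controls; only the
# 80 compliers in the treatment arm receive treatment (D = 1)
Z = np.r_[np.ones(100), np.zeros(100)]
D = np.r_[np.zeros(20), np.ones(80), np.zeros(100)]
Y = np.r_[np.full(20, 5.0), np.full(80, 25.0),   # Z = 1: never takers, compliers
          np.full(20, 5.0), np.full(80, 20.0)]   # Z = 0: never takers, compliers

# Stage 1: regress D on Z; the fitted values D* keep only the variation
# in D driven by random assignment
X1 = np.column_stack([np.ones(200), Z])
b1, *_ = np.linalg.lstsq(X1, D, rcond=None)
D_star = X1 @ b1

# Stage 2: regress Y on D*; the slope is the IV / LATE estimate
X2 = np.column_stack([np.ones(200), D_star])
b2, *_ = np.linalg.lstsq(X2, Y, rcond=None)
print(round(b1[1], 6), round(b2[1], 6))  # first stage 0.8, IV estimate 5.0
```

This manual two-step recovers the right point estimate, but not the right standard errors, which is why a dedicated routine like ivreg (below) is preferred in practice.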
<br />
We can also get the same result (and correct standard errors) by using the <span style="font-family: "courier new" , "courier" , monospace;">ivreg</span> function from the AER package in R:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">summary(ivreg(y ~ D | Z,data =df))</span><br />
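To see this filtering logic end to end, here is a small simulation sketch, written in Python rather than the post's R, with hypothetical compliance rates and baseline outcomes chosen to mimic the example above. With a single binary instrument, 2SLS collapses to the Wald estimator cov(Z, Y)/cov(Z, D), which recovers the LATE of 5 while a naive OLS regression of Y on D is contaminated by selection:

```python
import random

random.seed(42)
n = 100_000
true_late = 5  # the effect we hope to recover

z, d, y = [], [], []
for _ in range(n):
    zi = random.randint(0, 1)            # randomized assignment (instrument)
    complier = random.random() < 0.8     # 80% compliers, 20% never-takers
    di = zi if complier else 0           # never-takers refuse treatment
    baseline = 20 if complier else 30    # selection: types differ at baseline
    yi = baseline + true_late * di + random.gauss(0, 1)
    z.append(zi); d.append(di); y.append(yi)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (v - mb) for x, v in zip(a, b)) / len(a)

# 2SLS with one binary instrument collapses to the Wald estimator:
beta_iv = cov(z, y) / cov(z, d)

# Naive OLS of Y on D mixes in the baseline difference between types:
beta_ols = cov(d, y) / cov(d, d)

print(round(beta_iv, 2), round(beta_ols, 2))  # IV is near 5; OLS is not
```

Randomization makes Z independent of the never-taker baseline, so the Wald ratio isolates the complier effect; OLS compares treated compliers against a control group diluted with never-takers.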
<b><br /></b>
<b>R </b><b>Code: </b><a href="https://gist.github.com/BioSciEconomist/a72fae6e01053fdb6d13c9a80d8e39f9">https://gist.github.com/BioSciEconomist/a72fae6e01053fdb6d13c9a80d8e39f9</a><br />
<b><br /></b>
<b>References:</b><br />
<br />
Angrist, Joshua D., et al. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association, vol. 91, no. 434, 1996, pp. 444–455. JSTOR, www.jstor.org/stable/2291629.<br />
<br />
Angrist, J.D. J Exp Criminol (2006) 2: 23. https://doi.org/10.1007/s11292-005-5126-x<br />
<br />
"The Oregon Experiment--Effects of Medicaid on Clinical Outcomes," by Katherine Baicker, et al. New England Journal of Medicine, 2013; 368:1713-1722. http://www.nejm.org/doi/full/10.1056/NEJMsa1212321<br />
<br />
Taubman, Sarah L., Heidi L. Allen, Bill J. Wright, Katherine Baicker, and Amy N. Finkelstein. "Medicaid Increases Emergency-Department Use: Evidence from Oregon's Health Insurance Experiment." Science. Published online 2 January 2014. DOI:10.1126/science.1246183<br />
<br />
Gupta, S. K. (2011). Intention-to-treat concept: A review. Perspectives in Clinical Research, 2(3), 109–112. http://doi.org/10.4103/2229-3485.83221<br />
<br />
<br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-61118201722255569422019-03-30T16:30:00.002-04:002020-08-22T16:11:03.563-04:00Abandoning Statistical Significance - Or - Two Ways to Sell Snake OilThere was recently a very good article in <a href="https://www.nature.com/articles/d41586-019-00857-9">Nature</a> pushing back against dichotomizing thresholds for p-values (i.e. p < .05). This follows the <a href="https://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108#.XJ94QOJKjEb">ASA's statement</a> on the interpretation of p-values.<br />
<br />
I've blogged before about previous efforts to pushback against p-values and proposals to focus on confidence intervals (which often just reframe the problem in other ways that get misinterpreted see <a href="https://econometricsense.blogspot.com/2017/08/confidence-intervals-fad-or-fashion_7.html">here</a>, <a href="https://econometricsense.blogspot.com/2017/03/interpreting-confidence-intervals.html">here</a>, <a href="https://econometricsense.blogspot.com/2015/01/overconfident-confidence-intervals.html">here</a>, and <a href="http://econometricsense.blogspot.com/2018/12/thinking-about-confidence-intervals.html">here</a>). And absolutely there are problems with <a href="http://econometricsense.blogspot.com/2015/11/econometrics-multiple-testing-and.html">p-hacking, failures to account for multiple comparisons and multiple testing, and gardens of forking paths. </a><br />
<br />
The authors in the Nature article state:<br />
<br />
<i>"We are not calling for a ban on P values. Nor are we saying they cannot be used as a decision criterion in certain specialized applications (such as determining whether a manufacturing process meets some quality-control standard). And we are also not advocating for an anything-goes situation, in which weak evidence suddenly becomes credible. Rather, and in line with many others over the decades, we are calling for a stop to the use of P values in the conventional, dichotomous way — to decide whether a result refutes or supports a scientific hypothesis"</i><br />
<br />
What worries me is that some readers won't read this with the careful thought and attention it deserves.<br />
<br />
Andrew Gelman, one of those signing the letter, seems to have had some similar concerns noted in his post <a href="https://statmodeling.stat.columbia.edu/2019/03/20/retire-statistical-significance-the-discussion/">“Retire Statistical Significance”: The discussion.</a> In his post he shares a number of statements in the article and how they could be misleading. We have to remember that statistics and inference can be hard. It's hard for PhDs who have spent their entire lives doing this stuff. It's hard for practitioners who have made their careers out of it. So it is important to consider the ways that these statements could be interpreted by others who are not as skilled in inference and experimental design as the authors and signatories.<br />
<br />
Gelman states:<br />
<br />
<i>"the Comment is written with an undercurrent belief that there are zillions of true, important effects out there that we erroneously dismiss. The main problem is quite the opposite: there are zillions of nonsense claims of associations and effects that once they are published, they are very difficult to get rid of. The proposed approach will make people who have tried to cheat with massaging statistics very happy, since now they would not have to worry at all about statistics. Any results can be spun to fit their narrative. Getting entirely rid of statistical significance and preset, carefully considered thresholds has the potential of making nonsense irrefutable and invincible."</i><br />
<br />
In addition Gelman says:<br />
<br />
<i>"statistical analysis at least has some objectivity and if the rules are carefully set before the data are collected and the analysis is run, then statistical guidance based on some thresholds (p-values, Bayes factors, FDR, or other) can be useful. Otherwise statistical inference is becoming also entirely post hoc and subjective"</i><br />
<br />
<br />
<b>An Illustration: Two Ways to Sell Snake Oil</b><br />
<br />
So let me propose a fable. Suppose there is a salesman with an elixir claiming it is a miracle breakthrough for weight loss. Suppose he has lots and lots of data, large sample sizes, and randomized controlled trials supporting its effectiveness. In fact, in all of his studies he finds that, on average, consumers using the elixir lose weight, with highly statistically significant results (p < .001). Ignoring effect sizes (i.e., how much weight do people actually lose on average?), the salesman touts the <i>precision</i> of the results and sells lots and lots of elixir based on the significance of the findings.<br />
<br />
If the salesman were willing to confess that the estimates of the effects of taking the elixir were very <i>precise</i> but we are precisely measuring an average loss of about 1.5 pounds per year compared to controls - it would destroy his sales pitch!<br />
<br />
So now the salesman reads our favorite article in Nature. He conducts a number of additional trials. This time he's going to focus only on the effect sizes from the studies and maybe this time goes with smaller sample sizes. After all, R&D is expensive! Looking only at effect sizes, he knows that a directional finding of 1.5 pounds per year isn't going to sell. So how large does the effect need to be to take his snake oil to market with <i>data</i> to support it? Is 2 pounds convincing? Or 3, 4, 5, or even 10? Suppose his data show an average annual weight loss of nearly 10 pounds more for those using the elixir vs. a control group. He goes to market with this claim. As he is making a pitch to a crowd of potential buyers, one savvy consumer challenges him, asking whether his results were statistically significant. The salesman, having read our favorite Nature article, replies that mainstream science these days is more concerned with effect sizes than with dichotomous notions of statistical significance. To the crowd this sounds like a sophisticated and informed answer, so that day he sells his entire stock.<br />
<br />
Eventually someone uncovers the actual research related to the elixir. They find that yes, on average most of those studies found an effect of about 10 pounds of annual weight loss. But the p-values associated with these estimates ranged from .25 to .40. What does this mean?<br />
<br />
P-values tell us the probability, under a specified statistical model, that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.<br />
<br />
Simplifying, we could say: if the elixir really is snake oil, a p-value equal to .25 tells us that there is a 25% probability that we would observe an average weight loss equal to or greater than 10 pounds by chance alone. In other words, people in the study could plausibly lose 10 pounds or more even if they did not take the elixir.<br />
<br />
A p-value of .25 doesn't necessarily mean that the elixir is ineffective. That is sort of the point of the article in Nature. It just means that the evidence for rejecting the null hypothesis of zero effect is weak.<br />
<br />
What if instead of selling elixir the salesman were taking bets with a two-headed coin? How would we catch him in the act? Suppose he flipped the coin twice and got two heads in a row (and we just lost $100 at $50 per flip). If we only considered the observed outcomes, and knew nothing about the distribution of coin flips (and completely ignored intuition), we might think this is evidence of cheating. After all, two heads in a row would be consistent with a two-headed coin. But I wouldn't be dialing my lawyer yet.<br />
<br />
If we consider the simple probability distribution associated with tossing a fair coin, we would know that there is a 50% chance of flipping a normal coin once and getting heads, and a 25% chance of flipping a normal coin twice and getting two heads in a row. This is roughly analogous to a p-value equal to .25. In other words, there is a good chance that if our con artist were using a fair coin he could in fact flip two heads in a row. This does not mean he is innocent; it just means that when we consider the distribution, variation, and probabilities associated with flipping coins, the evidence just isn't that convincing. We might say that our observed data are compatible with the null hypothesis that the coin is fair. We could say the same thing about the evidence from our fable about weight loss, or any study with a p-value equal to .25.<br />
<br />
What if our snake oil elixir salesman flipped his coin 4 times and got 4 heads in a row? The probability of 4 heads in a row is 6.25% if he has a fair coin. What about 5? Under the null hypothesis of a 'fair' coin the probability of observing an event as extreme as 5 heads in a row is 3.125%. Do we think our salesman could be that lucky and get 4 or 5 heads in a row? Many people would have their doubts. When we get past whatever threshold is required to start having doubts about the null hypothesis then intuitively we begin to feel comfortable rejecting the null hypothesis. As the article in Nature argues, this cutoff should not necessarily be 5% or p < .05. However in this example the probabilities are analogous to having p-values of .0625 and .03125 which are in the vicinity of our traditional threshold of .05. I don't think reading the article in Nature should change your mind about this.<br />
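The coin-flip arithmetic above is easy to verify. A throwaway Python check (illustrative only) of the probability of k heads in a row under the null of a fair coin:

```python
# Probability of k heads in a row with a fair coin -- the analogue of the
# p-value under the null hypothesis that the coin is not two-headed.
p_heads_in_a_row = {k: 0.5 ** k for k in (2, 4, 5)}

for k, p in p_heads_in_a_row.items():
    print(f"{k} heads in a row: {p:.5f}")   # .25, .0625, .03125
```

Nothing magic happens at .05; the probabilities just shrink smoothly as the run of heads grows, and our doubt about the null grows with it.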
<br />
<b>Conclusion</b><br />
<br />
We can see from our fable that the pendulum could swing too far in either direction and lead to abusive behavior and questionable conclusions. Economist Noah Smith <a href="http://noahpinionblog.blogspot.com/2015/08/the-backlash-to-backlash-against-p.html">discussed the pushback against p-values</a> a few years ago. He stated rightly that <i>'if people are doing science right, these problems won't matter in the long run.' </i>Focusing on effect size only and ignoring distribution, variation, and uncertainty risks backsliding from the science that revolutionized the 20th century into the world of anecdotal evidence. Clearly the authors and signatories of the Nature article are not advocating this, as they stated in the excerpts I shared above. It is how this article gets interpreted and cited that matters most. As Gelman states:<br />
<br />
<i>"some rule is needed for the game to be fair. Otherwise we will get into more chaos than we have now, where subjective interpretations already abound. E.g. any company will be able to claim that any results of any trial on its product to support its application for licensing" </i><br />
<br />
<br />
<br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-75408177136047588662019-02-24T08:54:00.005-05:002019-02-24T08:54:56.805-05:00The Multiplicity of Data ScienceThere was a really good article on LinkedIn some time ago regarding how Airbnb classifies its data science roles: https://www.linkedin.com/pulse/one-data-science-job-doesnt-fit-all-elena-grewal/<br />
<br />
<i>"The Analytics track is ideal for those who are skilled at asking a great question, exploring cuts of the data in a revealing way, automating analysis through dashboards and visualizations, and driving changes in the business as a result of recommendations. The Algorithms track would be the home for those with expertise in machine learning, passionate about creating business value by infusing data in our product and processes. And the Inference track would be perfect for our statisticians, economists, and social scientists using statistics to improve our decision making and measure the impact of our work."</i><br />
<br />
I think this helps tremendously to clarify thinking in this space.Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0