<b>Econometric Sense</b><br />An attempt to make sense of econometrics, biostatistics, machine learning, experimental design, bioinformatics, ....<br />Matt Bogard<br /><br /><b>Statistical Inference vs. Causal Inference vs. Machine Learning: A motivating example</b><br />Posted 2018-05-24, updated 2018-06-05<br /><br />In his well known paper, Leo Breiman discusses the <a href="http://econometricsense.blogspot.com/2011/01/classical-statistics-vs-machine.html">'cultural' differences</a> between algorithmic (machine learning) approaches and traditional methods related to inferential statistics. <a href="http://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html">Recently,</a> I discussed how important it is to understand these kinds of distinctions when considering how current automated machine learning tools can be leveraged in the data science space.<br /><br />In his paper Leo Breiman states:<br /><br /><i>"Approaching problems by looking for a data model imposes an apriori straight jacket that restricts the ability of statisticians to deal with a wide range of statistical problems."</i><br /><br />On the other hand, <a href="https://www.youtube.com/watch?v=Yx6qXM_rfKQ&feature=share">Susan Athey's work</a> highlights the fact that no one has developed the asymptotic theory necessary to adequately address causal questions using methods from machine learning (i.e. how does a given machine learning algorithm fit into the context of the <a href="http://econometricsense.blogspot.com/2013/05/selection-bias-and-rubin-causal-model.html">Rubin Causal Model/potential outcomes framework</a>?)<br /><br />Dr. Athey is working to bridge some of this gap, but it's very complicated. 
I think a lot can also be done simply by understanding and communicating the differences between inferential and causal questions vs. machine learning/predictive modeling questions. When should each be used for a given business problem? What methods does this entail?<br /><br />In an <a href="https://soundcloud.com/datamadetomatter/whole-person-healthcare-is-here">MIT Data Made to Matter podcast,</a> economist Joseph Doyle discusses his paper investigating the relationship between more aggressive (and expensive) treatments by hospitals and improved outcomes for Medicare patients. Using this as an example, I hope to broadly illustrate some of these differences by looking at this problem through all three lenses.<br /><br /><b>Statistical Inference</b><br /><br />Suppose we just want to know if there is a significant relationship between aggressive treatments 'A' and health outcomes (mortality) 'M.' We might estimate a regression equation (similar to one of the models in the paper) such as:<br /><br />M = b0 + b1*A + b2*X + e, where X is a vector of relevant controls.<br /><br />We would be very careful about the nature of our data, the functional form, and getting our standard errors right in order to make valid inferences about our estimate 'b1' of the relationship between aggressive treatments A and mortality M. A lot of this is traditionally taught in econometrics, biostatistics, and epidemiology (things like heteroskedasticity, multicollinearity, distributional assumptions related to the error terms etc.)<br /><br /><b>Causal Inference</b><br /><br />Suppose we wanted to know if the estimate b1 in the equation above is causal. In Doyle's paper they discuss some of the challenges:<br /><br /><i>"A major issue that arises when comparing hospitals is that they may treat different types of patients. For example, greater treatment levels may be chosen for populations in worse health. 
At the individual level, higher spending is strongly associated with higher mortality rates, even after risk adjustment, which is consistent with more care provided to patients in (unobservably) worse health. At the hospital level, long-term investments in capital and labor may reflect the underlying health of the population as well. Differences in unobservable characteristics may therefore bias results toward finding no effect of greater spending."</i><br /><br />One of the points he is making is that even if we control for everything we typically measure in these studies (captured by X above) there are unobservable characteristics related to patients that weaken our estimate of b1. Recall that methods like regression and matching (<a href="http://econometricsense.blogspot.com/2017/07/regression-as-variance-based-weighted.html">which are two flavors of identification strategies based on selection on observables</a>) achieve identification by assuming that conditional on observed characteristics (X), selection bias disappears. We want to make conditional on X comparisons of Y (or M in the model above) that mimic as much as possible the experimental benchmark of random assignment (see more on matching estimators <a href="http://econometricsense.blogspot.com/2011/07/matching-estimators.html">here.</a>)<br /><br />However, if there are important characteristics related to selection that we don't observe and can't include in X, then in order to make valid causal statements about our results, we need a method that identifies treatment effects within a selection on 'un'-observables framework. 
(Examples include <a href="http://econometricsense.blogspot.com/2012/12/difference-in-difference-estimators.html">difference-in-differences</a>, <a href="http://econometricsense.blogspot.com/2014/04/intuition-for-fixed-effects.html">fixed effects</a>, and <a href="http://econometricsense.blogspot.com/2017/07/instrumental-variables-and-late.html">instrumental variables</a>.)<br /><br />In Doyle's paper, they used ambulance service as an instrument for hospital choice to make causal statements about A.<br /><br /><b>Machine Learning/Predictive Modeling</b><br /><br />Suppose we just want to predict mortality by hospital to support some policy or operational objective where the primary need is accurate predictions. A number of algorithmic methods might be exploited including logistic regression, decision trees, random forests, neural networks etc. Based on the mixed findings in the literature, a machine learning algorithm may not exploit 'A' at all even though Doyle finds a significant causal effect based on his instrumental variables estimator. The point is, in many cases a black box algorithm that includes or excludes treatment intensity as a predictor doesn't really care about the significance of this relationship or its causal mechanism, as long as at the end of the day the algorithm predicts well out of sample and maintains reliability and usefulness in application over time.<br /><br /><b>Discussion</b><br /><br />If we wanted to know whether the relationship between intensity of care 'A' and mortality 'M' was statistically significant or causal, we would not rely on machine learning methods, at least not on anything available off the shelf today, pending further work by researchers like Susan Athey. We would develop the appropriate causal or inferential model designed to answer the particular question at hand. 
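<br /><br />As a rough illustration of the contrast drawn in the Causal Inference section, here is a minimal simulation sketch in Python (standard library only). Everything in it is a hypothetical assumption made for illustration and none of it comes from Doyle's data: the variable names, the binary instrument z standing in for ambulance assignment, the unobserved health index h, and the "true" effect of -0.5. The naive slope is what a regression of M on A would report, the IV slope is the simple Wald ratio cov(z, M) / cov(z, A), and the last two lines compare how well each slope fits the observed outcomes.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Sample covariance (divides by n; the ratios below do not depend on this)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def mse(y, x, slope):
    """Mean squared error of a line with the given slope, with the
    intercept chosen so that predictions match the sample mean of y."""
    intercept = mean(y) - slope * mean(x)
    return mean([(b - intercept - slope * a) ** 2 for a, b in zip(x, y)])


def simulate(n=20000, seed=0):
    """Hypothetical data: sicker patients get more treatment and have worse outcomes."""
    rng = random.Random(seed)
    z, a, m = [], [], []
    for _ in range(n):
        h = rng.gauss(0, 1)                           # unobserved health (higher = sicker)
        zi = rng.randint(0, 1)                        # instrument, independent of h by construction
        ai = zi + h + rng.gauss(0, 1)                 # treatment intensity A
        mi = -0.5 * ai + 2.0 * h + rng.gauss(0, 1)    # mortality index M, assumed true effect -0.5
        z.append(zi)
        a.append(ai)
        m.append(mi)
    return z, a, m


z, a, m = simulate()

b_ols = cov(a, m) / cov(a, a)   # naive regression slope of M on A
b_iv = cov(z, m) / cov(z, a)    # instrumental variables (Wald) estimate

mse_ols = mse(m, a, b_ols)      # fit of the naive slope to the observed outcomes
mse_iv = mse(m, a, b_iv)        # fit of the IV slope to the same outcomes

print("naive slope:", round(b_ols, 3), "IV slope:", round(b_iv, 3))
print("MSE with naive slope:", round(mse_ols, 3), "MSE with IV slope:", round(mse_iv, 3))
```

Under these assumptions the naive slope picks up the unobserved health effect, while the IV slope uses only the variation in A that comes from z. Because least squares minimizes squared error by construction, the IV slope fits the observed outcomes less closely even though it is the one aimed at the causal effect.<br /><br />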
In fact, as Susan Athey points out in a past <a href="https://www.quora.com/How-will-machine-learning-impact-economics">Quora commentary,</a> models used for causal inference could possibly give worse predictions:<br /><br /><i>"Techniques like instrumental variables seek to use only some of the information that is in the data – the “clean” or “exogenous” or “experiment-like” variation in price—sacrificing predictive accuracy in the current environment to learn about a more fundamental relationship that will help make decisions...This type of model has not received almost any attention in ML."</i><br /><br />The point is, for the data scientist caught in the middle of so much disruption related to tools like automated machine learning, as well as technologies producing and leveraging large amounts of data, it is important to focus on business understanding and map the appropriate method to the objective at hand. The ability to understand the differences in tools and methodologies related to statistical inference, causal inference, and machine learning, and to explain those differences to stakeholders, will be important to prevent 'straight jacket' thinking about solutions to complex problems.<br /><br /><b>References:</b><br /><br />Doyle, Joseph et al. “Measuring Returns to Hospital Care: Evidence from Ambulance Referral Patterns.” Journal of Political Economy 123.1 (2015): 170–214. PMC. Web. 11 July 2017.<br />https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351552/<br /><br />Matt Bogard. 
"A Guide to Quasi-Experimental Designs" (2013)<br />Available at: http://works.bepress.com/matt_bogard/24/<br /><br />Matt Bogard<br /><br /><b>He who must not be named....or can we say 'causal'?</b><br />Posted 2018-04-17<br /><br />Recall that in the Harry Potter series, the wizard community refused to say the name of 'Voldemort' and it got to the point where they almost stopped teaching and practicing magic (at least officially as mandated by the Ministry of Magic). In the research community, by refusing to use the term 'causal' when and where appropriate, are we discouraging researchers from asking interesting questions and putting forth the effort required to implement the kind of rigorous causal inferential methods necessary to push forward the frontiers of science? Could we somehow be putting a damper on teaching and practicing economagic...I mean econometrics...you know the <a href="http://www.mostlyharmlesseconometrics.com/">mostly harmless</a> kind? Will the <a href="http://econometricsense.blogspot.com/2017/07/the-credibility-revolution-in.html">credibility revolution </a>be lost?<br /><br />In a May 2018 article in the American Journal of Public Health (by Miguel Hernán of the Departments of Epidemiology and Biostatistics, Harvard School of Public Health) there is an important discussion about the somewhat tiring mantra <i>'correlation is not causation'</i> and the disservice to scientific advancement that it can lead to in the absence of critical thinking about research objectives and designs. Some people might think this is ironic, since the phrase is often invoked as a means to point out fallacious conclusions that have been uncritically based on mere correlations found in the data. 
However, the pendulum can swing too far in the other direction, causing as much harm.<br /><br /><b style="font-style: italic;">I highly recommend reading this article! </b>It is available ungated and will be one of those you hold onto for a while. See the reference section below.<br /><br />Key to the discussion are important distinctions between questions of association, prediction, and causality. Below are some spoilers:<br /><br /><b>While it is wrong to assume causality based on association or correlation alone, refusing to recognize a causal approach in the analysis because of growing cultural 'norms' is not good either....and should stop:</b><br /><br /><i>"The resulting ambiguity impedes a frank discussion about methodology because the methods used to estimate causal effects are not the same as those used to estimate associations...We need to stop treating “causal” as a dirty word that respectable investigators do not say in public or put in print. It is true that observational studies cannot definitely prove causation, but this statement misses the point"</i><br /><br /><b>All that glitters isn't gold, as the author notes on randomized controlled trials:</b><br /><br /><i>"Interestingly, the same is true of randomized trials. All we can estimate from randomized trials data are associations; we just feel more confident giving a causal interpretation to the association between treatment assignment and outcome because of the expected lack of confounding that physical randomization entails. However, the association measures from randomized trials cannot be given a free pass. 
Although randomization eliminates systematic confounding, even a perfect randomized trial only provides probabilistic bounds on “random confounding”—as reflected in the confidence interval of the association measure—and many randomized trials are far from perfect."</i><br /><br /><b>There are important distinctions between analysis and methodological approaches when asking questions related to prediction and association vs causality. Saying a bit more, this is not just about model interpretation. We are familiar with discussions about challenges related to interpreting predictive models derived from complicated black box algorithms, but causality hinges on much more than just the ability to interpret the impact of features on an outcome. Also note that while we are seeing applications of AI and automated feature engineering and algorithm selection, models optimized to predict well may not explain well at all. In fact, a causal model may perform worse in out of sample predictions of the 'target' while giving the most rigorous estimate of causal effects:</b><br /><br /><i>"In associational or predictive models, we do not try to endow the parameter estimates with a causal interpretation because we are not trying to adjust for confounding of the effect of every variable in the model. Confounding is a causal concept that does not apply to associations...By contrast, in a causal analysis, we need to think carefully about what variables can be confounders so that the parameter estimates for treatment or exposure can be causally interpreted. Automatic variable selection procedures may work for prediction, but not necessarily for causal inference. 
Selection algorithms that do not incorporate sufficient subject matter knowledge may select variables that introduce bias in the effect estimate, and ignoring the causal structure of the problem may lead to apparent paradoxes."</i><br /><br /><b>It all comes down to being a question of identification....or why AI has a long way to go in the causal space...or as Angrist and Pischke would put it....if applied econometrics were easy theorists would do it:</b><br /><br /><i>"Associational inference (prediction) or causal inference (counterfactual prediction)? The answer to this question has deep implications for (1) how we design the observational analysis to emulate a particular target trial and (2) how we choose confounding adjustment variables. Each causal question corresponds to a different target trial, may require adjustment for a different set of confounders, and is amenable to different types of sensitivity analyses. It then makes sense to publish separate articles for various causal questions based on the same data."</i><br /><br />I really liked how the author framed 'prediction' as being distinctly associational (prospective) vs. counterfactual. Also, what a nice way to think about 'identification': it is about how we emulate a particular trial and handle confounding/selection bias/endogeneity.<br /><br /><b>Reference:</b><br /><br />Miguel A. Hernán, “The C-Word: Scientific Euphemisms Do Not Improve Causal Inference From Observational Data”, American Journal of Public Health 108, no. 5 (May 1, 2018): pp. 
616-619.<br /><br /><b>See also:</b><br /><br /><a href="http://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html">Will there be a credibility revolution in data science and AI?</a><br /><br /><a href="http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html">To Explain or Predict?</a><br /><br />Matt Bogard<br /><br /><b>Will there be a credibility revolution in data science and AI?</b><br />Posted 2018-03-18, updated 2018-03-19<br /><br /><i>Summary: Understanding where AI and automation are going to be the most disruptive to data scientists in the near term relates to understanding methodological differences between explaining and predicting, between machine learning and causal inference. It will require the ability to ask a different kind of question than machine learning algorithms are capable of answering off the shelf today.</i><br /><div><br /></div>There is a lot of enthusiasm about the disruptive role of automation and AI in data science. Products like <a href="https://www.h2o.ai/">H2O.ai </a>and <a href="https://www.datarobot.com/">DataRobot</a> offer tools to automate or fast track many aspects of the data science work stream. If this trajectory continues, what will the work of the future data scientist look like?<br /><br />Many have already pointed out the very difficult task of automating the <a href="https://www.superdatascience.com/podcast-power-soft-skills-data-science/">soft skills </a>possessed by data scientists. In a previous <a href="https://www.linkedin.com/pulse/what-traders-know-future-data-science-matt-bogard/">LinkedIn post</a> I discussed this in the trading space where automation and AI could create substantial disruptions for both data scientists and traders. 
Here I quoted Matthew Hoyle:<br /><br /><i>"Strategies have a short shelf life-what is valuable is the ability and energy to look at new and interesting things and put it all together with a sense of business development and desire to explore"</i><br /><br />My conclusion: <i>They are talking about bringing a portfolio of useful and practical skills together to do a better job than was possible before open source platforms and computing power became so proliferate. I think that is the future.</i><br /><br />So the future is about rebalancing the data scientist's portfolio of skills. However, in the near term I think the disruption from AI and automation in data science will do more than increase the emphasis on soft skills. In fact there will remain a significant portion of 'hard skills' that will see an increase in demand because they are difficult to automate.<br /><br />Understanding this will depend largely on making a distinction between <a href="http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html">explaining and predicting.</a> Much of what appears to be at the forefront of automation involves tasks supporting supervised and unsupervised machine learning algorithms as well as other prediction and forecasting tools like time series analysis.<br /><br />Once armed with predictions, businesses will start to ask questions about 'why'. This will transcend prediction or any of the visualizations of the patterns and relationships coming out of black box algorithms. They will want to know what decisions or factors are moving the needle on revenue or customer satisfaction and engagement or improved efficiencies. Essentially they will want to ask questions related to causality, which requires a completely different paradigm for data analysis than questions of prediction. And they will want scientifically formulated answers that are convincing vs. mere reports about rates of change or correlations. 
There is a significant difference between understanding what drivers correlate with or 'predict' the outcome of interest and what is actually driving the outcome. What they will be asking for is a <a href="http://econometricsense.blogspot.com/2017/07/the-credibility-revolution-in.html">credibility revolution </a>in data science.<br /><br />What do we mean by a credibility revolution?<br /><br />Economist <a href="http://jaysonlusk.com/blog/2016/5/12/does-diet-coke-cause-fat-babies">Jayson Lusk</a> puts it well:<br /><br /><i>"Fortunately economics (at least applied microeconomics) has undergone a bit of credibility revolution. If you attend a research seminar in virtually any economi(cs) department these days, you're almost certain to hear questions like, "what is your identification strategy?" or "how did you deal with endogeneity or selection?" In short, the question is: how do we know the effects you're reporting are causal effects and not just correlations."</i><br /><br />Healthcare Economist <a href="http://theincidentaleconomist.com/wordpress/what-took-con-econometrics/">Austin Frakt</a> has a similar take:<br /><br /><i>"A “research design” is a characterization of the logic that connects the data to the causal inferences the researcher asserts they support. It is essentially an argument as to why someone ought to believe the results. It addresses all reasonable concerns pertaining to such issues as selection bias, reverse causation, and omitted variables bias. In the case of a randomized controlled trial with no significant contamination of or attrition from treatment or control group there is little room for doubt about the causal effects of treatment so there’s hardly any argument necessary. But in the case of a natural experiment or an observational study causal inferences must be supported with substantial justification of how they are identified. 
Essentially one must explain how a random experiment effectively exists where no one explicitly created one."</i><br /><br />How are these questions and differences unlike your typical machine learning application? Susan Athey does a great job explaining in a Quora response about how causal inference is different from off the shelf machine learning methods (the kind being automated today):<br /><br /><i>"Sendhil Mullainathan (Harvard) and Jon Kleinberg with a number of coauthors have argued that there is a set of problems where off-the-shelf ML methods for prediction are the key part of important policy and decision problems. They use examples like deciding whether to do a hip replacement operation for an elderly patient; if you can predict based on their individual characteristics that they will die within a year, then you should not do the operation...Despite these fascinating examples, in general ML prediction models are built on a premise that is fundamentally at odds with a lot of social science work on causal inference. The foundation of supervised ML methods is that model selection (cross-validation) is carried out to optimize goodness of fit on a test sample. A model is good if and only if it predicts well. 
Yet, a cornerstone of introductory econometrics is that prediction is not causal inference.....Techniques like instrumental variables seek to use only some of the information that is in the data – the “clean” or “exogenous” or “experiment-like” variation in price—sacrificing predictive accuracy in the current environment to learn about a more fundamental relationship that will help make decisions...This type of model has not received almost any attention in ML."</i><br /><br />Developing an identification strategy, as Jayson Lusk discussed above, and all that goes along with that (finding natural experiments or valid instruments, or <a href="http://econometricsense.blogspot.com/2015/11/econometrics-multiple-testing-and.html">navigating the garden of forking paths</a> related to propensity score matching or a number of other <a href="http://econometricsense.blogspot.com/2013/09/causal-inference-and-quasi-experimental.html">quasi-experimental methods</a>) involves careful considerations and decisions to be made and defended in ways that would be very challenging to automate. Even when humans do this there is rarely a single best approach to these problems. They are far from routine. Just ask anyone who has been through peer review or given a talk at an economics seminar or conference.<br /><br />The kinds of skills required to work in this space would be similar to those of the econometrician or epidemiologist or any quantitative researcher who has been culturally immersed in the social norms and practices that have evolved out of the credibility revolution. As data science thought leader <a href="https://www.superdatascience.com/podcast-one-purpose-data-science-truth-analytics/">Eugene Dubossarsky puts it</a>:<br /><br /><i>“the most elite skills…the things that I find in the most elite data scientists are the sorts of things econometricians these days have…bayesian statistics…inferring causality” </i><br /><br />No one has a crystal ball. 
This is not to say that the current advances in automation are falling short on creating value. They should no doubt create value like any other form of capital complementing the labor and soft skills of the data scientist. And they could free up more resources to focus on more causal questions that previously may not have been answered. I discussed this complementarity previously in a <a href="http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html">related post</a>:<br /><br /><i> "correlations or 'flags' from big data might not 'identify' causal effects, but they are useful for prediction and might point us in directions where we can more rigorously investigate causal relationships if interested" </i><br /><br />However, if automation in this space is possible, it will require a different approach than what we have seen so far. We might look to the pioneering work that Susan Athey is doing to bring together machine learning and causal inference. It will require thinking in terms of potential outcomes, endogeneity, and counterfactuals, which requires the ability to ask a different kind of question than machine learning algorithms are capable of answering off the shelf today.<br /><br /><b>Additional References:</b><br /><br />From 'What If?' To 'What Next?' 
: Causal Inference and Machine Learning for Intelligent Decision Making <a href="https://sites.google.com/view/causalnips2017">https://sites.google.com/view/causalnips2017</a><br /><br />Susan Athey on Machine Learning, Big Data, and Causation <a href="http://www.econtalk.org/archives/2016/09/susan_athey_on.html">http://www.econtalk.org/archives/2016/09/susan_athey_on.html </a><br /><br />Machine Learning and Econometrics (Susan Athey, Guido Imbens) <a href="https://www.aeaweb.org/conference/cont-ed/2018-webcasts">https://www.aeaweb.org/conference/cont-ed/2018-webcasts </a><br /><br /><b>Related Posts:</b><br /><b><br /></b>Why Data Science Needs Economics<br /><a href="http://econometricsense.blogspot.com/2016/10/why-data-science-needs-economics.html">http://econometricsense.blogspot.com/2016/10/why-data-science-needs-economics.html</a><br /><br />To Explain or Predict<br /><a href="http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html">http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html</a><br /><br />Culture War: Classical Statistics vs. Machine Learning: <a href="http://econometricsense.blogspot.com/2011/01/classical-statistics-vs-machine.html">http://econometricsense.blogspot.com/2011/01/classical-statistics-vs-machine.html </a><br /><br />HARK! 
- flawed studies in nutrition call for credibility revolution -or- HARKing in nutrition research <a href="http://econometricsense.blogspot.com/2017/12/hark-flawed-studies-in-nutrition-call.html">http://econometricsense.blogspot.com/2017/12/hark-flawed-studies-in-nutrition-call.html</a><br /><br />Econometrics, Math, and Machine Learning<br /><a href="http://econometricsense.blogspot.com/2015/09/econometrics-math-and-machine.html">http://econometricsense.blogspot.com/2015/09/econometrics-math-and-machine.html</a><br /><br />Big Data: Don't Throw the Baby Out with the Bathwater<br /><a href="http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html">http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html</a><br /><br />Big Data: Causality and Local Expertise Are Key in Agronomic Applications<br /><a href="http://econometricsense.blogspot.com/2014/05/big-data-think-global-act-local-when-it.html">http://econometricsense.blogspot.com/2014/05/big-data-think-global-act-local-when-it.html</a><br /><br />The Use of Knowledge in a Big Data Society II: Thick Data<br /><a href="https://www.linkedin.com/pulse/use-knowledge-big-data-society-ii-thick-matt-bogard/">https://www.linkedin.com/pulse/use-knowledge-big-data-society-ii-thick-matt-bogard/ </a><br /><br />The Use of Knowledge in a Big Data Society<br /><a href="https://www.linkedin.com/pulse/use-knowledge-big-data-society-matt-bogard/">https://www.linkedin.com/pulse/use-knowledge-big-data-society-matt-bogard/ </a><br /><br />Big Data, Deep Learning, and SQL<br /><a href="https://www.linkedin.com/pulse/deep-learning-regressionand-sql-matt-bogard/">https://www.linkedin.com/pulse/deep-learning-regressionand-sql-matt-bogard/</a><br /><br />Economists as Data Scientists<br /><a href="http://econometricsense.blogspot.com/2012/10/economists-as-data-scientists.html">http://econometricsense.blogspot.com/2012/10/economists-as-data-scientists.html </a>Matt 
Bogard<br /><br /><b>Intuition for Random Effects</b><br />Posted 2018-02-13<br /><br />Previously I wrote a <a href="http://econometricsense.blogspot.com/2014/04/intuition-for-fixed-effects.html">post</a> based on course notes from J. Blumenstock that attempted to provide some intuition for how fixed effects estimators can account for <a href="http://econometricsense.blogspot.com/2013/06/unobserved-heterogeneity-and-endogeneity.html">unobserved heterogeneity</a> (individual specific effects).<br /><br />Recently someone asked if I could provide a similarly motivating and intuitive example regarding random effects. Although I was not able to come up with a new example, I can definitely discuss random effects in the context of the previous example. But first a little (less intuitive) background.<br /><br /><b>Background</b><br /><br />To recap, the purpose of both fixed and random effects estimators is to model treatment effects in the face of unobserved individual specific effects.<br /><br />y<sub>it</sub> = β x<sub>it</sub> + α<sub>i</sub> + u<sub>it</sub> (1)<br /><br />In the model above these individual specific effects are represented by α<sub>i</sub>. In terms of estimation, the difference between fixed and random effects depends on how we choose to model this term. 
In the context of fixed effects it can be captured through dummy variable estimation (this creates different intercepts or shifts capturing specific effects) or by transforming the data, subtracting group (fixed effects) means from individual observations within each group. In random effects models, individual specific effects are captured by a composite error term (α<sub>i</sub> + u<sub>it</sub>), which assumes that individual intercepts are drawn from a random distribution of possible intercepts. The random component of the error term, α<sub>i</sub>, captures the individual specific effects in a different way from fixed effects models.<br /><br />As noted in another post, <a href="http://econometricsense.blogspot.com/2011/01/mixed-fixed-and-random-effects-models.html">Fixed, Mixed, and Random Effects</a>, the random effects model is estimated using Generalized Least Squares (GLS):<br /><div class="MsoNormal"><br /></div><div class="MsoNormal">β<sub>GLS</sub> = (X’Ω<sup>-1</sup>X)<sup>-1</sup>(X’Ω<sup>-1</sup>Y), where Ω = I ⊗ Σ (2)</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Here Σ is the variance of the composite error term α<sub>i</sub> + u<sub>it</sub>. If Σ is unknown, it is estimated, producing a feasible generalized least squares estimate β<sub>FGLS</sub>.</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><span style="font-family: "calibri";"><b>Intuition for Random Effects</b></span><br /><span style="font-family: "calibri";"><br /></span><span style="font-family: "calibri";">In my post <a href="http://econometricsense.blogspot.com/2014/04/intuition-for-fixed-effects.html">Intuition for Fixed Effects</a> I noted: </span><br /><span style="font-family: "calibri";"><br /></span><span style="font-family: "calibri";"><i>"Essentially using a dummy variable in a regression for each city (or group, or type to generalize beyond this example) holds constant or 'fixes' the effects across cities that we can't directly measure or observe. Controlling for these differences removes the 'cross-sectional' variation related to unobserved heterogeneity (like tastes, preferences, other unobserved individual specific effects). The remaining variation, or 'within' variation can then be used to 'identify' the causal relationships we are interested in."</i></span><br /><span style="font-family: "calibri";"><br /></span><span style="font-family: "calibri";">Let's look at the toy data I used in that example. 
</span><br /><span style="font-family: "calibri";"><br /></span><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-JjBLoMc0ICc/U0qOUqZ99WI/AAAAAAAAAoQ/zMp9lDaqppYkhRlq7I0gFdJJ9Xrue574gCPcBGAYYCw/s1600/PANEL%2BDATA.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="194" data-original-width="731" height="84" src="https://2.bp.blogspot.com/-JjBLoMc0ICc/U0qOUqZ99WI/AAAAAAAAAoQ/zMp9lDaqppYkhRlq7I0gFdJJ9Xrue574gCPcBGAYYCw/s320/PANEL%2BDATA.png" width="320" /></a></div><span style="font-family: "calibri";"><br /></span><span style="font-family: "calibri";"><br /></span><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-gGRoWBIp4t0/WoLt1tatOQI/AAAAAAAAD_o/KiVUlbTErCAyYXwZyRGl4V5KxNN8Vc98ACLcBGAs/s1600/image001.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="245" data-original-width="353" height="222" src="https://4.bp.blogspot.com/-gGRoWBIp4t0/WoLt1tatOQI/AAAAAAAAD_o/KiVUlbTErCAyYXwZyRGl4V5KxNN8Vc98ACLcBGAs/s320/image001.png" width="320" /></a></div><span style="font-family: "calibri";"><br /></span><span style="font-family: "calibri";"><br /></span>The crude ellipses in the plots above (motivated by the example given in Kennedy, 2008) indicate the data for each city and the the 'within' variation exploited by fixed effects models (<a href="http://econometricsense.blogspot.com/2014/04/intuition-for-fixed-effects.html">that allowed us to correctly identify the correct price/quantity relationships expected in the previous post</a>). The differences between the ellipses represents 'between variation.' As Kennedy discusses, random effects models differ from fixed effects models in that they are able to exploit both 'within' and 'between' variation, producing an estimate that is a weighted average of both kinds of variation (via Σ in equation 2 above). 
OLS, on the other hand exploits both kinds of variation as an unweighted average.<br /><br /><b>More Details </b><br /><br />As Kennedy discusses, both FE and RE can be viewed as running OLS on different transformations of the data.<br /><br />For fixed effects:<i> "this transformation consists of subtracting from each observation the average of the values within its ellipse"</i><br /><br />For random effects: <i>"the EGLS (or FGLS above) calculation is done by finding a transformation of the data that creates a spherical variance-covariance matrix and then performing OLS on the transformed data."</i><br /><br />As Kennedy notes, the increased information used by RE makes them more efficient estimators, but correlation between 'x' and the error term creates bias. i.e. RE assumes that α<sub><span style="font-family: "calibri";">i </span></sub>is uncorrelated with (orthogonal to) regressors. Angrist and Pischke (2009) discuss (footnote, p. 223) that they prefer FE because the gains in efficiency are likely to be modest while the finite sample properties of RE may be worse. As noted on p.243 an important assumption for identification in FE is that the most important sources of variation are time invariant (because information from time varying regressors gets differenced out). Angrist and Pischke also have a nice discussion on page 244-245 discussing the choice between FE and lagged dependent variable models.<br /><br /><b>References:</b><br /><br />A Guide to Econometrics. Peter Kennedy. 6th Edition. 2008<br />Mostly Harmless Econometrics. Angrist and Pischke. 
2009<br /><br /><span style="font-family: "calibri";">See also: <a href="http://marcfbellemare.com/wordpress/12335">‘Metrics Monday: Fixed Effects, Random Effects, and (Lack of) External Validity (Marc Bellemare).</a></span><br /><span style="font-family: "calibri";"><br /></span><span style="font-family: "calibri";">Marc notes: </span><br /><span style="font-family: "calibri";"><br /></span><i><span style="font-family: "calibri";">"Nowadays, in the wake of the Credibility Revolution, what we teach students is: “You should use RE when your variable of interest is orthogonal to the error term; if there is any doubt and you think your variable of interest is not orthogonal to the error term, use FE.” </span><span style="font-family: "calibri";">And since the variable can be argued to be orthogonal pretty much only in cases where it is randomly assigned in the context of an experiment, experimental work is pretty much the only time the RE estimator should be used."</span></i><br /><span style="font-family: "calibri";"><br /></span></div>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-55369393461702618002018-02-02T21:25:00.001-05:002018-02-03T18:04:21.978-05:00Deep Learning vs. Logistic Regression ROC vs Calibration Explaining vs. PredictingFrank Harrell writes <a href="http://www.fharrell.com/post/medml/">Is Medicine Mesmerized by Machine Learning? 
</a>Some time ago I wrote about predictive modeling and the differences between <a href="http://econometricsense.blogspot.com/2013/04/is-roc-curve-good-metric-for-model.html">what the ROC curve may tell us and how well a model 'calibrates.'</a><br /><br />There I quoted from the journal <i>Circulation</i>:<br /><br /><i>'When the goal of a predictive model is to categorize individuals into risk strata, the assessment of such models should be based on how well they achieve this aim...The use of a single, somewhat insensitive, measure of model fit such as the c statistic can erroneously eliminate important clinical risk predictors for consideration in scoring algorithms'</i><br /><br />Not too long ago Dr. Harrell shared the following tweet related to this:<br /><br /><i>I have seen hundreds of ROC curves in the past few years. I've yet to see one that provided any insight whatsoever. They reverse the roles of X and Y and invite dichotomization. Authors seem to think they're obligatory. Let's get rid of 'em.</i> <a href="https://twitter.com/f2harrell">@f2harrell </a>8:42 AM - 1 Jan 2018<br /><br />In his Statistical Thinking post above, Dr. Harrell writes:<br /><br /><i>"Like many applications of ML where few statistical principles are incorporated into the algorithm, the result is a failure to make accurate predictions on the absolute risk scale. The calibration curve is far from the line of identity as shown below...The gain in c-index from ML over simpler approaches has been more than offset by worse calibration accuracy than the other approaches achieved."</i><br /><br />That is, depending on the goal, better ROC scores don't necessarily mean better models.<br /><br />But this post was about more than discrimination and calibration. 
It was discussing the logistic regression approach taken in <a href="http://www.amjmed.com/article/S0002-9343(09)00103-X/pdf">Exceptional Mortality Prediction by Risk Scores from Common Laboratory Tests</a> vs the deep learning approach used in <a href="https://arxiv.org/abs/1711.06402">Improving Palliative Care with Deep Learning.</a><br /><br /><i>"One additional point: the ML deep learning algorithm is a black box, not provided by Avati et al, and apparently not usable by others. And the algorithm is so complex (especially with its extreme usage of procedure codes) that one can’t be certain that it didn’t use proxies for private insurance coverage, raising a possible ethics flag. In general, any bias that exists in the health system may be represented in the EHR, and an EHR-wide ML algorithm has a chance of perpetuating that bias in future medical decisions. On a separate note, I would favor using comprehensive comorbidity indexes and severity of disease measures over doing a free-range exploration of ICD-9 codes."</i><br /><br />This kind of pushes back against the idea that deep neural nets can effectively bypass feature engineering, or at least raises cautions in specific contexts.<br /><br />Actually, he is not as critical of the authors of this paper as he is about what he considers undue accolades it has received.<br /><br />This ties back to my post on LinkedIn a couple weeks ago, <a href="https://www.linkedin.com/pulse/deep-learning-regressionand-sql-matt-bogard/">Deep Learning, Regression, and SQL. 
</a><br /><br /><b>See also:</b><br /><br /><a href="http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html">To Explain or Predict</a><br /><a href="http://econometricsense.blogspot.com/2014/05/big-data-think-global-act-local-when-it.html">Big Data: Causality and Local Expertise Are Key in Agronomic Applications</a><br /><br /><b>And: </b><br /><br /><a href="https://www.ibm.com/developerworks/community/blogs/jfp/entry/Feature_Engineering_For_Deep_Learning?lang=en">Feature Engineering for Deep Learning</a><br /><a href="http://smerity.com/articles/2016/architectures_are_the_new_feature_engineering.htm">In Deep Learning, Architecture Engineering is the New Feature Engineering</a><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-30128598736247501572017-12-31T15:48:00.004-05:002017-12-31T15:49:03.511-05:00HARK! - flawed studies in nutrition call for credibility revolution -or- HARKing in nutrition researchThere was a nice piece over at the Genetic Literacy Project I read just recently: <i>Why so many scientific studies are flawed and poorly understood</i>. (<a href="https://geneticliteracyproject.org/2017/12/13/viewpoint-many-scientific-studies-flawed-poorly-understood/">link</a>). They gave a fairly intuitive example of false positives in research using coin flips. I like this because I used the specific example of flipping a coin 5 times in a row to demonstrate basic probability concepts in some of the stats classes I used to teach. Their example might make a nice extension:<br /><br /><i>"In Table 1 we present ten 61-toss sequences. The sequences were computer generated using a fair 50:50 coin. We have marked where there are runs of five or more heads one after the other. In all but three of the sequences, there is a run of at least five heads. Thus, a sequence of five heads has a probability of 0.5<sup>5</sup>=0.03125 (i.e., less than 0.05) of occurring. 
Note that there are 57 opportunities in a sequence of 61 tosses for five consecutive heads to occur. We can conclude that although a sequence of five consecutive heads is relatively rare taken alone, it is not rare to see at least one sequence of five heads in 61 tosses of a coin."</i><br /><br />In other words, a 5 head run in a sequence of 61 tosses (as evidence against a null hypothesis of p(head) = .5 i.e. a fair coin) is their analogy for a false positive in research. In particular, they relate this to nutrition research, where it is popular to use large survey questionnaires that consist of a large number of questions:<br /><br /><i>"asking lots of questions and doing weak statistical testing is part of what is wrong with the self-reinforcing publish/grants business model. Just ask a lot of questions, get false-positives, and make a plausible story for the food causing a health effect with a p-value less than 0.05"</i><br /><br />It is their 'hypothesis' that this approach, in conjunction with a questionable practice referred to as 'HARKing' (hypothesizing after the results are known), is one reason we see so many conflicting headlines about what we should and should not eat, or about the benefits or harms of certain foods and diets. There is some damage done in terms of people's trust in science as a result. They conclude:<br /><br /><i>"Curiously, editors and peer-reviewers of research articles have not recognized and ended this statistical malpractice, so it will fall to government funding agencies to cut off support for studies with flawed design, and to universities to stop rewarding the publication of bad research. We are not optimistic."</i><br /><br />More on HARKing.....<br /><br />A good article related to HARKing is a paper written by Norbert L. Kerr. He describes HARKing as the practice of proposing one hypothesis (or set of hypotheses) but later changing the research question *after* the data are examined, 
and then presenting the results *as if* the new hypothesis were the original. He does distinguish this from a more intentional exercise in scientific induction, inferring some relation or principle post hoc from a pattern of data. This is more like exploratory data analysis.<br /><br />I discussed exploratory studies and issues related to multiple testing in a previous post: <a href="http://econometricsense.blogspot.com/2015/11/econometrics-multiple-testing-and.html">Econometrics, Multiple Testing, and Researcher Degrees of Freedom. </a><br /><br />To borrow a quote from this post: <i>"At the same time, we do not want demands of statistical purity to strait-jacket our science. The most valuable statistical analyses often arise only after an iterative process involving the data"</i> (see, e.g., Tukey, 1980, and Box, 1997).<br /><br />To say the least, careful consideration of tradeoffs should be made in the way research is conducted, and as the post discusses in more detail, the <i>garden of forking paths </i>involved.<br /><br />I am not sure to what extent the <a href="http://econometricsense.blogspot.com/2017/07/the-credibility-revolution-in.html">credibility revolution</a> has impacted nutrition studies, but the lessons apply here.<br /><br /><b>References:</b><br /><br />HARKing: Hypothesizing After the Results are Known<br />Norbert L. Kerr<br />Personality and Social Psychology Review<br />Vol 2, Issue 3, pp. 196 - 217<br />First Published August 1, 1998Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-57843006034924578562017-08-24T18:56:00.000-04:002017-08-24T19:58:29.809-04:00Granger Causality<i>"Granger causality is a standard linear technique for determining whether one time series is useful in forecasting another." </i>(Irwin and Sanders, 2011).<br /><br />A series 'Granger causes' another series if it consistently predicts it. 
If series X Granger causes Y, while we can't be certain that this relationship is causal in any rigorous way, we might be fairly certain that Y doesn't cause X.<br /><br />Example:<br /><br />Y<sub>t</sub> = B<sub>0</sub> + B<sub>1</sub>Y<sub>t-1</sub> + ... + B<sub>p</sub>Y<sub>t-p</sub> + A<sub>1</sub>X<sub>t-1</sub> + ... + A<sub>p</sub>X<sub>t-p</sub> + E<sub>t</sub><br /><br />If we reject the hypothesis that all the 'A' coefficients jointly = 0, then 'X' Granger causes 'Y'.<br /><br />X<sub>t</sub> = B<sub>0</sub> + B<sub>1</sub>X<sub>t-1</sub> + ... + B<sub>p</sub>X<sub>t-p</sub> + A<sub>1</sub>Y<sub>t-1</sub> + ... + A<sub>p</sub>Y<sub>t-p</sub> + E<sub>t</sub><br /><br />If we reject the hypothesis that all the 'A' coefficients jointly = 0, then 'Y' Granger causes 'X'.<br /><br /><b>Applications:</b><br /><br />Below are some applications where Granger causality methods were used to test the impacts of index funds on commodity market price and volatility.<br /><br />The Impact of Index Funds in Commodity Futures Markets: A Systems Approach<br />DWIGHT R. SANDERS AND SCOTT H. IRWIN<br />The Journal of Alternative Investments<br />Summer 2011, Vol. 14, No. 1: pp. 40-49<br /><br />Irwin, S. H. and D. R. Sanders (2010), “The Impact of Index and Swap Funds on Commodity Futures Markets: Preliminary Results”, OECD Food, Agriculture and Fisheries Working Papers, No. 27, OECD Publishing. doi: 10.1787/5kmd40wl1t5f-en<br /><br />Index Trading and Agricultural Commodity Prices:<br />A Panel Granger Causality Analysis<br />Gunther Capelle-Blancard and Dramane Coulibaly<br />CEPII, WP No 2011 – 28<br />December<br /><br /><b>References:</b><br /><br />Using Econometrics: A Practical Guide (6th Edition) A.H. Studenmund. 2011<br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-66521274250437838972017-08-07T06:27:00.000-04:002017-08-08T20:59:30.817-04:00Confidence Intervals: Fad or Fashion <div style="margin-bottom: 0in;">Confidence intervals seem to be the fad among some in pop stats/data science/analytics. 
Whenever there is mention of p-hacking, or the ills of publication standards, or the pitfalls of null hypothesis significance testing, CIs almost always seem to be the popular solution.<br /><br /></div><div style="margin-bottom: 0in;">There are some attractive features of CIs. <a href="https://www.dgps.de/fachgruppen/methoden/mpr-online/issue7/art2/brandstaetter.pdf">This paper</a> provides some alternative views of CIs, discusses some strengths and weaknesses, and ultimately proposes that they are on balance superior to p-values and hypothesis testing. CIs can bring more information to the table in terms of effect sizes for a given sample however some of the statements made in this article need to be read with caution. I just wonder how much the fascination with CIs is largely the result of confusing a <a href="http://econometricsense.blogspot.com/2015/01/overconfident-confidence-intervals.html">Bayesian interpretation with a frequentist application</a> or just sloppy misinterpretation. I completely disagree that they are more straight forward to students (compared to interpreting hypothesis tests and p-values as the article claims).</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><a href="http://davegiles.blogspot.com/2011/08/overly-confident-future-nobel-laureate.html">Dave Giles</a> gives a very good review starting with the very basics of what is a parameter vs. an estimator vs. an estimate, sampling distributions etc. 
After reviewing the concepts key to understanding CIs he points out two very common interpretations of CIs that are clearly wrong:</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><i>1) There's a 95% probability that the true value of the regression coefficient lies in the interval [a,b].</i></div><div style="margin-bottom: 0in;"><i>2) This interval includes the true value of the regression coefficient 95% of the time.</i></div><div style="margin-bottom: 0in;"><i><br /></i></div><div style="margin-bottom: 0in;"><i>"we really should talk about the (random) intervals "covering" the (fixed) value of the parameter. If, as some people do, we talk about the parameter "falling in the interval", it sounds as if it's the parameter that's random and the interval that's fixed. Not so!"</i></div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">In <i>Robust misinterpretation of confidence intervals,</i> the authors take on the idea that confidence intervals offer a panacea for interpretation issues related to null hypothesis significance testing (NHST):<br /><br /><i>"Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual...Our findings suggest that many researchers do not know the correct interpretation of a CI....As is the case with p-values, CIs do not allow one to make probability statements about parameters or hypotheses."</i><br /><i><br /></i>The authors present evidence about this misunderstanding by presenting subjects with a number of false statements regarding confidence intervals (including the two above pointed out by Dave Giles) and noting the frequency of incorrect affirmations about their truth.<br /><br />In <i>Osteoarthritis and Cartilage</i>, authors write:</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><i>"In spite of frequent discussions of misuse and 
misunderstanding of probability values (P-values) they still appear in most scientific publications, and the disadvantages of erroneous and simplistic P-value interpretations grow with the number of scientific publications."</i></div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">They raise a number of issues related to both p-values and confidence intervals (multiplicity of testing, the focus on effect sizes, etc.) and they point out some informative differences between using p-values vs. using standard errors to produce 'error bars.' However, in trying to clarify the advantages of p-values they step really close to what might be considered an erroneous and simplistic interpretation:</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><i>"the great advantage with confidence intervals is that they do show what effects are likely to exist in the population. Values excluded from the confidence interval are thus not likely to exist in the population. "</i></div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">Maybe I am being picky, but if we are going to be picky about interpreting p-values then the same goes for CIs. It sounds a lot like they are talking about 'a parameter falling into an interval' or the 'probability of a parameter falling into an interval' as Dave cautions against. They seem careful enough in their language using the term 'likely' vs. making strong probability statements, so maybe they are making a more heuristic interpretation that while useful may not be the most correct. 
</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">In <i>Mastering 'Metrics,</i> Angrist and Pischke give a great interpretation of confidence intervals that, in my opinion, doesn't lend itself as easily to abusive probability interpretations:</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><i>"By describing a set of parameter values consistent with our data, confidence intervals provide a compact summary of the information these data contain about the population from which they were sampled"</i></div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">I think the authors in <i>Osteoarthritis and Cartilage</i> could have stated their case better if they had said:</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><i>"The great advantage of confidence intervals is that they describe what effects in the population are consistent with our sample data. Our sample data is not consistent with population effects excluded from the confidence interval."</i></div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">Both hypothesis testing and confidence intervals are statements about the compatibility of our observable sample data with population characteristics of interest. <a href="http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108">The ASA released a set of clarifications on statements on p-values. </a>Number 2 states that <i>"P-values do not measure the probability that the studied hypothesis is true."</i> Nor does a confidence interval (again see Ranstam, 2012).<br /><br />Venturing into the risky practice of making imperfect analogies, take this loosely from the perspective of criminal investigations. We might think of confidence intervals as narrowing the range of suspects based on observed evidence, without providing specific probabilities related to the guilt or innocence of any particular suspect. 
Better evidence narrows the list, just as better evidence in our sample data (less noise) will narrow the confidence interval.</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">I see no harm in CIs and more good if they draw more attention to practical/clinical significance of effect sizes. But I think the temptation to incorrectly represent CIs can be just as strong as the temptation to speak boldly of 'significant' findings following an exercise in p-hacking or in the face of meaningless effect sizes. Maybe some sins are greater than others and proponents feel more comfortable with misinterpretations/overinterpretations of CIs than they do with misinterpretations/overinterpretations of p-values.</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">Or as <a href="http://wmbriggs.com/post/11862/">Briggs concludes </a>about this issue:</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><i>"Since no frequentist can interpret a confidence interval in any but in a logical probability or Bayesian way, it would be best to admit it and abandon frequentism"</i></div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><b>See also: </b><br /><a href="http://andrewgelman.com/2014/12/11/fallacy-placing-confidence-confidence-intervals/">Andrew Gelman: The Fallacy of Placing Confidence in Confidence Intervals.</a><br /><a href="http://noahpinionblog.blogspot.com/2015/08/the-backlash-to-backlash-against-p.html">Noah Smith: The Backlash to the Backlash Against P-values</a><br /><b><br /></b><b>References:</b></div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">Eduard Brandstätter (Johannes Kepler Universität Linz), Confidence Intervals as an Alternative to Significance Testing, Methods of Psychological Research Online, 1999, Vol. 4, No. 2. © 1999 Pabst Science Publishers</div><div style="margin-bottom: 0in;"><br /></div><div 
style="margin-bottom: 0in;">J. Ranstam, Why the P-value culture is bad and confidence intervals a better alternative, Osteoarthritis and Cartilage, Volume 20, Issue 8, 2012, Pages 805-808, ISSN 1063-4584, http://dx.doi.org/10.1016/j.joca.2012.04.001 (http://www.sciencedirect.com/science/article/pii/S1063458412007789)<br /><br />Robust misinterpretation of confidence intervals<br />Rink Hoekstra & Richard D. Morey & Jeffrey N. Rouder &<br />Eric-Jan Wagenmakers Psychon Bull Rev<br />DOI 10.3758/s13423-013-0572-3 2014</div>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-47325601579728890672017-07-21T19:07:00.002-04:002017-07-21T19:18:04.684-04:00Regression as a variance based weighted average treatment effectIn <a href="http://www.mostlyharmlesseconometrics.com/">Mostly Harmless Econometrics</a> Angrist and Pischke discuss regression in the context of matching. Specifically, they show that regression provides a variance-based weighted average of covariate-specific differences in outcomes between treatment and control groups. Matching gives us a weighted average difference in treatment and control outcomes, weighted by the empirical distribution of covariates. (see more here). I wanted to roughly sketch this logic out below.<br />
<br /><b>Matching</b><br />
<br /> δATE = E[y1i | Xi,Di=1] - E[y0i | Xi,Di=0] = ATE <br /><br />This gives us the average difference in mean outcomes for treatment and control, assuming (y1i,y0i ⊥ Di), i.e. in a randomized controlled experiment potential outcomes are independent of treatment status.<br />
<br />We represent the matching estimator empirically by:<br /><br /> Σ δx P(Xi=x), where the sum runs over the values x of the covariates and δx is the difference in mean outcome values between treatment and control units at a particular value of X, i.e. the difference in outcomes for a particular combination of covariates, assuming (y1,y0 ⊥ Di|xi). Conditional independence is assumed, hence identification is achieved through a selection on observables framework.<br />
<br />
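As a rough sketch of the matching estimator above, the R snippet below computes Σ δx P(Xi=x) by hand on the toy data from the R code at the end of this post. The x, d, and y vectors are copied from that code; the names naive, cells, delta_x, p_x, and matching are illustrative choices of mine and not part of the original code. It assumes exact matching on x, and it takes P(Xi=x) to be the share of matched observations in each cell.

```r
# Toy data copied from the R code at the end of this post
x <- c(4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9)
d <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0)
y <- c(6,7,8,8,9,11,12,13,14,2,3,4,5,6,7,8,9,10)

# Naive comparison of treated and control means (ignores x entirely)
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Cells of x that contain both treated and control units (exact matches)
cells <- intersect(x[d == 1], x[d == 0])

# delta_x: treated-minus-control mean outcome within each matched cell
delta_x <- sapply(cells, function(v) {
  mean(y[d == 1 & x == v]) - mean(y[d == 0 & x == v])
})

# P(X = x): share of the matched observations that fall in each cell
# (an assumption about which covariate distribution to weight by)
matched <- x %in% cells
p_x <- sapply(cells, function(v) mean(x[matched] == v))

# Matching estimator: sum over cells of delta_x * P(X = x)
matching <- sum(delta_x * p_x)

print(round(c(naive = naive, matching = matching), 2))
```

On this data the naive difference in means comes out at about 3.78 and the matching estimate at about 0.67, which lines up with the figures quoted for the contrived example later in the post. Weighting by the covariate distribution of the treated units alone would give the same answer here, because every matched cell holds exactly one treated and one control unit.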
Average differences δx are weighted by the distribution of covariates via the term P(Xi=x).<br /><br /><b>Regression</b><br /><br />We can represent a regression parameter using the basic formula taught to most undergraduates:<br /><br />Single Variable: β = cov(y,D)/v(D)<br />Multivariable: βk = cov(y,D*)/v(D*) <br /><br />where D* is the residual from a regression of D on all other covariates, and
E(X’X)<sup>-1</sup>E(X’y) is a vector with the kth element cov(y,x*)/v(x*), where x* is the residual from a regression of that particular ‘x’ on all other covariates.<br /><br />We can then represent the estimated treatment effect from regression as:<br /><br /> δR = cov(y,D*)/v(D*) = E[(Di-E[Di|Xi]) E[yi|Di,Xi]] / E[(Di-E[Di|Xi])^2] assuming (y1,y0 ⊥ Di|xi)<br /><br />Again regression and matching rely on similar identification strategies based on selection on observables/conditional independence.<br /><br />Let E[yi | Di,Xi] = E[yi | Di=0,Xi] + δx Di<br /><br />Then with more algebra we get: δR = cov(y,D*)/v(D*) = E[σ^2D(Xi) δx] / E[σ^2D(Xi)]<br /><br />where σ^2D(Xi) is the conditional variance of treatment D given X, or E[(Di-E[Di|Xi])^2 | Xi].<br /><br />While the algebra is cumbersome and notation heavy, we can see that the way most people are familiar with viewing a regression estimate, cov(y,D*)/v(D*), is equivalent to the term (using expectations) E[σ^2D(Xi) δx] / E[σ^2D(Xi)], and we can see that this term contains the product of the conditional variance of D and our covariate specific differences in treatment and controls δx.<br /><br />Hence, regression gives us a variance based weighted average treatment effect, whereas matching provides a distribution weighted average treatment effect.<br /><br />So what does this mean in practical terms? Angrist and Pischke explain that regression puts more weight on covariate cells where the conditional variance of treatment status is the greatest, i.e. where there are roughly equal numbers of treated and control units. They state that the differences between the two estimators matter little when δx varies little across covariate combinations.<br /><br />In his post <a href="http://hrisblattman.com/2010/10/27/the-cardinal-sin-of-matching/">The cardinal sin of matching</a>, Chris Blattman puts it this way:<br /><br /><i>"For causal inference, the most important difference between regression and matching is what observations count the most. 
A regression tries to minimize the squared errors, so observations on the margins get a lot of weight. Matching puts the emphasis on observations that have similar X’s, and so those observations on the margin might get no weight at all....Matching might make sense if there are observations in your data that have no business being compared to one another, and in that way produce a better estimate" </i><br /><br />Below is a very simple contrived example. Suppose our data looks like this:<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-On_psQKxI-g/WXJZtY5szZI/AAAAAAAACgQ/CKBXoEkkJrIKWhZRdfqDtgBXYaoP048aQCLcBGAs/s1600/Regression%2Bvs%2BMatching.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="496" data-original-width="475" height="320" src="https://1.bp.blogspot.com/-On_psQKxI-g/WXJZtY5szZI/AAAAAAAACgQ/CKBXoEkkJrIKWhZRdfqDtgBXYaoP048aQCLcBGAs/s320/Regression%2Bvs%2BMatching.png" width="306" /></a></div>We can see that those in the treatment group tend to have higher outcome values so a straight comparison between treatment and controls will <a href="http://econometricsense.blogspot.com/2013/05/selection-bias-and-rubin-causal-model.html">overestimate treatment effects due to selection bias:</a><br /><br /> <span style="mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">E[Y<sub>i</sub>|d<sub>i</sub>=1] - E[Y<sub>i</sub>|d<sub>i</sub>=0] =E[Y<sub>1i</sub>-Y<sub>0i</sub>]<span style="mso-spacerun: yes;"> </span>+{E[Y<sub>0i</sub>|d<sub>i</sub>=1] - E[Y<sub>0i</sub>|d<sub>i</sub>=0]} </span><br /><br /> However, if we estimate differences based on an exact matching scheme, we get a much smaller estimate of .67. If we run a regression using all of the data we get .75. 
If we consider the unadjusted difference in means between treated and controls (3.78 in this data) to be biased upward, then both matching and regression have substantially reduced it, and depending on the application the difference between .67 and .75 may not be of great consequence. Of course, if we run the regression including only the matched observations, we get exactly the same result as matching (see R code below). This is not so different from the method of <a href="http://econometricsense.blogspot.com/2015/03/using-r-matchit-package-for-propensity.html">trimming based on propensity scores </a>suggested in Angrist and Pischke.<br /><br /><br />Both methods rely on the same assumptions for identification, so no one can argue superiority of one method over the other with regard to identification of causal effects.<br /><br />Matching has the advantage of being nonparametric, alleviating concerns with functional form. However, there are <a href="http://econometricsense.blogspot.com/2015/01/considerations-in-propensity-score.html">lots of considerations</a> to work through in matching (e.g. 1:1, 1:many, <a href="http://econometricsense.blogspot.com/2015/03/propensity-score-matching-optimal.html">optimal caliper width</a>, variance/bias tradeoff, kernel selection, etc.). While all of these possibilities might lead to better estimates, I wonder if they don't sometimes lead to a <a href="http://econometricsense.blogspot.com/2015/11/econometrics-multiple-testing-and.html">garden of forking paths. </a><br /><br /><b>See also: </b><br /><br />For a neater set of notes related to this post, see:<br /><b> </b><br />Matt Bogard. "Regression and Matching (3).pdf" <em>Econometrics, Statistics, Financial Data Modeling</em> (2017). 
Available at: http://works.bepress.com/matt_bogard/37/ <b> </b><br /><br /><a href="http://econometricsense.blogspot.com/2015/03/using-r-matchit-package-for-propensity.html">Using R MatchIt for Propensity Score Matching</a><br /><br /><b>R Code:</b><br /><br /># generate demo data<br /><div class="MsoNormal" style="-webkit-text-stroke-width: 0px; background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; margin: 0px; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">x <- c(4,5,6,7,8,9,10,11,12,1,2,3,<wbr></wbr>4,5,6,7,8,9)</div><div class="MsoNormal" style="-webkit-text-stroke-width: 0px; background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; margin: 0px; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">d <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,<wbr></wbr>0,0,0,0)</div><div class="MsoNormal" style="-webkit-text-stroke-width: 0px; background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; margin: 0px; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">y <- c(6,7,8,8,9,11,12,13,14,2,3,4,<wbr></wbr>5,6,7,8,9,10)</div><div class="MsoNormal" style="-webkit-text-stroke-width: 0px; background-color: white; 
color: #222222; font-family: arial, sans-serif; font-size: 12.8px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; margin: 0px; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"><br /></div><div class="MsoNormal" style="-webkit-text-stroke-width: 0px; background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; margin: 0px; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">summary(lm(y~x+d)) # regression controlling for x</div>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-28968690898171212512017-07-12T21:05:00.000-04:002017-07-13T09:48:15.194-04:00Instrumental Variables and LATEOften in program evaluation we are interested in estimating the average treatment effect (ATE). This is in theory the effect of treatment on a randomly selected person from the population. This can be estimated in the context of a randomized controlled trial (RCT) by a comparison of means between treated and untreated participants.<br /><br />However, sometimes in a randomized experiment, some members selected for treatment may not actually receive treatment (if participation is voluntary, <a href="http://econometricsense.blogspot.com/2014/01/the-oregon-medicaid-experiment-applied.html">for example the Medicaid expansion in Oregon</a>). In this case, sometimes researchers will compare differences in outcome between those selected for treatment vs those assigned to control groups. 
This analysis, as assigned or as randomized, is referred to as an intent-to-treat analysis (ITT). With perfect compliance, ITT = ATE.<br /><br /><a href="http://econometricsense.blogspot.com/2017/06/instrumental-variables-vs-intent-to.html">As discussed previously,</a> using treatment assignment as an instrumental variable (IV) is another approach to estimating treatment effects. The resulting estimate is referred to as a local average treatment effect (LATE).<br /><br /><b>What is LATE and how does it give us an unbiased estimate of causal effects?</b><br /><br />In simplest terms, LATE is the ATE for the sub-population of compliers in an RCT (or other natural experiment where an instrument is used).<br /><br />In a randomized controlled trial you can characterize participants as follows: (<a href="http://egap.org/methods-guides/10-things-you-need-know-about-local-average-treatment-effect">see this reference from egap.org</a> for a really great primer on this)<br /><br /><b>Never Takers: </b>those that refuse treatment regardless of treatment/control assignment.<br /><br /><b>Always Takers: </b>those that get the treatment even if they are assigned to the control group.<br /><br /><b>Defiers: </b>those that get the treatment when assigned to the control group and do not receive treatment when assigned to the treatment group (these people violate an IV assumption referred to as monotonicity).<br /><br /><b>Compliers:</b> those that comply, i.e. receive treatment if assigned to the treatment group but do not receive treatment when assigned to the control group. <br /><br />The outcomes for never takers are the same regardless of treatment assignment and in effect cancel out in an IV analysis. As discussed by <a href="http://press.princeton.edu/titles/10363.html">Angrist and Pischke in Mastering Metrics</a>, the always takers are prime suspects for creating bias in non-compliance scenarios. 
These folks are typically the more motivated participants and likely would have higher potential outcomes or potentially have a greater benefit from treatment than other participants. The compliers are characterized as participants that receive treatment only as a result of random assignment. The estimated treatment effect for these folks is often very desirable and in an IV framework can give us an unbiased causal estimate of the treatment effect. This is what is referred to as a local average treatment effect or LATE.<br /><br /><b>How do we estimate LATE with IVs?</b><br /><br />One way LATE estimates are often described is as dividing the ITT effect by the share of compliers. This can also be done in a regression context. Let D be an indicator equal to 1 if treatment is received vs. 0, and let Z be our indicator (0,1) for the original randomization i.e. our instrumental variable. We first regress:<br /><br />D = β<sub>0</sub> + β<sub>1</sub> Z + e<span style="mso-tab-count: 1;"> </span><br /><br /><span style="mso-tab-count: 1;">This captures all of the variation in our treatment that is related to our instrument Z, or random assignment. This is<i> 'quasi-experimental'</i> variation. It is also an estimate of the rate of compliance. </span>β<sub>1</sub> only picks up the variation in treatment D that is related to Z and leaves all of the variation and unobservable factors related to self selection (i.e. bias) in the residual term.<span style="mso-spacerun: yes;"> </span>You can think of this as the filtering process. We can represent this as: COV(D,Z)/V(Z). 
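<br /><br />To make the first stage and the "ITT divided by the share of compliers" description concrete, here is a minimal R sketch on a small made-up data set. The participant types, baseline outcomes, and effect sizes (a complier effect of 2 and an always-taker effect of 5) are hypothetical values chosen so the arithmetic is exact; nothing here comes from a real study.

```r
# Hypothetical participants: 4 never takers, 8 compliers, 4 always takers
type <- rep(c("never", "complier", "always"), times = c(4, 8, 4))

# Z: assignment indicator, alternating 0/1 so each type is split evenly
Z <- rep(c(0, 1), times = 8)

# D: treatment actually received
#    always takers take it regardless; compliers take it only if assigned
D <- as.numeric(type == "always" | (type == "complier" & Z == 1))

# Assumed untreated outcomes and treatment effects by type
y0     <- unname(c(never = 1, complier = 3, always = 6)[type])
effect <- unname(c(never = 0, complier = 2, always = 5)[type])
Y <- y0 + effect * D

# First stage: COV(D,Z)/V(Z), which here equals the share of compliers
first_stage <- cov(D, Z) / var(Z)

# Same number from the regression D = b0 + b1*Z + e
b1 <- unname(coef(lm(D ~ Z))["Z"])

# ITT: effect of assignment Z on the outcome Y
itt <- cov(Y, Z) / var(Z)

# LATE: ITT scaled by the share of compliers
late <- itt / first_stage

# As-treated comparison, for contrast (contaminated by always takers)
as_treated <- mean(Y[D == 1]) - mean(Y[D == 0])

c(first_stage = first_stage, itt = itt, late = late, as_treated = as_treated)
```

In this contrived data the first stage is 0.5 (the share of compliers), the ITT is 1, and their ratio recovers the complier effect of 2, while the as-treated comparison of 6 is badly inflated by the always takers.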
<br /><br />Then, to relate changes in Z to changes in our target Y we estimate β<sub>2</sub> or COV(Y,Z)/V(Z).<br /><br /><div class="MsoNormal"></div><div class="MsoNormal">Y = β<sub>0</sub> +β<sub>2</sub> Z + e <span style="mso-tab-count: 1;"> </span><br /></div><div class="MsoNormal"></div><div class="MsoNormal">Our instrumental variable estimator then becomes:<br /></div><div class="MsoNormal"></div><div class="MsoNormal">β<sub>IV</sub> = β<sub>2</sub> / β<sub>1</sub><span style="mso-spacerun: yes;"> </span>or (Z’Z)<sup>-1</sup>Z’Y / (Z’Z)<sup>-1</sup>Z’D or COV(Y,Z)/COV(D,Z) <span style="mso-tab-count: 1;"></span></div><br />The last term gives us the total proportion of <i style="mso-bidi-font-style: normal;">‘quasi-experimental variation’</i> in D related to Y.<span style="mso-spacerun: yes;"> We can also view this through a 2SLS modeling strategy:</span><br /><br /><br /><div class="MsoNormal"><span style="font-family: "Times New Roman";">Stage 1: Regress D on Z to get D* or </span>D = β<sub>0</sub> + β<sub>1</sub> Z + e<span style="mso-tab-count: 1;"> </span></div><div class="MsoNormal"><br /></div><div class="MsoNormal"><span style="font-family: "Times New Roman";">Stage 2: Regress Y on D* or </span>Y = β<sub>0</sub> +β<sub>IV</sub> D* + e <span style="mso-tab-count: 1;"><br /></span></div><br /> As described in <a href="http://www.mostlyharmlesseconometrics.com/">Mostly Harmless Econometrics,</a> <span style="font-family: "Times New Roman";"><i>"Intuitively, conditional on covariates, 2SLS retains only the variation in s </i>[D in our example above] <i>that is generated by quasi-experimental variation- that is generated by the instrument z" </i></span><br /><br />Regardless of how you want to interpret β<sub>IV</sub>, we can see that it teases out only that variation in our treatment D that is unrelated to selection bias and relates it to Y giving us an estimate for the treatment effect of D that is less biased.<br /><br />The causal path can be 
represented as:<br /><br />Z → D → Y <span style="mso-tab-count: 1;"> </span><br /><span style="mso-tab-count: 1;"><br /></span><span style="mso-tab-count: 1;"><a href="http://econometricsense.blogspot.com/2015/11/instrumental-explanations-of.html">There are lots of other ways to think about how to interpret IVs.</a> Ultimately they provide us with an estimate of the LATE, which can be interpreted as the average causal effect of treatment for those participants in a study whose treatment status is determined completely by Z (the treatment assignment), i.e. the compliers. This is often a very relevant effect of interest. </span><br /><br /><span style="mso-tab-count: 1;">Marc Bellemare has some really good posts related to this; see <a href="http://marcfbellemare.com/wordpress/7174">here</a>, <a href="http://marcfbellemare.com/wordpress/7182">here, </a>and <a href="http://marcfbellemare.com/wordpress/7231">here.</a></span><br /><br /><span style="mso-tab-count: 1;"><br /></span>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-90227917604117082692017-07-11T18:04:00.000-04:002017-07-11T18:04:01.685-04:00The Credibility Revolution in EconometricsPreviously I wrote about how <a href="http://econometricsense.blogspot.com/2017/07/the-value-of-graduate-educationand.html">graduate training (and experience) can provide a foundation for understanding statistics, experimental design, and interpretation of research.</a> I think this is common across many master's and doctoral level programs. But some programs approach this a little differently than others. Because of the <a href="http://www.nber.org/papers/w15794">credibility revolution</a> in economics, there is a special concern for identification and robustness. 
And even within the discipline, there is concern that this has not been given enough emphasis in modern textbooks and curricula (see <a href="https://www.weforum.org/agenda/2015/05/why-econometrics-teaching-needs-an-overhaul/">here </a>and <a href="http://www.nber.org/papers/w23144?utm_campaign=ntw&utm_medium=email&utm_source=ntw">here</a>). However, this may not be well understood or appreciated by those outside the discipline. <br /><br /><b>What is the credibility revolution and what does it mean in terms of how we do research?</b><br /><br />I like to look at this through the lens of applied economists working in the field:<br /><br />Economist <a href="http://jaysonlusk.com/blog/2016/5/12/does-diet-coke-cause-fat-babies">Jayson Lusk</a> puts it well:<br /><br /><i>"Fortunately economics (at least applied microeconomics) has undergone a bit of credibility revolution. If you attend a research seminar in virtually any economist department these days, you're almost certain to hear questions like, "what is your identification strategy?" or "how did you deal with endogeneity or selection?" In short, the question is: how do we know the effects you're reporting are causal effects and not just correlations."</i><br /><br />Healthcare Economist <a href="http://theincidentaleconomist.com/wordpress/what-took-con-econometrics/">Austin Frakt has a similar take:</a><br /><br /><i>"A “research design” is a characterization of the logic that connects the data to the causal inferences the researcher asserts they support. It is essentially an argument as to why someone ought to believe the results. It addresses all reasonable concerns pertaining to such issues as selection bias, reverse causation, and omitted variables bias. In the case of a randomized controlled trial with no significant contamination of or attrition from treatment or control group there is little room for doubt about the causal effects of treatment so there’s hardly any argument necessary. 
But in the case of a natural experiment or an observational study causal inferences must be supported with substantial justification of how they are identified. Essentially one must explain how a random experiment effectively exists where no one explicitly created one."</i><br /><br /> How do we get substantial justification? Angrist and Pischke give a good example in their text <a href="http://www.mostlyharmlesseconometrics.com/">Mostly Harmless Econometrics </a>in their discussion of fixed effects and lagged dependent variables:<br /><br /><i>"One answer, as always is to check the robustness of your findings using alternative identifying assumptions. That means you would like to find broadly similar results using plausible alternative models." </i><br /><br />To someone trained in the physical or experimental sciences, this might 'appear' to look like data mining. But <a href="http://marcfbellemare.com/wordpress/11833">Marc Bellemare makes a strong case that it is not!</a><br /><br /><i>"Unlike experimental data, which often allow for a simple comparison of means between treatment and control groups, observational data require one to slice the data in many different ways to make sure that a given finding is not spurious, and that the researchers have not cherry-picked their findings and reported the one specification in which what they wanted to find turned out to be there. As such, all those tables of robustness checks are there to do the exact opposite of data mining."</i><br /><br />That's what the credibility revolution is all about. <br /><br /><b>See also: </b><br /><br /><a href="http://marcfbellemare.com/wordpress/10966">Do Both! 
</a>(by Marc Bellemare)<br /><a href="http://econometricsense.blogspot.com/2014/11/applied-econometrics.html">Applied Econometrics</a><br /><a href="http://econometricsense.blogspot.com/2015/11/econometrics-multiple-testing-and.html">Econometrics, Multiple Testing, and Researcher Degrees of Freedom</a><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-22313281642850014762017-07-10T11:57:00.000-04:002017-07-11T13:05:38.483-04:00The Value of Graduate Education....and ExperienceSome time ago I wrote a piece titled <a href="http://econometricsense.blogspot.com/2010/09/why-study-appliedagricultural-economics.html">"Why Study Agricultural and Applied Economics."</a> While this was somewhat geared toward graduate study, degrees in these areas provide a great combination of quantitative and analytical skills at the undergraduate level suitable for a number of roles in industry, especially when combined with programming languages like R, SAS, or Python (just think Nate Silver). Another example would be the number of financial analyst, risk management, and modeling roles held by graduates holding bachelor's degrees in economics and finance or related fields. Not everyone needs to be a PhD-holding rocket scientist to do complex analytical work in applied fields.<br /><br />However, what are some arguments for graduate study? I bring this up because sometimes I wonder, given my role in the private sector, could I have had a similar trajectory if I just skipped the time, money and energy spent in graduate school and went straight to writing code?<br /><br />Perhaps. 
But recently I was listening to a <a href="http://www.talkingbiotechpodcast.com/088-food-evolution-the-movie/">Talking Biotech podcast with Kevin Folta discussing the movie Food Evolution.</a> Toward the end they discussed some critiques of the film, and a common critique about research in general is bias due to conflicts of interest. Kevin states:<br /><br /><i>"I've trained for 30 years to be able to understand statistics and experimental design and interpretation...I'll decide based on the quality of the data and the experimental design....that's what we do."</i><br /><br />Besides taking on the criticisms of science, this emphasized two important points.<br /><i> </i><br /><b>1)</b> <b>Graduate study teaches you to understand statistics and experimental design and interpretation. </b>At the undergraduate level I learned some basics that were quite useful in terms of empirical work. In graduate school I learned what is analogous to a new language. The additional properties of estimators, proofs, and theorems taught in graduate statistics courses suddenly made the things I had learned before make more sense. This background helped me to translate and interpret other people's work and learn from it, and to learn new methodologies or extend others. But it was the seminars and applied research that made it come to life. That is where I learned to 'do science' through statistics, experimental design, and, as Kevin says, interpretation. <br /><br /><b>2) Graduate study is an extendable framework.</b> Learning and doing statistics is a career-long process. <a href="http://econometricsense.blogspot.com/2014/11/applied-econometrics.html">This recognizes the gulf between textbook and applied econometrics.</a><br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-47997498235384052352017-06-11T21:17:00.001-04:002017-07-12T20:11:39.604-04:00Instrumental Variables vs. 
Intent to Treat<i> "ITT analysis includes every subject who is randomized according to randomized treatment assignment. It ignores noncompliance, protocol deviations, withdrawal, and anything that happens after randomization. ITT analysis is usually described as “once randomized, always analyzed”.<br /><br />"ITT analysis avoids overoptimistic estimates of the efficacy of an intervention resulting from the removal of non-compliers by accepting that noncompliance and protocol deviations are likely to occur in actual clinical practice" </i> - Gupta, 2011<br /><br /> In Mastering Metrics, Angrist and Pischke describe intent-to-treat analysis: <br /><br /><i>"In randomized trials with imperfect compliance, when treatment assignment differs from treatment delivered, effects of random assignment...are called intention-to-treat (ITT) effects. An ITT analysis captures the causal effect of being assigned to treatment."</i><br /><br />While treatment assignment is random, non-compliance is not! Therefore, if instead of using intent-to-treat comparisons we compared those actually treated to those untreated, we would get biased results, because this is essentially making uncontrolled comparisons between treated and untreated subjects. <br /><br />Angrist and Pischke describe how instrumental variables can be used in this context:<br /><br /> <i>“The instrumental variables (IV) method harnesses partial or incomplete random assignment, whether naturally occurring or generated by researchers"</i><br /><br /><i> "Instrumental variable methods allow us to capture the causal effect of treatment on the treated in spite of the nonrandom compliance decisions made by participants in experiments....Use of randomly assigned intent to treat as an instrumental variable for treatment delivered eliminates this source of selection bias."</i><br /><br />In <i>Intent-to-Treat vs. 
Non-Intent-to-Treat Analyses under Treatment Non-Adherence in Mental Health Randomized Trials </i>(Ten Have et al., 2008) there is a nice discussion of ITT and IV methods with applications related to clinical research. Below is a nice treatment of IV in this context:<br /><br /><i>“Instrumental variables are assumed to emulate randomization variables, unrelated to unmeasured confounders influencing the outcome. In the case of randomized trials, the same randomized treatment assignment variable used in defining treatment groups in the ITT analysis is instead used as the instrumental variable in IV analyses. In particular, the instrumental variable is used to obtain for each patient a predicted probability of receiving the experimental treatment. Under the assumptions of the IV approach, these predicted probabilities of receipt of treatment are unrelated to unmeasured confounders in contrast to the vulnerability of the actually observed receipt of treatment to hidden bias. Therefore, these predicted treatment probabilities replace the observed receipt of treatment or treatment adherence in the AT model to yield an estimate of the as-received treatment effect protected against hidden bias when all of the IV assumptions hold.”</i><br /><br />A great example of IV and ITT applied to health care can be found in Finkelstein et al. (2013 & 2014). See <a href="http://econometricsense.blogspot.com/2014/01/the-oregon-medicaid-experiment-applied.html">The Oregon Medicaid Experiment, Applied Econometrics, and Causal Inference.</a><br /><br />Over at the <a href="http://theincidentaleconomist.com/wordpress/methods-intention-to-treat/">Incidental Economist, there was a nice discussion</a> of ITT in the context of medical research that does a good job of explaining the rationale as well as when departures from ITT make more sense (such as safety and non-inferiority trials). 
<br /><br /><b>See also: </b><br /><a href="http://econometricsense.blogspot.com/2015/11/instrumental-explanations-of.html">Instrumental Explanations of Instrumental Variables</a><br /><br /><a href="http://econometricsense.blogspot.com/2013/06/an-toy-instrumental-variable-application.html">A Toy IV Application</a><br /><br /><a href="http://econometricsense.blogspot.com/search/label/instrumental%20variables">Other IV Related Posts </a><br /><br /><b>References: </b><br /><br />Angrist, J. D., & Pischke, J.-S. (2015). Mastering ’Metrics: The Path from Cause to Effect. Princeton University Press.<br /><br />Gupta, S. K. (2011). Intention-to-treat concept: A review. Perspectives in Clinical Research, 2(3), 109–112. http://doi.org/10.4103/2229-3485.83221<br /><br />Ten Have, T. R., Normand, S.-L. T., Marcus, S. M., Brown, C. H., Lavori, P., & Duan, N. (2008). Intent-to-Treat vs. Non-Intent-to-Treat Analyses under Treatment Non-Adherence in Mental Health Randomized Trials. Psychiatric Annals, 38(12), 772–783. http://doi.org/10.3928/00485713-20081201-10<br /><br /><span class="userContent">Baicker, K., et al. (2013). The Oregon Experiment--Effects of Medicaid on Clinical Outcomes. New England Journal of Medicine, 368, 1713–1722. http://www.nejm.org/doi/full/10.1056/NEJMsa1212321 </span><br /><span class="userContent"><br /></span><span class="userContent">Taubman, S. L., Allen, H. L., Wright, B. J., Baicker, K., & Finkelstein, A. N. (2014). Medicaid Increases Emergency-Department Use: Evidence from Oregon's Health Insurance Experiment. Science. Published online 2 January 2014. http://doi.org/10.1126/science.1246183 </span><br /><br /><span class="userContent">Detry, M. A., & Lewis, R. J. (2014). The Intention-to-Treat Principle: <span class="subtitle">How to Assess the True Effect of Choosing a Medical Treatment</span>. <i>JAMA</i>, 312(1), 85–86. 
doi:10.1001/jama.2014.7523 </span><br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-87241060599608890252017-06-06T20:36:00.001-04:002018-03-18T16:34:58.420-04:00Professional Science Master's Degree Programs in Biotechnology and ManagementAs an undergraduate I always had an interest in biotechnology and molecular genetics. However, lab work did not particularly appeal to me. I also recognized early on that science does not occur in a vacuum; it's subject to social, political, economic, and financial forces. This drew me to the field of economics, specifically public choice theory.<br /><br />When it came time for graduate school I was still torn. I really wasn't interested in an MBA, and despite minoring in mathematics, I soon discovered that a background lacking in topology or real analysis made a PhD in Economics a long shot. However, I really liked economics. The combination of mathematically precise theories (microeconomics/game theory) and empirically sound methods (econometrics) provided a powerful framework for applied problem solving. And I still had an interest in genetics.<br /><br />I had two advisers make recommendations that got me thinking outside the box. One suggested that ultimately I would find a niche that combined both economics and genetics. The other suggested I look at programs like the Bioscience Management program that was being offered at the time at George Mason University (now Bioinformatics Management). While there were not a lot of programs like that being offered at the time, the Agriculture Department at Western Kentucky University provided enough flexibility in their master's program to include courses in<span class="background-details"> biostatistics, genetics, and applied economics. 
I was able to work on research projects analyzing consumer perceptions of biotechnology and biotech trait resistance management using tools from econometrics, game theory, and population genetics. Additionally, I took courses in applied economics and finance from both the Department of Agriculture and College of Business where I was exposed to tools related to investment analysis, options pricing, and analysis and valuation of biotech companies as well as the impacts of technological change and biotechnology on food and economic development.</span><br /><br /><span class="background-details">With this combination of quantitative training and applied work I have been able to leverage SAS, R, and Python to solve a number of challenging problems across a number of professional analytics and consulting roles. </span><br /><span class="background-details"><br /></span><span class="background-details">Today there are a larger number of professional science master's programs similar to the programs I contemplated over 10 years ago. </span><br /><span class="background-details"><br /></span><span class="background-details">According to the <a href="https://www.professionalsciencemasters.org/about">National Professional Science Master’s Association</a>:</span><br /><span class="background-details"><br /></span><i><span class="background-details">"Professional Science Master's (PSMs) are designed for students who are seeking a graduate degree in science or mathematics and understand the need for developing workplace skills valued by top employers. 
A perfect fit for professionals because it allows you to pursue advanced training and excel in science or math without a Ph.D., while simultaneously developing highly-valued business skills....</span></i><i><span class="background-details">PSM programs consist of two years of coursework along with a professional component that includes business, communications and/or regulatory affairs."</span></i><br /><br /><span class="background-details">In 2012 there was an <a href="http://www.sciencemag.org/careers/2012/03/does-professional-science-masters-degree-pay">article in Science </a>detailing these degrees along with some salary data that seemed attractive. According to the article, the first program was officially offered in 1997, growing to 140 programs by 2009 and over 247 at the time of printing.</span><br /><span class="background-details"><br /></span><span class="background-details">This commentary from the article corroborates how I feel about my experience:</span><br /><span class="background-details"><br /></span><i><span class="background-details">“There is a tendency for students to buy into the line that if you don't get a Ph.D., you're not a serious professional, that you're wasting your mind,” she says. After spending a decade talking with PSM students and graduates, she is certain that’s not true. “There is so much potential for growth and satisfaction with a PSM degree. You can become a person you didn’t even know you wanted to be.”</span></i><br /><span class="background-details"><br /></span><span class="background-details">Below are some programs that look interesting to me and that students interested in this option should check out (there is a <a href="https://www.professionalsciencemasters.org/program-locator">program locator you can find here</a>). Similar to my master's, many of these programs are a mash-up of biology/biotech and applied economics and business degrees. 
</span><br /><span class="background-details"><br /></span><span class="background-details">George Mason University - <a href="http://ssb.gmu.edu/academics/Professional-Science-Masters-in-Bioinformatics-Management.cfm">PSM Bioinformatics Management</a></span><br /><br /><span class="background-details">University of Illinois - <a href="http://psm.illinois.edu/agricultural-production">Agricultural Production </a></span><br /><span class="background-details"><br /></span><span class="background-details">Cornell - <a href="https://dyson.cornell.edu/programs/graduate/mps.html">MPS Agriculture and Life Sciences </a></span><br /><br /><span class="background-details">Washington State University - <a href="https://online.wsu.edu/grad/professionalScience.aspx">PSM Molecular Biosciences</a></span><br /><span class="background-details"><br /></span><span class="background-details">Middle Tennessee State University - <a href="http://www.mtsu.edu/programs/biotechnology-ms/">PSM Biotechnology</a></span><br /><span class="background-details"><br /></span><span class="background-details">California State - <a href="http://ext.csuci.edu/programs/ms-biotech-mba-dual-degree/index.htm">MS Biotechnology/MBA </a></span><br /><br /><span class="background-details">Johns Hopkins - <a href="http://advanced.jhu.edu/academics/dual-degree-programs/biotechnology-mba/">MBA/MS Biotechnology</a></span><br /><span class="background-details"><br /></span><span class="background-details">Rice - <a href="https://profms.rice.edu/bioscience-health-policy/overview">PSM Bioscience and Health Policy</a></span><br /><span class="background-details"><br /></span><span class="background-details">North Carolina State University - <a href="https://mba.ncsu.edu/academics/concentrations/biosciences-management/">MBA (Biosciences Mgt Concentration)</a></span><br /><span class="background-details"></span><br /><span class="background-details">Purdue/Kelley - <a 
href="http://agribusiness.purdue.edu/ms-mba-plan-of-study">MS-MBA</a> (not a heavy science emphasis, but a very cool degree regardless, from great schools)</span><br /><br /><b><span class="background-details">See also: </span></b><br /><a href="http://econometricsense.blogspot.com/2015/07/analytical-translators.html"><span class="background-details">Analytical Translators</span></a><br /><span class="background-details"><a href="http://econometricsense.blogspot.com/2010/09/why-study-appliedagricultural-economics.html">Why Study Agricultural/Applied Economics</a></span>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-52791475258853448632017-06-05T20:40:00.003-04:002017-06-05T20:45:56.919-04:00Game Theory with Python- TalkPython PodcastEpisode 104 of the TalkPython podcast discussed game theory.<br /><iframe frameborder="no" height="166" scrolling="no" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/314210830&color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false" width="100%"></iframe> <br />Here are a few slices:<br /><br /><i>"Our guests this week, Vince Knight, Marc Harper, and Owen Campbell are here to discuss their Python project built to study and simulate one of the central problems in game theory, "The Prisoner's Dilemma" </i><br /><br /><i>"Yeah, so one of the things is how people end up cooperating. If we're all incentivized not to cooperate with each other yet we look around, we see all these situations where people are cooperating, so can we devise strategies that when we play this game repeatedly that coerce or convince our partners that they're better off cooperating with us than defecting against us......Okay, excellent. Give us a sense for some of the, you have some clever names for the different strategies or players, right? Strategy and player is kind of the same thing. 
You've got the basic ones. The cooperator and the defector, but what else? Probably the most famous one is the tit for tat strategy. Because in Axelrod's original tournament, one of the interesting results that came out with his work was that this strategy was one of the most successful."</i><br /><br />And then they get into incorporating machine learning:<br /><br /><i>"We've extended that method of taking a strategy based on some kind of machine learning algorithm, training it against the other strategies and then adding the fact of the tournaments to see about those. Right now, those are amongst the best players in the library, in terms of performance."</i><br /><br />See my <a href="http://econometricsense.blogspot.com/2017/06/game-theory-basic-introduction.html">previous post</a> for some concepts and examples from game theory that were discussed in this podcast. You can find more references from this podcast including papers, code etc. <a href="https://talkpython.fm/episodes/show/104/game-theory-in-python">here.</a><br /> Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-87906637884622997612017-06-05T20:26:00.001-04:002017-06-05T21:25:06.637-04:00Game Theory- A Basic Introduction<div class="page" title="Page 2"><div class="layoutArea"><div class="column"><span style="font-family: "times new roman"; font-size: 12.000000pt;">When someone else’s choices impact you, it helps to have some way to anticipate their behavior. Game Theory provides the tools for doing so (Nicholson, 2002). Game Theory is a mathematical technique developed to study choice under conditions of strategic interaction (Zupan, 1998). It allows for the analysis of interdependent situations. 
</span><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;"></span><span style="font-family: "times new roman"; font-size: 12.000000pt;"><br /></span> <span style="font-family: "times new roman"; font-size: 12.000000pt;">In game theory, a </span><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">game </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">is a decision-making situation with interdependent behavior between two or more individuals (Harris,1999). The individuals involved in making the decisions are the </span><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">players</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">. The set of possible choices made by the players are </span><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">strategies</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">. The outcomes of choices and strategies played are </span><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">payoffs</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">. Payoffs are often stated as levels of utility, income, profits, or some other stated objective particular to the game. A general assumption in game theory is that players seek the highest payoff attainable, preferring more utility to less (Nicholson, 2002). </span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">When a decision maker takes into account how other players will respond to his choices, a utility maximizing strategy may be found. It may allow one to predict in advance the actions, responses, and counter responses of others and then choose optimal strategies (Harris, 1999). 
Such optimal strategies that leave players with no incentive to change their behavior are </span><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">equilibrium strategies</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">. </span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">Games can be characterized by players, strategies, and payoffs. Below is one way to visualize a game. </span></div></div></div><br />Example: Overgrazing Game<br /><pre>
                            RANCHER 2
                       Conserve     Overgraze
RANCHER 1   Conserve   (20, 20)  |  (0, 30)
            Overgraze  (30, 0)   |  (10, 10)
</pre><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">In this game, the players are rancher '1' and rancher '2'. They can play one of two strategies, to conserve or overgraze a commonly shared or 'public' pasture. Suppose rancher 1 chooses a strategy (picks a row). Their payoff is depicted by the </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">first number in each cell. Rancher 2 will choose a strategy in return (picking a column). Rancher 2’s </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">payoff is indicated by the second number in each cell. </span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">In this case, the best strategy for rancher 2 (no matter what rancher 1 chooses to do) is to overgraze because the payoff for rancher 2 (the 2nd number in each cell) associated with overgrazing is always the highest. Likewise, no matter what rancher 2 chooses to do, the best strategy for rancher 1 is to overgraze because the first number in each cell (the payoffs for rancher 1) associated with overgrazing is always the highest. Both players have a dominant strategy to overgraze. This represents an equilibrium strategy of {overgraze, overgraze}. 
</span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">This game is an example of a prisoner’s dilemma, and its outcome is a </span><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">Nash Equilibrium. </span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">In a Nash </span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">equilibrium each player’s choice is the best choice possible taking into consideration the choice </span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">of the other players </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">(Zupan, 1998)</span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">. </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">This concept was generalized by the mathematician John </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">Nash in 1950 in his paper “Equilibrium Points in n</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">-</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">Person Games.” </span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">It’s easy to see that if the players would conserve</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">, they could both be made better off because the strategy {conserve, conserve} yields payoffs (20,20) which are much higher than the Nash </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">Equilibrium strategy’s payoff of (10,10). 
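As a rough illustration of the reasoning above, the sketch below encodes the overgrazing payoffs in Python and checks, by brute force, each rancher's dominant strategy and the pure-strategy Nash equilibria. The names used here (PAYOFFS, payoff, dominant_strategy, pure_nash_equilibria) are made up for this example and do not come from any library.

```python
from itertools import product

STRATEGIES = ("conserve", "overgraze")

# Payoffs from the overgrazing game: (rancher 1, rancher 2),
# keyed by (rancher 1's strategy, rancher 2's strategy).
PAYOFFS = {
    ("conserve", "conserve"): (20, 20),
    ("conserve", "overgraze"): (0, 30),
    ("overgraze", "conserve"): (30, 0),
    ("overgraze", "overgraze"): (10, 10),
}


def payoff(player, own, other):
    """Payoff to `player` (0 = rancher 1, 1 = rancher 2) from playing `own`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]


def dominant_strategy(player):
    """Return a strategy that is strictly best against every opposing choice, or None."""
    for s in STRATEGIES:
        if all(
            payoff(player, s, o) > payoff(player, alt, o)
            for o in STRATEGIES
            for alt in STRATEGIES
            if alt != s
        ):
            return s
    return None


def pure_nash_equilibria():
    """Strategy profiles where neither rancher gains by deviating on their own."""
    equilibria = []
    for s1, s2 in product(STRATEGIES, repeat=2):
        best1 = all(payoff(0, s1, s2) >= payoff(0, alt, s2) for alt in STRATEGIES)
        best2 = all(payoff(1, s2, s1) >= payoff(1, alt, s1) for alt in STRATEGIES)
        if best1 and best2:
            equilibria.append((s1, s2))
    return equilibria


print(dominant_strategy(0), dominant_strategy(1))
print(pure_nash_equilibria())
print(PAYOFFS[("conserve", "conserve")], "vs", PAYOFFS[("overgraze", "overgraze")])
```

With these payoffs, both ranchers' dominant strategy is to overgraze and {overgraze, overgraze} is the only pure-strategy equilibrium, even though {conserve, conserve} pays each rancher more. The check only covers pure strategies; mixed strategies are outside the scope of this sketch.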
</span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">Just as competitive market forces elicit cooperation by coordinating behavior through price mechanisms, so too must players in a game find some means of coordinating their behavior if they wish to escape the sub-optimal Nash Equilibrium. <b> </b></span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;"><b>Some Additional Concepts</b> </span><span style="font-family: "times new roman"; font-size: 12.000000pt;"><u> </u></span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;"><u>Multiple Period Games-</u> </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">Multiple period games are games that are played more than once, or over more than one time period. If we could imagine playing the pr</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">isoner’s dilemma game multiple times we would have a multi</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">-period game. If games are played perpetually they are referred to as </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">infinite games</span><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">(Harris, 1999). </span><u><span style="font-family: "times new roman"; font-size: 12.000000pt;"> </span></u><br /><br /><u><span style="font-family: "times new roman"; font-size: 12.000000pt;">Punishment Schemes </span></u><b><span style="font-family: "times new roman"; font-size: 12.000000pt;">-</span></b><span style="font-family: "times new roman"; font-size: 12.000000pt;"> </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">Punishment schemes are used to elicit cooperation or enforcement of agreements. 
</span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">In the game presented above, suppose both players wanted to cooperate to conserve grazing resources. If it turned out that rancher 2 cheated, then in the next period rancher 1 would refuse to cooperate. If the game is played repeatedly, rancher 2 would learn that if he sticks to the deal both players would be better off. In this way punishment schemes in multi-period games can elicit cooperation, allowing an escape from a Nash Equilibrium. This may not be possible in the single period games that we looked at before.</span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;"><u>Tit-for-Tat </u>- </span><span style="font-family: "times new roman"; font-size: 12.000000pt;"><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">Tit-for-tat </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">punishment mechanisms are schemes in which if one player fails to cooperate, the other player will refuse to cooperate in the next period. </span> </span><br /><br /><u><span style="font-family: "times new roman"; font-size: 12.000000pt;">Trigger Strategy</span></u><span style="font-family: "times new roman"; font-size: 12.000000pt;"><u> </u>- In </span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">infinitely </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">repeated games a trigger strategy involves a promise to play the optimal strategy as long as the other players comply (Nicholson, 2002). 
</span><u><span style="font-family: "times new roman"; font-size: 12.000000pt;"> </span></u><br /><br /><u><span style="font-family: "times new roman"; font-size: 12.000000pt;">Grim Trigger Strategy</span></u><span style="font-family: "times new roman"; font-size: 12.000000pt;"><u> </u>- This is a trigger strategy that involves punishment for many periods if the other player does not cooperate. In other words if one player defects when he should cooperate, the other player(s) will not offer the chance to cooperate again for a long time. As a result both players will be confined to a Nash Equilibrium for many periods or perpetually (Harris, 1999). <u> </u></span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;"><u>Trembling Hand Trigger Strategy-</u> This is a trigger strategy that allows for mistakes. Suppose in the first instance player 1 does not </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">realize that player 2 is willing to cooperate. Instead of player 1 resorting to a long period of punishment as in the </span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">grim trigger strategy</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">, player 1 allows player 2 a second chance to cooperate. It may be the case that instead of playing the </span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">grim trigger strategy</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">, player 1 may invoke a single period </span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">tit-for-tat </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">punishment scheme in the hope of eliciting cooperation in later periods. 
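To make the punishment schemes above a little more concrete, here is a minimal Python sketch that replays the overgrazing game for a fixed number of periods. Everything in it is an assumption made for illustration: the function names are invented, "C" stands for conserve and "D" for overgraze, the 10-period horizon is arbitrary, and payoffs are simply added up without discounting. It is not the implementation used by any particular library.

```python
# Stage-game payoffs from the overgrazing example: (rancher 1, rancher 2).
# "C" = conserve (cooperate), "D" = overgraze (defect).
PAYOFFS = {
    ("C", "C"): (20, 20),
    ("C", "D"): (0, 30),
    ("D", "C"): (30, 0),
    ("D", "D"): (10, 10),
}


def always_overgraze(own_history, other_history):
    """Never cooperate, regardless of what has happened so far."""
    return "D"


def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the other player's previous move."""
    return other_history[-1] if other_history else "C"


def grim_trigger(own_history, other_history):
    """Cooperate until the other player defects once, then punish in every later period."""
    return "D" if "D" in other_history else "C"


def play(strategy1, strategy2, periods):
    """Play the stage game `periods` times and return the summed payoffs."""
    history1, history2 = [], []
    total1 = total2 = 0
    for _ in range(periods):
        move1 = strategy1(history1, history2)
        move2 = strategy2(history2, history1)
        p1, p2 = PAYOFFS[(move1, move2)]
        total1 += p1
        total2 += p2
        history1.append(move1)
        history2.append(move2)
    return total1, total2


print(play(tit_for_tat, tit_for_tat, 10))
print(play(tit_for_tat, always_overgraze, 10))
print(play(grim_trigger, always_overgraze, 10))
```

Under these assumptions, a rancher who overgrazes against tit-for-tat or grim trigger gains 30 in the first period but is held to 10 in every period after that, ending with less than the 200 earned by cooperating throughout. That is the sense in which punishment can make cooperation worthwhile. The sketch ignores discounting and the end-of-game incentives that arise when the final period is known in advance.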
</span><br /><br /><u><span style="font-family: "times new roman"; font-size: 12.000000pt;">Folk Theorems</span></u><span style="font-family: "times new roman"; font-size: 12.000000pt;"> - Folk theorems result from the conclusion that players can escape the outcome of a Nash Equilibrium if games are played repeatedly, or are infinite period games (Nicholson,2002).</span><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;"> In general, folk theorems state that players will find it in their best interest to maintain trigger strategies in infinitely repeated games.</span><b><span style="font-family: "times new roman"; font-size: 12.000000pt;"> </span></b><br /><br /><b><span style="font-family: "times new roman"; font-size: 12.000000pt;">See also:</span></b><br />Matt Bogard. "An Econometric and Game Theoretic Analysis of Producer and Consumer Preferences Toward Agricultural Biotechnology" <i>Western Kentucky University</i> (2005) Available at: <a href="http://works.bepress.com/matt_bogard/31/">http://works.bepress.com/matt_bogard/31/</a><br /><br />Matt Bogard. "An Introduction to Game Theory: Applications in Environmental Economics and Public Choice with Mathematical Appendix" (2012) Available at: <a href="http://works.bepress.com/matt_bogard/22/">http://works.bepress.com/matt_bogard/22/ </a><br /><br />Matt Bogard. "Game Theory, A Foundation for Agricultural Economics" (2004) Available at: <a href="http://works.bepress.com/matt_bogard/32/">http://works.bepress.com/matt_bogard/32/</a><b> </b><br /><br /><b>References:</b><br /><br />Nicholson, Walter R. “Microeconomic Theory: Basic Principles and Extensions.” Southwestern Thomson Learning. U.S.A. (2002).<br /><br />Browning, Edward K. and Mark A. Zupan. “Microeconomic Theory and Applications.” 6th Edition. Addison-Wesley Longman Inc. Reading, MA. (1999)<br /><br />Harris, Frederick H. et al. “Managerial Economics: Applications, Strategy, and Tactics.” Southwestern College Publishing. 
Cincinnati, OH. (1999).<b><br /> </b>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-68759999528322101602017-06-03T16:04:00.000-04:002017-06-03T16:27:17.521-04:00In Praise of The Citizen Data ScientistThere was actually a really good article I read over at Data Science Central titled <a href="http://www.datasciencecentral.com/profiles/blogs/the-data-science-delusion">"The Data Science Delusion."</a> Here is an interesting slice:<br /><i><br /></i><i>"This democratization of algorithms and platforms, paradoxically, has a downside: the signaling properties of such skills have more or less been lost. Where earlier you needed to read and understand a technical paper or a book to implement a model, now you can just use an off-the-shelf model as a black-box. While this phenomenon affects many disciplines, the vague and multidisciplinary definition of data science certainly exacerbates the problem."</i><br /><br />It is true there is some loss of signal. However, companies may need to look for new signals as technological change progresses and new forms of capital complement labor.<i> </i>It's this new labor-complementing role of capital (in the form of open source statistical computing packages and computing power) that is creating demand for those who can leverage these tools competently, without knowing all <i> <a href="http://econometricsense.blogspot.com/2017/04/what-do-you-really-need-to-know-to-be.html">"the nitty-gritty mathematical academic formulas to everything about support vector machines or Kernels and stuff like that to apply it properly and get results."</a></i><br /><br />Sure, as a result there are a lot of analytics programs popping up out there to take advantage of these advances, but it's also the reason programs like applied economics are becoming so popular. 
In fact, in promoting its program, Johns Hopkins University almost seems to echo some of the sentiment in the quotes above, but puts a positive spin on it:<br /><br /><i>"Economic analysis is no longer relegated to academicians and a small number of PhD-trained specialists. Instead, economics has become an increasingly ubiquitous as well as rapidly changing line of inquiry that requires people who are skilled in analyzing and interpreting economic data, and then using it to effect decisions about national and global markets and policy, involving everything from health care to fiscal policy, from foreign aid to the environment, and from financial risk to real risk." </i><br /><br />In fact, I admit for a while I was a little disappointed my alma mater did not embrace the data science/analytics degree trend, offer more courses in applied programming, or incorporate languages like R into more courses. Now, while I think these things are great, I realize the more important data science skills are related to the analytical thinking and firm theoretical, statistical, and quantitative foundations that programs in economics and finance already offer at the undergraduate and master's level. While formal data science training might be the way of the future, I would venture to say that the vast majority of today's 'data scientists' were academically trained in a quantitative discipline like the above and self-trained (perhaps via Coursera etc.) on the skills and tools most people think of when they think of data science. As I have said before, sometimes you don't need someone with a PhD in computer science or astrophysics. 
<a href="http://econometricsense.blogspot.com/2016/10/the-future-data-scientist.html">Sometimes you really just need a good MBA that understands regression and the basics of a left join.</a><br /><br />The DSC article above concludes with a little jab at data science, that I tend to agree with wholeheartedly:<br /><br /><i>"Great data science work is being done in various places by people who go by other names (analyst, software engineer, product head, or just plain old scientist). It is not necessary to be a card-carrying data scientist to do good data science work. Blasphemy it may be to say so, but only time will tell whether the label itself has value, or is only helping create a delusion." </i><br /><br /><b>See also:</b><br /><br /><a href="http://econometricsense.blogspot.com/2017/04/what-do-you-really-need-to-know-to-be.html">What you really need to know to be a data scientist</a><br /><a href="http://econometricsense.blogspot.com/2017/04/super-data-science-podcast-credit.html">Super Data Science podcast - credit scoring</a><br /><a href="http://www.kdnuggets.com/2017/03/think-like-data-scientist-become-one.html">How to think like a data scientist to become one</a><br /><a href="http://www.kdnuggets.com/2017/03/what-makes-great-data-scientist.html">What makes a great data scientist </a><br /><a href="http://econometricsense.blogspot.com/2016/10/the-future-data-scientist.html">Are data scientists going extinct</a><br /><a href="http://econometricsense.blogspot.com/2017/04/more-on-data-science-from-actual-data.html">More on data science from actual data scientists </a><br /><br /><i><br /></i><i><br /></i>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-49943560241653551612017-05-30T17:57:00.000-04:002017-05-31T12:15:31.169-04:00Multicollinearity.....just a bad joke?<br /><div class="separator" style="clear: both; text-align: center;"><a 
href="https://4.bp.blogspot.com/-VbgjaCABEak/WS3NHzb-UbI/AAAAAAAACUw/nOrAfc7cErgWFA-0FF3t4lamThznBr-TgCLcB/s1600/tumblr_n6i5qv3HcT1rzm4u4o1_1280.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="778" data-original-width="1200" height="207" src="https://4.bp.blogspot.com/-VbgjaCABEak/WS3NHzb-UbI/AAAAAAAACUw/nOrAfc7cErgWFA-0FF3t4lamThznBr-TgCLcB/s320/tumblr_n6i5qv3HcT1rzm4u4o1_1280.jpeg" width="320" /></a></div>Link/Credit: <a href="https://www.pinterest.com/pin/96686723226973447/">https://www.pinterest.com/pin/96686723226973447/ </a> <br /><br /><i>"The worth of an econometrics textbook tends to be inversely related to the technical material devoted to multicollinearity" </i>- Williams, R. Economic Record 68, 80-1. (1992). via Kennedy, A Guide to Econometrics (6th edition).<br /><br /><br />If you have never read Arthur S. Goldberger's treatment of multicollinearity in his well known text <i>A Course in Econometrics</i> you are missing some of the best reading in econometrics you will ever find. A few years ago Dave Giles gave a nice preview here: <a href="http://davegiles.blogspot.com/2011/09/micronumerosity.html">http://davegiles.blogspot.com/2011/09/micronumerosity.html</a><br /><br />Basically, Goldberger provides a lengthy discussion in his textbook about 'micronumerosity,' a term he makes up to parody multicollinearity, the excessive amount of attention it is given in textbooks, and the resources spent by practitioners attempting to 'detect' it (see Dave Giles' post). It's more entertaining than the meme I found above.<br /><br />For a quick review, multicollinearity can be characterized in multivariable regression as a situation where there is correlation between explanatory variables. For instance if we are estimating:<br /><br /> y = b0 + b1x1 + b2x2 + b3x3 + e<br /><br />and x2 and x3 are highly correlated, the amount of independent variation in each variable is reduced. 
With less information available to estimate the effects b2 and b3, these estimates become less precise and their standard errors will tend to be larger than they otherwise would be.<br /><br />As Goldberger advises, we should not spend a lot of resources trying to apply various 'tests' for multicollinearity, but focus more on whether its consequences really matter:<br /><br /><i>"Researchers should not be concerned with whether or not there really is collinearity. They may well be concerned with whether the variances of the coefficient estimates are too large-for whatever reason-to provide useful estimates of the regression coefficients" </i>(Goldberger, 1991).<br /><br />Below are some other posts I have previously written on the topic, addressing multicollinearity in the context of predictive vs inferential modeling etc.<br /><br />From my discussion of multicollinearity in <a href="http://econometricsense.blogspot.com/2015/06/linear-literalism-fundamentalist.html">Linear Literalism and Fundamentalist Econometrics</a>: <br /><br /><i>"Multicollinearity has a very different impact if your goal is prediction from when your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model."-</i> Statist. Sci. 
Volume 25, Number 3 (2010), 289-310.<br /><br />See also: <br /><br /><a href="http://econometricsense.blogspot.com/2013/01/paul-allison-on-multicollinearity.html">Paul Allison on Multicollinearity - when not to worry</a><br /><br /><a href="http://econometricsense.blogspot.com/2011/01/ridge-regression.html">Ridge Regression</a><br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-17935369217078563682017-04-10T05:40:00.000-04:002017-04-10T05:40:11.915-04:00More on Data Science from Actual Data ScientistsPreviously I wrote a post titled: <a href="http://econometricsense.blogspot.com/2017/04/what-do-you-really-need-to-know-to-be.html">What do you really need to know to be a data scientist. Data science lovers and haters.</a> In this post I made the general argument that this is a broad space and there is a lot of contention about the level of technical skill and tools that one must master to consider themselves a 'real' data scientist vs. getting labeled a 'fake' data scientist or 'poser' or whatever. But, to me its all about leveraging data to solve problems and most of that work is about cleaning and prepping data. It's process. In an older KDNuggets article, economist/data scientist <a href="http://www.kdnuggets.com/2012/08/exclusive-scott-nicholson-interview-economics-weather-linkedin-healthcare.html">Scott Nicholson makes a similar point:</a><br /><br /><i>GP: What advice you have for aspiring data scientists?</i><br /><i><br /></i><i>SN: Focus less on algorithms and fancy technology & more on identifying questions, and extracting/cleaning/verifying data. People often ask me how to get started, and I usually recommend that they start with a question and follow through with the end-to-end process before they think about implementing state-of-the-art technology or algorithms. 
Grab some data, clean it, visualize it, and run a regression or some k-means before you do anything else. That basic set of skills surprisingly is something that a lot of people are just not good at but it is crucial.</i><br /><i><br /></i><i>GP: Your opinion on the hype around Big Data - how much is real?</i><br /><i><br /></i><i>SN: Overhyped. Big data is more of a sudden realization of all of the things that we can do with the data than it is about the data themselves. Of course also it is true that there is just more data accessible for analysis and that then starts a powerful and virtuous spiral. For most companies more data is a curse as they can barely figure out what to do with what they had in 2005.</i><br /><i><br /></i>So getting your foot in the door in a data science field doesn't mean mastering Hive or Hadoop apparently. And, this does not sound like PhD level rocket science at this point either. Karolis Urbonas, Head of Business Intelligence at Amazon has recently written a couple of similarly themed pieces also at KDNuggets:<br /><br /><a href="http://www.kdnuggets.com/2017/03/think-like-data-scientist-become-one.html">How to think like a data scientist to become one</a><br /><br /><i>"I still think there’s too much chaos around the craft and much less clarity, especially for people thinking of switching careers. Don’t get me wrong – there are a lot of very complex branches of data science – like AI, robotics, computer vision, voice recognition etc. – which require very deep technical and mathematical expertise, and potentially a PhD… or two. But if you are interested in getting into a data science role that was called a business / data analyst just a few years ago – here are the four rules that have helped me get into and are still helping me survive in the data science."</i><br /><br />He emphasizes basic data analysis, statistics, and coding to get started. The emphasis again is not on specific tools, degrees etc. 
but more on the process and ability to use data to solve problems. Note in the comments there is some push back on the level of expertise required, but Karolis actually addressed that when he mentioned very narrow and specific roles in AI, robotics, etc. Here he's giving advice for getting started in the broad diversity of roles in data science outside these narrow tracks. The issue is some people in data science want to narrow the scope to the exclusion of much of the work done by business analysts, researchers, engineers and consultants creating much of the value in this space (<a href="http://econometricsense.blogspot.com/2017/04/what-do-you-really-need-to-know-to-be.html">again see my previous post</a>).<br /><br /><a href="http://www.kdnuggets.com/2017/03/what-makes-great-data-scientist.html">What makes a great data scientist?</a><br /><br /><i>"A data scientist is an umbrella term that describes people whose main responsibility is leveraging data to help other people (or machines) making more informed decisions….Over the years that I have worked with data and analytics I have found that this has almost nothing to do with technical skills. Yes, you read it right. Technical knowledge is a must-have if you want to get hired but that’s just the basic absolutely minimal requirement. The features that make one a great data scientist are mostly non-technical."</i><br /><b><i><br /></i></b><b><i>1. Great data scientist is obsessed with solving problems, not new tools.</i></b><br /><b><i><br /></i></b><i>"This one is so fundamental, it is hard to believe it’s so simple. Every occupation has this curse – people tend to focus on tools, processes or – more generally – emphasize the form over the content. A very good example is the on-going discussion whether R or Python is better for data science and which one will win the beauty contest. Or another one – frequentist vs. Bayesian statistics and why one will become obsolete. 
Or my favorite – SQL is dead, all data will be stored on NoSQL databases."</i><br /><br /><br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-75189429574552343882017-04-08T12:16:00.002-04:002017-04-09T10:12:53.380-04:00What do you really need to know to be a data scientist? Data Science Lovers and Haters<a href="http://econometricsense.blogspot.com/2017/04/super-data-science-podcast-credit.html">Previously I discussed the Super Data Science podcast and credit modeling </a>in terms of the modeling strategy and models used. The discussion also covered data science in general, and one part of the conversation I thought was well worth discussing in more detail. It really gets to the question of what it takes to be a data scientist. There is a ton of energy spent on this in places like LinkedIn and other forums. I think the answer comes in two forms. From the 'lovers' of data science it's all about what kind of advice I can give people to help and encourage them to create value in this space. To the 'haters' it's more like: now that I have established myself in this space, what kind of criteria should we have to keep people out and prevent them from creating value? But before we get to that, here is some great dialogue from Kirill discussing a trap that data scientists or aspiring data scientists fall into:<br /><br />Kirill: <i>"I think there’s a level of acumen that people should have, especially going into data science role. And then if you’re a manager you might take a step back from that. You might not need that much detail…If you’re doing the algorithms, that acumen might be enough. You don’t need to know the nitty-gritty mathematical academic formulas to everything about support vector machines or Kernels and stuff like that to apply it properly and get results. 
On the other hand, if you find that you do need that stuff you can go and spend some additional time learning. A lot of people fall into the trap. They try to learn everything in a lot of depth, whereas I think the space of data science is so broad you can’t just learn everything in huge depths. It’s better to learn everything to an acceptable level of acumen and then deepen your knowledge in the spaces that you need."</i><br /><br />Greg: <i>"if you don’t want to get into that detail, I totally get it. You can be totally fine without it. I have never once in my career had somebody ask me what are the formulas behind the algorithm….there’s a lot of jobs out there for people that don’t know them."</i><br /><br />I admit I used to fall into this trap. In fact this blog is a direct result. Early in my career I had the mindset if you can't prove it you can't use it. I really didn't feel confident about an algorithm or method until I understood it 'on paper' and could at least code my own version in SAS IML or R. A number of posts here were based on this work and mindset. Then, a very well known and accomplished developer/computational scientist that frequently helped me gave the good advice that with this mindset you might never get any work done. Or only a fraction of work.<br /><br />Given the amount of discussion you might see on LinkedIn or the so called data science community about real or fake data scientists (lots of haters out there) in the <a href="https://talkpython.fm/episodes/show/56/data-science-from-scratch">Talk Python to Me podcast</a> author Joel Grus (of <a href="http://shop.oreilly.com/product/0636920033400.do">Data Science from Scratch</a>) provides what I think is the most honest discussion of what data science is and what data scientists do:<br /><br /><i>"there are just as many jobs called data science as there are data scientists"</i><br /><br />That is kind of paraphrasing and kind of out of context and yes very general. 
Almost defining a word using the word in the definition. But it is very very TRUE. That is because the field is largely undefined. To attempt to define it is futile and I think would be the antithesis of data science itself. I will warn though that there are plenty of data science haters out there that would quibble with what Greg and Joel have said above.<br /><br />These are people that want to impose something more strict. Some minimum threshold. Common threads indicate some fear of a poser or fake data scientist fooling some company into hiring them or incompetently pointing and clicking their way through an analysis without knowing what is going on and calling themselves a data scientist. While I understand that concern, it's one extreme. It can easily morph into a straw man argument for a more political agenda at the other extreme. That might lead to a listing of minimal requirements to be a <i>real </i>data scientist, some laundry list of requirements (think big data technologies, degrees and the like). Economists know all about this and we see it in the form of licensing and rent seeking in a number of professions and industries. Broadly speaking it's a waste of resources. Absolutely in this broad space economists would also recognize merit in signaling through certification, certain degree programs or course work, or other methods of credentialization. But there is a big difference between competitive signaling and non-competitive rent seeking behaviors.<br /><br />In its inception, data science was all about disruption. As described in the Johns Hopkins applied economics program description:<br /><br /><i>“Economic analysis is no longer relegated to academicians and a small number of PhD-trained specialists. 
Instead, economics has become an increasingly ubiquitous as well as rapidly changing line of inquiry that requires people who are skilled in analyzing and interpreting economic data, and then using it to effect decisions ………Advances in computing and the greater availability of timely data through the Internet have created an arena which demands skilled statistical analysis, guided by economic reasoning and modeling.”</i><br /><br />This parallels data science. Suddenly you no longer need a PhD in statistics or a software engineering background or an academic's level of acumen to create value-added analysis (although those are all excellent backgrounds for doing some advanced work in data science, no doubt). It's that <a href="http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram">basic combination of subject matter expertise, some knowledge of statistics and machine learning, and ability to write code or use software to solve problems.</a> That's it. It's disruptive and the haters hate it. They simultaneously embrace the disruption and want to rein it in and fence out the competition. I hate it for the haters but you don't need to be able to code your own estimators or train a neural net from scratch to use one. 
And there is probably as much or more value creating professional space out there for someone that can clean a data set and provide a set of cross tabs as there is for the know how to set up a Hadoop cluster.<br /><br />Below are a couple of really great KDNuggets articles in this regard written by Karolis Urbonas, Head of Business Intelligence at Amazon:<br /><br /><a href="http://www.kdnuggets.com/2017/03/think-like-data-scientist-become-one.html">How to think like a data scientist to become one</a><br /><br /><a href="http://www.kdnuggets.com/2017/03/what-makes-great-data-scientist.html">What makes a great data scientist?</a><br /><br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-24278960762776002862017-04-08T09:16:00.000-04:002017-04-08T18:09:26.938-04:00Super Data Science Podcast Credit Scoring ModelsI recently discovered the Super Data Science podcast hosted by Kirill Eremenko. What I like about this podcast series is that it is applied data science. You can talk all day about theory, theorems, proofs, and mathematical details and assumptions. Even if you could master every technical detail underlying 'data science' you have only scratched the surface. What distinguishes data science from the academic discipline of statistics, computer science, or machine learning is application to solve a problem for business or society. Its not theory for theory's sake. There are huge gaps between theory and application that can easily stump a team of PhD's or experienced practitioners (see also <a href="http://econometricsense.blogspot.com/2014/11/applied-econometrics.html">applied econometrics</a>). 
Podcasts like this can help bridge the gap.<br /><br /><a href="https://www.superdatascience.com/sds-014-credit-scoring-models-the-law-of-large-numbers-and-model-building-with-greg-poppe/">Episode 014</a> featured Greg Poppe who is Sr Vice President for risk management at an auto lending firm. They discussed how data science is leveraged in loan approvals and rate setting among other things.<br /><br />The general modeling approach that Greg discussed is very similar to work that I have done before in student risk modeling in higher education (see <a href="http://econometricsense.blogspot.com/2013/04/using-advanced-analytics-to-recruit.html">here</a> and <a href="http://econometricsense.blogspot.com/2013/12/sas-global-forum-papers.html">here</a>).<br /><br /><i>"So think of it like -- you know, I would have a hard time telling you with any high degree of certainty, “This loan will pay. This loan will pay. But this loan won’t.” However, if you give me a portfolio of a hundred loans, I should be able to say “15 aren’t going to pay. I don’t know which 15, but 15 won’t.” And then if you give me another portfolio that’s say riskier, I should be able to measure that risk and say “This is a riskier pool. 25 aren’t going to pay. And again, I don’t know which 25, but I’m estimating 25.” And that’s how we measure our accuracy. So it’s not so much on a loan-by-loan basis. It’s “If we just select a random sample, how many did not pay, and what was our expectation of that?” And if they’re very close, we consider our models to be accurate."</i><br /><br />A toy example in R that seems very similar can be found here (<a href="http://econometricsense.blogspot.com/2011/03/predictive-modeling-and-custom.html">Predictive Modeling and Custom Reporting in R</a>).<br /><br />So at a basic level they are just using predictive models to get a score and using cutoffs to determine different pools of risk and making approvals, declines, and setting interest rates based on this. 
He doesn't discuss the specifics of the model testing, but to me the key here sounds a lot like calibration (see <a href="http://econometricsense.blogspot.com/2013/04/is-roc-curve-good-metric-for-model.html">Is the ROC curve a good metric for model calibration?</a>). In terms of the types of models they use for this, it gets very interesting. As Kirill says, the whole podcast is worth listening to for this very point. For their credit scoring models they use regression, even though they could get improved performance from other algorithms like decision trees or ensembles. Why?<br /><br /><i>"so primarily in the credit decisioning models, we use regression models. And the reason why—well, there’s quite a few. One is it’s very computationally easy. It’s easy to explain, it’s easy for people to understand but it’s also not a black box in the sense that a lot of models can be, and what we need to do is we need to provide a continuity to a dealership because they can adjust the parameters of the application and that will adjust the risk accordingly…..If we were to go with a CART model or any other decision tree model, if the first break point or the first cut point in that model is down payment and they go from one side to the other, it can throw it down a completely separate set of decision logic and they can get very strange approvals. From a data science perspective and from an analytics perspective, that may be more accurate but it’s not sellable, it’s not marketable to the dealership."</i><br /><br />Yes, a huge gap just filled, and well worth repeating. It's interesting: in a different scenario you could go the other way around. For instance, in my work in higher education student risk modeling we went with decision trees instead of regression, but based on a similar line of reasoning. Our end users, however, were not going to be tweaking parameters, but getting sign-off and buy-in required that they understand more about what the model was doing. 
The explicit nature of the splits and decision logic of the trees was easier to explain and understand for people without statistical training than regression models or neural networks were.<br /><br />If you have been a practitioner for a while you might think of course every data scientist knows there is a tradeoff between accuracy, complexity, and functional practicality. I agree but it still can't be emphasized enough. And more time should be spent on applied examples like this vs the waste we see in social media discussions of who is or isn't a fake data scientist. The real data scientists are too busy working in the gaps between theory and practice to care. To be continued....<br /><br /><br /><br /><br /><br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-58957574928784236902017-04-07T18:08:00.000-04:002017-04-07T22:18:16.971-04:00Andrew Gelman on EconTalkRecently <a href="http://www.stat.columbia.edu/~gelman/">Andrew Gelman</a> was on <a href="http://www.econtalk.org/archives/2017/03/andrew_gelman_o.html">EconTalk with Russ Roberts</a>. A couple of the most interesting topics covered included discussion of his <a href="http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf">garden of forking paths</a> as well as what Gelman covered in a fairly recent blog post- The “<a href="http://andrewgelman.com/2017/02/06/not-kill-statistical-significance-makes-stronger-fallacy/">What does not kill my statistical significance makes it stronger” fallacy.</a><br /><br />Here is an excerpt from the transcript:<br /><br />Russ Roberts: <i>But you have a small sample...--that's very noisy, usually. Very imprecise. And you still found statistical significance. That means, 'Wow, if you'd had a big sample you'd have found even a more reliable effect.'</i><br /><br />Andrew Gelman: <i>Um, yes. 
You're using what we call the 'That which does not kill my statistical significance makes it stronger' fallacy. We can talk about that, too.</i><br /><br />From Andrew's Blog Post:<br /><br /><i>"The idea is that statistical significance is taken as an even stronger signal when it was obtained from a noisy study.</i><br /><i><br /></i><i>This idea, while attractive, is wrong. Eric Loken and I call it the “What does not kill my statistical significance makes it stronger” fallacy.</i><br /><br /><i>"What went wrong? Why it is a fallacy? In short, conditional on statistical significance at some specified level, the noisier the estimate, the higher the Type M and Type S errors. Type M (magnitude) error says that a statistically significant estimate will overestimate the magnitude of the underlying effect, and Type S error says that a statistically significant estimate can have a high probability of getting the sign wrong.</i><br /><i><br /></i><i>We demonstrated this with an extreme case a couple years ago in a post entitled, “This is what “power = .06” looks like. 
Get used to it.” We were talking about a really noisy study where, if a statistically significant difference is found, it is guaranteed to be at least 9 times higher than any true effect, with a 24% chance of getting the sign backward."</i><br /><i><br /></i>Noted in both the podcast and the blog post by Gelman, this is not a well known fallacy and as they point out very well known researchers appear to be found committing it one time or another in their writing or dialogue.<br /><br />See also: <a href="http://econometricsense.blogspot.com/2015/11/econometrics-multiple-testing-and.html">Econometrics, Multiple Testing, and Researcher Degrees of Freedom</a><br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-50582672859561598662017-03-22T02:53:00.000-04:002017-03-22T11:06:20.596-04:00Count Models with Offsets: Practical Applications Using R<div class="MsoNormal">See also:</div><div class="MsoNormal"><a href="http://econometricsense.blogspot.com/2017/03/count-model-regressions.html">Basic Econometrics of Counts</a></div><div class="MsoNormal"><a href="http://econometricsense.blogspot.com/2017/03/count-models-with-offsets.html">Count Models with Offsets </a></div><div class="MsoNormal"><br /></div><div class="MsoNormal">Lets consider three count modeling scenarios and determine the appropriate modeling strategy. </div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Example 1:</b> Suppose we are observing kids playing basketball during open gym. We have two groups of equal size A and B. Suppose both groups play for 60 minutes and kids in one group, A, score on average about 2.5 goals each while group B averages 5.</div><div class="MsoNormal">In this case both groups of students engage in activity for the same amount of time. There seems to be no need to include time as an offset. 
And it is clear, for whatever reason students in group B are better at scoring and therefore score more goals.</div><div class="MsoNormal"><br /></div><div class="MsoNormal">If we simulate count data to mimic this scenario (see toy data below) we might get descriptive statistics that look like this:</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Table 1:</b></div><div class="MsoNormal"><img border="0" class="CToWUd" height="121" id="m_-1578900333233357994m_6784948537350094458Picture_x0020_4" src="https://mail.google.com/mail/u/0/?ui=2&ik=c5987f989f&view=fimg&th=15af66aff2c4d905&attid=0.0.3&disp=emb&realattid=13eb100b6ae42902_0.0.1&attbid=ANGjdJ_g9IUbmEYrgvrxrJM6NtqrRvW6KzdHDH_VJ6Y4PFrbUzV9aYpod9GbAaIYkyc2IN-mxDf5XHrFWMeRcqA65VxbRsKH0gRvHpTA6rAuir143VyUjOGco40Mtlo&sz=w802-h242&ats=1490193032575&rm=15af66aff2c4d905&zw&atsh=1" width="401" /></div><div class="MsoNormal"><br /></div><div class="MsoNormal">It is clear for the period of observation (60 minutes) group B out scored A. Would we in practice discuss this in terms of rates? Total points per 60 minute session? Or total goals per minute? In this case group A scores .0433 goals per minute vs. .09 for B. Again, we conclude based on rates that B is better at scoring goals. But most likely, despite the implicit or explicit view of rate, we would discuss these outcomes in a more practical sense, total goals for A vs B. 
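<br /><br />As a rough check, the group averages and per-minute rates quoted above can be recomputed from the simulated toy data listed at the end of this post. This is only a minimal sketch: the data frame name counts and the columns COUNT, GROUP and TIME2 follow the toy data, while mean_count, mean_rate and the use of tapply() are assumptions made here for illustration, not necessarily how the original descriptives were produced.

```r
# Toy data copied from the "Simulated Toy Count Data" listing at the end of this post.
counts <- data.frame(
  COUNT = c(3, 4, 2, 2, 1, 6, 0, 0, 1, 5, 3, 2, 3, 3, 4,
            5, 4, 7, 8, 3, 7, 5, 4, 7, 8, 5, 3, 5, 4, 6),
  GROUP = rep(c("A", "B"), each = 15),
  TIME2 = 60  # Example 1: every player is observed for the same 60 minutes
)

# Average goals per player, by group (hypothetical helper names).
mean_count <- tapply(counts$COUNT, counts$GROUP, mean)

# Average goals per minute of play, by group.
mean_rate <- tapply(counts$COUNT / counts$TIME2, counts$GROUP, mean)

print(round(rbind(mean_count, mean_rate), 4))
```

Because TIME2 is the same 60 minutes for everyone, dividing by it only rescales both groups by the same constant, so the ordering of A and B is identical whether we look at counts or rates.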
</div><div class="MsoNormal"><br /></div><div class="MsoNormal">We could model this difference with a Poisson regression:</div><div class="MsoNormal"><br /></div><span style="font-family: "courier new";">summary(glm(COUNT ~ GROUP,data = counts, family = poisson))</span><br /><br /><br /><div class="MsoNormal"><b>Table 2:</b></div><div style="border: solid windowtext 1.0pt; padding: 1.0pt 4.0pt 1.0pt 4.0pt;"><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";">Coefficients:</span></div><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";"> Estimate Std. Error z value Pr(>|z|) </span></div><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";">(Intercept) 0.9555 0.1601 5.967 2.41e-09 ***</span></div><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";">GROUPB 0.7309 0.1949 3.750 0.000177 ***</span></div></div><div class="MsoNormal"><br /></div><div class="MsoNormal">We can see from this that group B completes significantly more goals than A, at a ‘rate’ exp(.7309) = 2.077 times that of A (roughly twice as many goals). This is basically what we get from a direct comparison of the average counts in the descriptives above.</div><div class="MsoNormal">But what if we wanted to be explicit about the interval of observation and include an offset? The way we incorporate rates into a Poisson model for counts is through the offset. 
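<br /><br />Before adding the offset, the claim that the Poisson coefficient reproduces a direct comparison of average counts can be checked with a short sketch. It assumes the toy data listed at the end of this post; the object names fit, rate_ratio, group_means and mean_ratio are introduced here only for illustration.

```r
# Toy counts copied from the "Simulated Toy Count Data" listing at the end of this post.
counts <- data.frame(
  COUNT = c(3, 4, 2, 2, 1, 6, 0, 0, 1, 5, 3, 2, 3, 3, 4,
            5, 4, 7, 8, 3, 7, 5, 4, 7, 8, 5, 3, 5, 4, 6),
  GROUP = factor(rep(c("A", "B"), each = 15))
)

# Poisson regression of goals on group, with no offset (same call as above).
fit <- glm(COUNT ~ GROUP, data = counts, family = poisson)

# Exponentiated group coefficient: multiplicative difference in expected goals, B vs A.
rate_ratio <- unname(exp(coef(fit)["GROUPB"]))

# Direct comparison of the raw group averages.
group_means <- tapply(counts$COUNT, counts$GROUP, mean)
mean_ratio  <- as.numeric(group_means["B"] / group_means["A"])

print(c(rate_ratio = rate_ratio, mean_ratio = mean_ratio))
```

With group as the only regressor, the two numbers should agree up to the convergence tolerance of glm(), which is why the model and the descriptives tell the same story in this example.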
</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Log(μ/t<sub>x</sub>) = xβ here we are explicitly specifying a rate based on time ‘t<sub>x</sub>’</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Re-arranging terms we get:</div><div class="MsoNormal">Log(μ) – Log(t<sub>x</sub>) = xβ </div><div class="MsoNormal">Log(μ) = xβ + Log(t<sub>x</sub>)</div><div class="MsoNormal">The term Log(t<sub>x</sub>) becomes our ‘offset.’</div><div class="MsoNormal"><br /></div><div class="MsoNormal">So we would do this by including log(time) as an offset in our R code: </div><div class="MsoNormal"><br /></div><div class="MsoNormal"><span style="font-family: "courier new";">summary(glm(COUNT ~ GROUP + offset(log(TIME2)),data = counts, family = poisson))</span></div><div class="MsoNormal"><br /></div><div class="MsoNormal">It turns out the estimate of .7309 for B vs A is the same. Whether we directly compare the raw counts, or run count models with or without offsets we get the same result. </div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Table 3: </b></div><div style="border: solid windowtext 1.0pt; padding: 1.0pt 4.0pt 1.0pt 4.0pt;"><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";">Coefficients:</span></div><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";"> Estimate Std. Error z value Pr(>|z|) </span></div><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";">(Intercept) -3.1388 0.1601 -19.60 < 2e-16 ***</span></div><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";">GROUPB 0.7309 0.1949 3.75 0.000177 ***</span></div></div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Example 2: </b>Suppose again we are observing kids playing basketball during open gym. Let’s still refer to them as groups A and B. 
Suppose after 30 minutes group A is forced to leave the court (maybe their section of the court is reserved for an art show). Before leaving they score an average of about 2.5 goals. Group B is allowed to play for 60 minutes scoring an average of about 5 goals. This is an example where the two groups had different observation times or exposure times (i.e. playing time). Its plausible that if Group A continued to play longer they would have had more risk or opportunity to score more goals. It seems the only fair way to compare goal scoring for A vs B is to consider court time, or exposure or the rate of goal completion. If we use the same toy data as before (but assuming this different scenario) we would get the following descriptives:</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Table 4</b></div><div class="MsoNormal"><img border="0" class="CToWUd" height="121" id="m_-1578900333233357994m_6784948537350094458_x0000_i1027" src="https://mail.google.com/mail/u/0/?ui=2&ik=c5987f989f&view=fimg&th=15af66aff2c4d905&attid=0.0.4&disp=emb&realattid=13eb100b6ae42902_0.0.2&attbid=ANGjdJ8J53MDIuyMBhD_vQ5a6w-B-Ew6bNzxm62YTqgaHMh0TmrnH7XszKlykeJgiEFAM4Nie8kujZU6eVMgwfe_7KqRQwkyfGdKqwNeTtYHT8gbm8sBP_BN94Wsn_s&sz=w962-h242&ats=1490193032579&rm=15af66aff2c4d905&zw&atsh=1" width="481" /></div><div class="MsoNormal"><br /></div><div class="MsoNormal">You can see that the difference in the rate of goals scored is very small. Both teams are put on an ‘even’ playing field when we consider rates of goal completion. </div><div class="MsoNormal"><br /></div><div class="MsoNormal">If we fail to consider exposure or the period of observation we run the following model:</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><span style="font-family: "courier new";">summary(glm(COUNT ~ GROUP,data = counts, family = poisson))</span></div><div class="MsoNormal"><br /></div><div class="MsoNormal">The results will appear the same as in table 2 above. 
But what if we want to consider the differences in exposure or observation time? In this case we would include an offset in our model specification:</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><span style="font-family: "courier new";">summary(glm(COUNT ~ GROUP + offset(log(TIME3)),data = counts, family = poisson))</span></div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Table 5</b></div><div style="border: solid windowtext 1.0pt; padding: 1.0pt 4.0pt 1.0pt 4.0pt;"><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";">Coefficients:</span></div><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";"> Estimate Std. Error z value Pr(>|z|) </span></div><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";">(Intercept) -2.44569 0.16013 -15.273 <2e-16 ***</span></div><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";">GROUPB 0.03774 0.19490 0.194 0.846 </span></div></div><div class="MsoNormal"><br /></div><div class="MsoNormal">We can see from the results that when considering exposure (modeling with an offset) there is no significant difference between groups, although this could be an issue of low power and small sample size. Directionally, group B completes goals at exp(.0377) = 1.038 times the rate of A, or (1.038-1)*100 = 3.8% more goals per minute of exposure. We can get all of this from the descriptives by comparing the average ‘rates’ of goal completion for B vs A. Either way the conclusion is the same: if we fail to consider rates or exposure in this case we get the wrong answer!</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Example 3:</b> Suppose we are again observing kids playing basketball during open gym with groups A and B. 
Except this time group A tires out after playing about 20 minutes or so and leaves the court after scoring 2.6 goals each on average. Group B perseveres another 30 minutes or so and scores a total of about 5 goals on average per student. In this instance there seem to be important differences in group A and B in terms of drive and ambition that should not be equated by accounting for time played or inclusion of an offset. Event success seems to drive time as much as time drives the event. In this instance if we want to think of a ‘rate’ the rate is total goals scored per open gym session, not per minute of activity. The relevant interval is a single open gym session.</div><div class="MsoNormal">In this case time <a data-saferedirecturl="https://www.google.com/url?hl=en&q=https://books.google.com/books?id%3DPwbtCAAAQBAJ%26pg%3DPA70%26lpg%3DPA70%26dq%3Dcount%2Bmodels%2Bwith%2Bendogenous%2Bexposure%26source%3Dbl%26ots%3DLRv_0P0lEA%26sig%3DAY8qiXUMbwPfAiwGV8_w9u63DPM%26hl%3Den%26sa%3DX%26ved%3D0ahUKEwiYx4P9gunSAhWEbiYKHdx0D_8Q6AEILjAC%23v%3Donepage%26q%3Dcount%2520models%2520with%2520endogenous%2520exposure%26f%3Dfalse&source=gmail&ust=1490279432592000&usg=AFQjCNHoKTK0WrwN9FJtSTlC9Nod2o-jLw" href="https://books.google.com/books?id=PwbtCAAAQBAJ&pg=PA70&lpg=PA70&dq=count+models+with+endogenous+exposure&source=bl&ots=LRv_0P0lEA&sig=AY8qiXUMbwPfAiwGV8_w9u63DPM&hl=en&sa=X&ved=0ahUKEwiYx4P9gunSAhWEbiYKHdx0D_8Q6AEILjAC#v=onepage&q=count%20models%20with%20endogenous%20exposure&f=false" target="_blank">actually seems endogenous</a> or confounded with the outcome or confounded with other factors like effort and motivation which drive the outcome.</div><div class="MsoNormal"><br /></div><div class="MsoNormal">If we alter our simulated data from before to mimic this scenario we would generate the following descriptive statistics:</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Table 6:</b></div><div class="MsoNormal"><img border="0" class="CToWUd a6T" 
height="169" id="m_-1578900333233357994m_6784948537350094458Picture_x0020_2" src="https://mail.google.com/mail/u/0/?ui=2&ik=c5987f989f&view=fimg&th=15af66aff2c4d905&attid=0.0.1&disp=emb&realattid=13eb100b6ae42902_0.0.3&attbid=ANGjdJ9VPCNVM9GhikiczD7S68klDn5qmVVISZ2HImktNATvCl9iJ3X-l9CiyDEAJv582fs0QI97X_S7YXCPWtMpkufbtUkT38gcunkYkkPXDHn8B8377fd8l2CLSQ8&sz=w802-h338&ats=1490193032581&rm=15af66aff2c4d905&zw&atsh=1" tabindex="0" width="401" /></div><div class="MsoNormal"><br /></div><div class="MsoNormal">As discussed previously, this should be modeled without an offset, implying equal exposure/observation time with regard to the event or exposure being an entire open gym session. We can think of this as a model of counts, or an implied model of rates in terms of total goals per open gym session. In that case we get the same results as table 2 indicating that group B scores more goals than A. It makes no sense in this case to include time as an offset or compare rates of goal completion between groups. But, if we did model this with an offset (making this a model with an explicit specification of exposure being court time) then we would get the following:</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Table7 :</b></div><div style="border: solid windowtext 1.0pt; padding: 1.0pt 4.0pt 1.0pt 4.0pt;"><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";"> Estimate Std. 
Error z value Pr(>|z|) </span></div><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";">(Intercept) -2.1203 0.1601 -13.241 <2e-16 ***</span></div><div class="MsoNormal" style="border: none; padding: 0in;"><span style="font-family: "courier new";">GROUPB -0.1054 0.1949 -0.541 0.589 </span></div></div><div class="MsoNormal"><br /></div><div class="MsoNormal">In this case, modeling playing time explicitly as exposure gives a result indicating that group B completes goals at a lower rate than group A (although the difference is not statistically significant). This approach completely ignores the fact that group B persevered to play longer and ultimately completed more goals. Including an offset in this case most likely leads to the wrong conclusion. </div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Summary:</b> When modeling outcomes that are counts, a rate is always implied by the nature of the probability mass function for a Poisson process. However, in practical applications we may not always think of our outcome as an explicit rate based on an explicit interval or exposure time. In some cases this distinction can be critical. When we want to explicitly consider differences in exposure, this is done through specification of an offset in our count model. Three examples were given using toy data where (1) modeling rates or including an offset made no difference in outcome, (2) including an offset was required to obtain the correct conclusion, and (3) including an offset may lead to the wrong conclusion. </div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Conclusion:</b> Counts always occur within some interval of time or space and therefore can always have an implicit ‘rate’ interpretation. If counts are observed across different intervals in time or space for different observations then differences in outcomes should be modeled through the specification of an offset. 
Whether to include an offset really depends on answering the questions: (1) What is the relevant interval in time or space upon which our counts are based? (2) Is this interval different across our observations of counts?</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>References:</b></div><div class="MsoNormal"><br /></div><div class="MsoNormal">Essentials of Count Data Regression. A. Colin Cameron and Pravin K. Trivedi. (1999)<br /><br />Count Data Models for Financial Data. A. Colin Cameron and Pravin K. Trivedi. (1996)</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Models for Count Outcomes. Richard Williams, University of Notre Dame, <a data-saferedirecturl="https://www.google.com/url?hl=en&q=http://www3.nd.edu/~rwilliam/&source=gmail&ust=1490279432592000&usg=AFQjCNHQdNQfyh7e0ZLr4JZ2DnNRfmQtVg" href="http://www3.nd.edu/%7Erwilliam/" target="_blank">http://www3.nd.edu/~rwilliam/</a> . Last revised February 16, 2016</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Econometric Analysis of Count Data. By Rainer Winkelmann. 
2nd Edition.</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><u>Notes:</u> This ignores any discussion related to overdispersion or inflated zeros which relate to other possible model specifications including negative binomial or zero-inflated poisson (ZIP) or zero-inflated negative binomial (ZINB) models.</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Simulated Toy Count Data:</b></div><div class="MsoNormal"><br /></div><div class="MsoNormal"><span style="font-family: "courier new";">COUNT GROUP ID TIME TIME2 TIME3</span></div><div class="MsoNormal"><span style="font-family: "courier new";">3 A 1 20 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">4 A 2 25 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">2 A 3 20 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">2 A 4 20 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">1 A 5 20 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">6 A 6 30 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">0 A 7 20 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">0 A 8 20 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">1 A 9 20 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">5 A 10 25 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">3 A 11 20 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">2 A 12 20 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">3 A 13 20 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">3 A 14 25 60 30</span></div><div class="MsoNormal"><span style="font-family: "courier new";">4 A 15 20 60 30</span></div><div class="MsoNormal"><span 
style="font-family: "courier new";">5 B 16 50 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">4 B 17 45 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">7 B 18 55 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">8 B 19 50 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">3 B 20 50 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">7 B 21 45 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">5 B 22 55 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">4 B 23 50 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">7 B 24 50 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">8 B 25 45 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">5 B 26 55 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">3 B 27 50 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">5 B 28 50 60 60</span></div><div class="MsoNormal"><span style="font-family: "courier new";">4 B 29 45 60 60</span></div><span style="font-family: "courier new";">6 B 30 55 60 60</span>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-34901313823283521762017-03-11T17:27:00.001-05:002017-03-11T23:33:41.024-05:00Basic Econometrics of CountsAs Cameron and Trivedi state, Poisson regression is <i>"the starting point for count data analysis, though it is often inadequate"</i> (Cameron and Trivedi,1999). 
The <i>"main focus is the effect of covariates on the frequency of an event, measured by non-negative integer values or counts"</i> (Cameron and Trivedi, 1996).<br /><br />Examples of counts they reference are related to medical care utilization such as office visits or days in the hospital.<br /><br /><i>"In all cases the data are concentrated on a few small discrete values, say 0, 1 and 2; skewed to the left; and intrinsically heteroskedastic with variance increasing with the mean. These features motivate the application of special methods and models for count regression." </i>(Cameron and Trivedi, 1999).<br /><br />From Cameron and Trivedi (1999) <i>"The Poisson regression model is derived from the Poisson distribution by parameterizing the relation between the mean parameter μ and covariates (regressors) x. The standard assumption is to use the exponential mean parameterization"</i><br /><br />μ = exp(xβ)<br /><br />Slightly abusing partial derivative notation we derive the marginal effect of x on y as follows:<br /><br />dE[y|x]/dx = β*exp(xβ)<br /><br />What this implies is that if β = .10 and exp(xβ) = 2 then a 1 unit change in x will change the expectation of y by approximately .20 units. <br /><br />Another approach is to approximate the average marginal effect by:<br /><br /> mean(y)*β<br /><br />(see also Wooldridge, 2010)<br /><br />Another way this is interpreted is through exponentiation. From <a href="https://onlinecourses.science.psu.edu/stat504/node/168">Penn State's STAT 504 course</a>:<br /><br /><i>"with every unit increase in X, the predictor variable has multiplicative effect of exp(β) on the mean (i.e. 
expected count) of Y."</i><br /><br />This implies (as noted in the course notes):<br /><br /> If β = 0, then exp(β) = 1 and Y and X are not related.<br /> If β > 0, then exp(β) > 1, and the expected count μ = E(y) is exp(β) times larger than when X = 0.<br /> If β < 0, then exp(β) < 1, and the expected count μ = E(y) is exp(β) times smaller than when X = 0.<br /><br />For example, if comparing a group A (X = 1) vs a group B (X = 0), if exp(β) = .5, group A has an expected count that is .5 times that of B (half as large). On the other hand if exp(β) = 1.5, group A has an expected count 1.5 times larger than B.<br /><br />This could also be interpreted in percentage terms (<a href="http://econometricsense.blogspot.com/2011/03/calculation-and-interpretation-of-odds.html">similar to odds ratios in logistic regression</a>)<br /><br />For example, comparing a group A (X = 1) vs B (X = 0), if exp(β) = .5 that implies a change of (.5-1)*100% = -50%, so group A has a 50% lower expected count than group B. On the other hand if exp(β) = 1.5, this implies that group A has a (1.5-1)*100% = 50% larger expected count than B.<br /><br />A simple rule of thumb or shortcut is that for small values we can interpret β as a percent change in the expected count of y for a given change in x, as in 100*β (Wooldridge, 2nd ed, 2010). The table below compares β with the exact percent change (exp(β)-1)*100%.<br /><br /> <style> <!-- BODY,DIV,TABLE,THEAD,TBODY,TFOOT,TR,TH,TD,P { font-family:"Arial"; font-size:x-small } --> </style> <br /><table border="0" cellspacing="0" cols="3" frame="VOID" rules="NONE"> <colgroup><col width="58"></col><col width="101"></col><col width="101"></col></colgroup> <tbody><tr> <td align="LEFT" height="18" width="58"> β</td> <td align="LEFT" width="101"> exp(β)</td> <td align="LEFT" width="101"> (exp(β)-1)*100%</td> </tr><tr> <td align="RIGHT" height="18">0.01</td> <td align="RIGHT">1.0100501671</td> <td align="RIGHT">1.0050167084</td> </tr><tr> <td align="RIGHT" height="18">0.02</td> <td align="RIGHT">1.02020134</td> <td align="RIGHT">2.0201340027</td> </tr><tr> <td align="RIGHT" 
height="18">0.03</td> <td align="RIGHT">1.030454534</td> <td align="RIGHT">3.0454533954</td> </tr><tr> <td align="RIGHT" height="18">0.04</td> <td align="RIGHT">1.0408107742</td> <td align="RIGHT">4.0810774192</td> </tr><tr> <td align="RIGHT" height="18">0.05</td> <td align="RIGHT">1.0512710964</td> <td align="RIGHT">5.1271096376</td> </tr><tr> <td align="RIGHT" height="18">0.06</td> <td align="RIGHT">1.0618365465</td> <td align="RIGHT">6.1836546545</td> </tr><tr> <td align="RIGHT" height="18">0.07</td> <td align="RIGHT">1.0725081813</td> <td align="RIGHT">7.2508181254</td> </tr><tr> <td align="RIGHT" height="18">0.08</td> <td align="RIGHT">1.0832870677</td> <td align="RIGHT">8.3287067675</td> </tr><tr> <td align="RIGHT" height="18">0.09</td> <td align="RIGHT">1.0941742837</td> <td align="RIGHT">9.4174283705</td> </tr><tr> <td align="RIGHT" height="18">0.1</td> <td align="RIGHT">1.1051709181</td> <td align="RIGHT">10.5170918076</td> </tr><tr> <td align="RIGHT" height="18">0.11</td> <td align="RIGHT">1.1162780705</td> <td align="RIGHT">11.6278070459</td> </tr></tbody></table><br />In STATA, the margins command can be used to get predicted (average) counts at each specified level of a covariate. This is similar as I understand, to getting <a href="http://econometricsense.blogspot.com/2016/03/marginal-effects-vs-odds-ratios_11.html">marginal effects at the mean for logistic regression</a>. See <a href="http://stats.idre.ucla.edu/stata/dae/poisson-regression/">UCLA STATA examples.</a> Similarly this can be done in SAS using the <a href="http://stats.idre.ucla.edu/sas/dae/poisson-regression/">ilink option with lsmeans. </a><br /><br /><b>An Applied Example:</b> Suppose we have some count outcome y that we want to model as a function of some treatment 'TRT.' 
Maybe we are modeling hospital admission rate differences by treated vs control group for some intervention or maybe this is number of weeds in an acre plot for a treated vs control group in an agricultural experiment. <a href="https://gist.github.com/AgEconomist/098d398bf6d2001a549d2a64493d116f">Using python</a> I simulated a toy count data set for two groups treated (TRT = 1) and untreated (TRT = 2). Descriptive statistics indicate a treatment effect.<br /><br />Mean (treated): 2.6<br />Mean (untreated): 5.4<br /><br />However, if I want to look at the significance of this I can model the treatment effect by specifying a Poisson regression model. Results are below:<br /><br />E[y|TRT] = exp(xβ) where x = TRT our binary indicator for treatment<br /><br /><br /><img alt="" src="" /><br /><br />Despite the poor quality of this image we can see that our estimate for β = -.7309. This is rather large in magnitude so the direct percentage approximation above won't likely hold. However we can interpret the significance and direction of the effect to imply that the treatment significantly reduces the expected count of y, our outcome of interest. The table presented previously shows that the direct percentage shortcut drifts away from the exact value as β grows in magnitude, and for a negative β the shortcut overstates the size of the reduction. This implies that the treatment is reducing the expected count by something less than 73%. If we take the path of exponentiation we get:<br /><br />exp(-.7309) = .48<br /><br />This implies the treatment group has an expected count that is .48 times that of the control. In percentage terms the treatment group has an expected count (.48-1)*100 = -52% or 52% lower than the control group.<br /><br />Interestingly, with a single variable Poisson regression model we can derive these results from the descriptive data.<br /><br />If we take the ratio of average counts for treated vs untreated groups we get 2.6/5.4 = .48 which matches our exponentiated result exp(β). 
And if we calculate the percentage difference in raw means between treated and untreated groups, (2.6-5.4)/5.4, we see that in fact the treatment group has an average count that is about 52% lower than the control group.<br /><br /><b>Extensions of the Model </b><br /><br />As stated at the beginning of this post, the Poisson model is just the benchmark or starting point for count models. One assumption is that the mean and variance are equal. This is known as 'equidispersion.' If the variance exceeds the mean that is referred to as overdispersion and negative binomial models are often specified (but interpretation of the coefficients is unchanged). Overdispersion is more often the case (Cameron and Trivedi, 1996). Other special cases consider the proportion of zeros, sometimes accounted for by zero inflated Poisson (ZIP) or zero inflated negative binomial (ZINB) models. As noted in Cameron and Trivedi (1996) count models and duration models can be viewed as duals. If observed units have different levels of exposure or duration this is accounted for in count models through inclusion of an offset. More advanced treatment and references should be considered in these cases. <br /><br /><b>Applied Examples from Literature</b><br /><br />Some examples where counts are modeled in the applied economics literature include the following:<br /><br /><i>The Demand for Varied Diet with Econometric Models for Count Data. Jonq-Ying Lee. American Journal of Agricultural Economics, vol 69 no 3 (Aug, 1987)</i><br /><i><br /></i><i>Standing on the shoulders of giants: Coherence and biotechnology innovation performance. Sanchez and Ng. Selected Poster 2015 Agricultural and Applied Economics Association and Western Agricultural Economics Association Joint Annual Meeting. San Francisco CA July 26-28</i><br /><i><br /></i><i>Adoption of Best Management Practices to Control Weed Resistance by Corn, Cotton, and Soybean Growers. Frisvold, Hurley, and Mitchell. AgBioForum 12(3&4) 370-381. 
2009.</i><br /><br />In all cases, as is common in the limited amount of literature I have seen in applied economics, the results of the count regressions are interpreted in terms of direction and significance, but not much consideration is given to an interpretation of results based on exponentiation of coefficients. <b> </b><br /><br /><b>References:</b><br /><br />Essentials of Count Data Regression. A. Colin Cameron and Pravin K. Trivedi. (1999)<br />Count Data Models for Financial Data. A. Colin Cameron and Pravin K. Trivedi. (1996)<br />Econometric Analysis of Cross Section and Panel Data. Wooldridge. 2nd Ed. 2010.<br /><br /><b>See also:</b><br /><a href="http://econometricsense.blogspot.com/2017/03/count-models-with-offsets.html">Count Models with Offsets</a><br /><a href="http://econometricsense.blogspot.com/2015/12/do-we-really-need-zero-inflated-models.html">Do we Really Need Zero Inflated Models</a><br /><a href="http://econometricsense.blogspot.com/2014/03/quantile-regression-with-count-data.html">Quantile Regression with Count Data</a><br /><br />For python code related to the applied example see the following <a href="https://gist.github.com/AgEconomist/098d398bf6d2001a549d2a64493d116f">gist.</a>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-55012488232399500442017-03-11T15:01:00.002-05:002017-03-22T11:06:56.871-04:00Count Models with Offsets<div class="MsoNormal">See also: <a href="http://econometricsense.blogspot.com/2017/03/count-models-with-offsets-practical.html">Count Models with Offsets: Practical Applications using R</a><b><a href="http://econometricsense.blogspot.com/2017/03/count-models-with-offsets-practical.html"> </a></b><br /><br /><b>Principles of Count Model Regression</b></div><div class="MsoNormal"><br /></div><div class="MsoNormal">Often times we want to model the impact of some intervention or differences between groups in relation to an outcome 
that is a count. Examples of counts may be related to medical care utilization such as office visits or days in the hospital or total hospital admissions.</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><i>"In all cases the data are concentrated on a few small discrete values, say 0, 1 and 2; skewed to the left; and intrinsically heteroskedastic with variance increasing with the mean. These features motivate the application of special methods and models for count regression."</i> (Cameron and Trivedi, 1999).</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Poisson regression is <i>"the starting point for count data analysis, though it is often inadequate" </i>(Cameron and Trivedi, 1999). The <i>"main focus is the effect of covariates on the frequency of an event, measured by non-negative integer values or counts"</i> (Cameron and Trivedi, 1996).</div><div class="MsoNormal"><br /></div><div class="MsoNormal">From Cameron and Trivedi (1999) <i>"The Poisson regression model is derived from the Poisson distribution by parameterizing the relation between the mean parameter μ and covariates (regressors) x. The standard assumption is to use the exponential mean parameterization"</i></div><div class="MsoNormal"><br /></div><div class="MsoNormal">E(Y|x) = μ = exp(xβ), where xβ = β<sub>0</sub> + βx</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Equivalently:</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Log(μ) = xβ</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Where β is the change in the log of average counts given a one unit change in x. Alternatively we can also say that E(Y|x) changes by a factor of exp(β). Often exp(β) is interpreted as an ‘incidence rate ratio’ although in many cases ‘rate’ and ‘mean’ are used interchangeably (Williams, 2016). 
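<br /><br />To make the exp(β) reading concrete, here is a small sketch in Python (standard library only). The function name and the β values are my own illustrative assumptions, not estimates from any model in this post. It just puts the multiplicative factor exp(β), the exact percent change (exp(β)-1)*100, and the rough 100*β shortcut side by side.

```python
import math

def interpret_poisson_coefficient(beta):
    """Return three common readings of a Poisson regression coefficient.

    Hypothetical helper for illustration only.
    """
    rate_ratio = math.exp(beta)              # multiplicative change in E(Y|x)
    exact_pct = (rate_ratio - 1.0) * 100.0   # exact percent change in E(Y|x)
    shortcut_pct = 100.0 * beta              # approximation, reasonable for small beta
    return rate_ratio, exact_pct, shortcut_pct

# Illustrative (made up) coefficient values
for beta in (0.01, 0.10, -0.50):
    rr, exact, shortcut = interpret_poisson_coefficient(beta)
    print(f"beta={beta:+.2f}  exp(beta)={rr:.4f}  "
          f"exact={exact:+.2f}%  shortcut={shortcut:+.2f}%")
```

For small β the shortcut and the exact percent change are close; as β grows in magnitude they drift apart, which is why the exponentiated form is usually the safer one to report.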
</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Are we modeling mean or average counts or rates or does this matter?</b></div><div class="MsoNormal"><br /></div><div class="MsoNormal">We might often think of our data as counts, but count models assume that counts are always observed within time or space. This gives rise to an implicit rate interpretation of counts. For example, the probability mass function for a Poisson distribution can be specified as:</div><div class="MsoNormal"><br /></div><div class="MsoNormal">P(Y = y|μ) = exp(-μ) * μ<sup>y</sup> / y!</div><div class="MsoNormal"><br /></div><div class="MsoNormal">The Poisson distribution described above gives us the probability of ‘y’ events occurring during a given interval. The parameter μ is the expected count or average number of occurrences in the specified or implied interval. Often this interval is measured in units of time (e.g. ER visits per year), but it can also be a unit of space (e.g. total leaks in 1 mile of pipeline). So even if we think of our outcome as counts we are actually modeling a rate per some implicit interval of observation whether we think of that explicitly or not. This is what gives us the incidence rate ratio (IRR) interpretation of exponentiated coefficients. 
If we think of rates as:</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Rate = count/t</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Where t = time or interval of observation or exposure. If all participants are assumed to have a common t this is essentially like dividing by 1 and our rate is for all practical purposes often interpreted as a count.</div><div class="MsoNormal"><br /></div>Equivalently, as noted in <a href="https://onlinecourses.science.psu.edu/stat504/node/168">STAT 504 (Penn State)</a> <i>when we model rates, mean count is proportional to t</i> and <i>the interpretation of parameter estimates, α and β will stay the same as for the model of counts; you just need to multiply the expected counts by t.</i><br /><div class="MsoNormal"></div><div class="MsoNormal"><br /></div><div class="MsoNormal">If t = 1 or everyone is observed for the same period of time, then we are back to just thinking about counts with an implied rate.</div><div class="MsoNormal"></div><div class="MsoNormal">As noted in Cameron and Trivedi (1996) count models and duration (survival) models can be viewed as duals. If observed units have different levels of exposure or duration or intervals of observation this is accounted for in count models through inclusion of an offset. 
Inclusion of an offset creates a model that explicitly considers interval ‘t<sub>x</sub>’ where t<sub>x</sub> represents exposure time for individuals with covariate value x:</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Log(μ/t<sub>x</sub>) = xβ, where we are explicitly specifying a rate based on time ‘t<sub>x</sub>’</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Re-arranging terms we get:</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Log(μ) - Log(t<sub>x</sub>) = xβ </div><div class="MsoNormal">Log(μ) = xβ + Log(t<sub>x</sub>)</div><div class="MsoNormal">The term Log(t<sub>x</sub>) is referred to as an ‘offset.’ It enters the model with its coefficient fixed at 1 rather than estimated.</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>When should we include an offset?</b></div><div class="MsoNormal"><br /></div><div class="MsoNormal">Karen Grace-Martin at The Analysis Factor gives a <a data-saferedirecturl="https://www.google.com/url?hl=en&q=http://www.theanalysisfactor.com/the-exposure-variable-in-poission-regression-models/&source=gmail&ust=1490279432592000&usg=AFQjCNHAFoZ3nWb8shmTPyFuUxg1EyQHoQ" href="http://www.theanalysisfactor.com/the-exposure-variable-in-poission-regression-models/" target="_blank">great explanation of modeling offsets or exposure in count models.</a> Here is an excerpt:</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><i>"What this means theoretically is that by defining an offset variable, you are only adjusting for the amount of opportunity an event has.....A patient in for 20 days is twice as likely to have an incident as a patient in for 10 days....There is an assumption that the likelihood of events is not changing over time."</i></div><div class="MsoNormal"><br /></div><div class="MsoNormal">In another post Karen states: </div><div class="MsoNormal"><i>"It is often necessary to include an exposure or offset parameter in the model to account for the amount of risk each individual had to the event."</i></div><div 
class="MsoNormal"><br /></div><div class="MsoNormal">So if there are differences in exposure or observation times for different observations relevant to the outcome of interest then it makes sense to account for this by including offsets as specified above. By explicitly specifying t<sub>x</sub> we can account for differences in exposure time or observation periods unique to each observation. Often the relevant interval of exposure may be something other than time. Karen gives one example where it might not make sense to include an offset or account for time such as the number of words a toddler can say. Another example might be the number of correct words spelled in a spelling bee. In fact in this case time may be <a data-saferedirecturl="https://www.google.com/url?hl=en&q=https://books.google.com/books?id%3DPwbtCAAAQBAJ%26pg%3DPA70%26lpg%3DPA70%26dq%3Dcount%2Bmodels%2Bwith%2Bendogenous%2Bexposure%26source%3Dbl%26ots%3DLRv_0P0lEA%26sig%3DAY8qiXUMbwPfAiwGV8_w9u63DPM%26hl%3Den%26sa%3DX%26ved%3D0ahUKEwiYx4P9gunSAhWEbiYKHdx0D_8Q6AEILjAC%23v%3Donepage%26q%3Dcount%2520models%2520with%2520endogenous%2520exposure%26f%3Dfalse&source=gmail&ust=1490279432592000&usg=AFQjCNHoKTK0WrwN9FJtSTlC9Nod2o-jLw" href="https://books.google.com/books?id=PwbtCAAAQBAJ&pg=PA70&lpg=PA70&dq=count+models+with+endogenous+exposure&source=bl&ots=LRv_0P0lEA&sig=AY8qiXUMbwPfAiwGV8_w9u63DPM&hl=en&sa=X&ved=0ahUKEwiYx4P9gunSAhWEbiYKHdx0D_8Q6AEILjAC#v=onepage&q=count%20models%20with%20endogenous%20exposure&f=false" target="_blank">endogenous.</a> More correct words spelled by a participant imply a longer interval of observation, duration, or ‘exposure’. We would not make our decision about who is a better speller on the basis of time or a rate such as total correct words per minute. As all count models implicitly model rates, the implicit and most relevant interval here would be the contest itself. In practical terms this simply reverts to being a comparison of raw counts. 
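<br /><br />To see how much an offset can matter, here is a rough sketch in Python (standard library only) using the COUNT and TIME columns of the simulated toy count data for groups A and B. It is not a fitted regression. It leans on the fact that when the only regressor is a group indicator, the Poisson maximum likelihood estimates reproduce each group's total events divided by total exposure, so the coefficient on the group dummy is the log of the ratio of those rates. The function name is my own hypothetical helper, and with additional covariates you would need to fit the model with actual software.

```python
import math

# COUNT and TIME columns of the simulated toy count data, by GROUP
counts_a = [3, 4, 2, 2, 1, 6, 0, 0, 1, 5, 3, 2, 3, 3, 4]
time_a = [20, 25, 20, 20, 20, 30, 20, 20, 20, 25, 20, 20, 20, 25, 20]
counts_b = [5, 4, 7, 8, 3, 7, 5, 4, 7, 8, 5, 3, 5, 4, 6]
time_b = [50, 45, 55, 50, 50, 45, 55, 50, 50, 45, 55, 50, 50, 45, 55]

def group_effect(counts_1, counts_0, exposure_1=None, exposure_0=None):
    """Coefficient on a group dummy in a Poisson model with only that dummy.

    Without exposure: log of the ratio of mean counts.
    With exposure (offset log(t)): log of the ratio of events per unit exposure.
    Hypothetical helper; valid only for this one-dummy special case.
    """
    if exposure_1 is None:
        exposure_1 = [1] * len(counts_1)   # common t = 1 for everyone
    if exposure_0 is None:
        exposure_0 = [1] * len(counts_0)
    rate_1 = sum(counts_1) / sum(exposure_1)
    rate_0 = sum(counts_0) / sum(exposure_0)
    return math.log(rate_1 / rate_0)

beta_no_offset = group_effect(counts_a, counts_b)
beta_offset = group_effect(counts_a, counts_b, time_a, time_b)

print(f"no offset:   beta={beta_no_offset:.4f}  exp(beta)={math.exp(beta_no_offset):.4f}")
print(f"with offset: beta={beta_offset:.4f}  exp(beta)={math.exp(beta_offset):.4f}")
```

Ignoring TIME, group A looks like it has roughly .48 times the expected count of group B. Once the shorter observation windows in group A are accounted for, the ratio of rates is roughly 1.11, so the apparent difference in counts is largely a difference in exposure.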
</div><div class="MsoNormal"><br /></div><b>Summary:</b> Counts always occur within some interval of time or space and therefore can always have an implicit ‘rate’ interpretation. If counts are observed across different intervals in time or space for different observations then differences in outcomes should be modeled through the specification of an offset. Whether to include an offset really depends on answering the questions: (1) What is the relevant interval in time or space upon which our counts are based? (2) Is this interval different across our observations of counts?<br /><br /><b>References:</b><br /><div class="MsoNormal">Essentials of Count Data Regression. A. Colin Cameron and Pravin K. Trivedi. (1999)<br /><br />Count Data Models for Financial Data. A. Colin Cameron and Pravin K. Trivedi. (1996)</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Models for Count Outcomes. Richard Williams, University of Notre Dame, <a data-saferedirecturl="https://www.google.com/url?hl=en&q=http://www3.nd.edu/~rwilliam/&source=gmail&ust=1490279432592000&usg=AFQjCNHQdNQfyh7e0ZLr4JZ2DnNRfmQtVg" href="http://www3.nd.edu/%7Erwilliam/" target="_blank">http://www3.nd.edu/~rwilliam/</a> . Last revised February 16, 2016</div><div class="MsoNormal"><br /></div><div class="MsoNormal">Econometric Analysis of Count Data. By Rainer Winkelmann. 2nd Edition.</div><div class="MsoNormal"><br /></div><u>Notes:</u> This ignores any discussion related to overdispersion or inflated zeros which relate to other possible model specifications including negative binomial or zero-inflated poisson (ZIP) or zero-inflated negative binomial (ZINB) models.Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0