Econometric Sense: An attempt to make sense of econometrics, biostatistics, machine learning, experimental design, bioinformatics, ....<br />Matt Bogard (http://www.blogger.com/profile/10510725993509264716)<br /><br /><b>Thinking About Confidence Intervals: Horseshoes and Hand Grenades</b><br />Published 2018-12-21 (updated 2019-01-20)<br /><br />In a previous post, <a href="https://econometricsense.blogspot.com/2017/08/confidence-intervals-fad-or-fashion_7.html">Confidence Intervals: Fad or Fashion</a>, I wrote about <a href="https://davegiles.blogspot.com/2011/08/overly-confident-future-nobel-laureate.html">Dave Giles'</a> post on interpreting confidence intervals. A primary focus of these discussions was how confidence intervals are often misinterpreted. For instance, the two statements below are common mischaracterizations of CIs:<br /><br />1) There's a 95% probability that the true value of the regression coefficient lies in the interval [a,b].<br />2) This interval includes the true value of the regression coefficient 95% of the time.<br /><br />You can read the previous post or Dave's post for more details. But in re-reading Dave's post recently, one statement got me thinking:<br /><br /><i>"So, the first interpretation I gave for the confidence interval in the opening paragraph above is clearly wrong. The correct probability there is not 95% - it's either zero or 100%! The second interpretation is also wrong. "This interval" doesn't include the true value 95% of the time. Instead, 95% of such intervals will cover the true value."</i><br /><br />I like the way he put that: '<i>95% of such intervals</i>', distinguishing this from a particular observed/calculated confidence interval. I think someone trained to think about CIs in the incorrect probabilistic way may have trouble getting at this. 
So how might we think about CIs in a way that is still useful, but doesn't get us tripped up with incorrect probability statements?<br /><br />My favorite statistics text is DeGroot's Probability and Statistics. In the 4th edition the authors are very careful about explaining confidence intervals:<br /><br /><i>"Once we compute the observed values of a and b, the observed interval (a,b) is not so easy to interpret....Before observing the data we can be 95% confident that the random interval (A,B) will contain mu, but after observing the data, the safest interpretation is that (a,b) is simply the observed value of the random interval (A,B)"</i><br /><br />While DeGroot is careful, it still may not be very intuitive. However, in Principles and Procedures of Statistics: A Biometrical Approach, Steel, Torrie, and Dickey present a more intuitive explanation:<br /><br /><i>"since mu will either be or not be in the interval, that is P=0 or 1, the probability will actually be a measure of confidence we placed in the procedure that led to the statement. This is like throwing a ring at a fixed post; the ring doesn't land in the same position or even catch on the post every time. However we are able to say that we can circle the post 9 times out of 10, or whatever the value should be for the measure of our confidence in our proficiency."</i><br /><br />The ring tossing analogy seems to work pretty well. I'll customize it by using horseshoes instead. Say that 95 out of 100 times you throw a ringer (in the game of <a href="https://en.wikipedia.org/wiki/Horseshoes">horseshoes</a>, that is when the horseshoe circles the peg or stake when you toss it). You know this before you toss it. And to use Dave Giles' language, <b>*before*</b> calculating a confidence interval we know that 95% of such intervals will cover the population parameter of interest. And, after we toss the shoe, it either circles the peg or not, that is a 1 or a 0 in terms of probability. 
Similarly, <b>*after* </b>computing a confidence interval, the true mean or population parameter of interest is either covered or not, with a probability of 0 or 100%.<br /><br />This isn't perfect, but thinking of confidence intervals this way at least keeps us honest about making probability statements.<br /><br />Going back to my previous post, I still like the description of confidence intervals Angrist and Pischke provide in Mastering 'Metrics, that is <i>'describing a set of parameter values consistent with our data.' </i><br /><br />For instance if we run the regression:<br /><br />y = b0 + b1X + e to estimate y = B0 + B1X + e<br /><br />and get our parameter estimate b1 with a 95% confidence interval like (1.2,1.8), we can say that our sample data is consistent with any population that has a B1 taking a value that falls in the interval. That implies there are a number of populations that our data would be consistent with. Narrower intervals imply very similar populations, very similar values of B1, and speak to more precision in our estimate of B1.<br /><br />I really can't make an analogy for hand grenades. It just gave me a title with a ring to it.<br /><br />See also:<br /><a href="https://econometricsense.blogspot.com/2017/03/interpreting-confidence-intervals.html">Interpreting Confidence Intervals</a><br /><a href="https://econometricsense.blogspot.com/2015/03/two-reasons-to-consider-bayesian.html">Bayesian Statistics Confidence Intervals and Regularization</a><br /><a href="https://econometricsense.blogspot.com/2015/01/overconfident-confidence-intervals.html">Overconfident Confidence Intervals</a><br /><br /><b>Power and Sample Size Analysis in Applied Econometrics</b><br />Published 2018-10-20 (updated 2019-01-18)<br /><br />In applied work in econometrics I've done a limited amount of power and sample size analysis. 
Recently I was thinking about a conversation from an episode of the <a href="http://www.econtalk.org/john-ioannidis-on-statistical-significance-economics-and-replication/">EconTalk podcast</a> with Russ Roberts and John Ioannidis where the topic of power came up:<br /><br /><i>“though I was trained as a Ph.D., got a Ph.D. in economics at the U. of Chicago, I never heard that phrase, 'power,' applied to a statistical analysis. What we did--and I think what most economists, many economists, still do, is: we had a data set; we had something we wanted to discover and test or examine or explore, depending on the nature of the problem.”</i><br /><br />That rings familiar to me. In eight years of attending talks and seminars in applied economics, what stands out are discussions of identification, endogeneity, standard errors etc. Not power or sample size. So I went back and looked at all of my copies of econometrics textbooks. These are well known and have been commonly used by masters and PhD graduate students in economics. <i>Econometric Analysis</i> by Greene, <i>Econometric Analysis of Cross Section and Panel Data</i> by Wooldridge, <i>A Course in Econometrics</i> by Goldberger, <i>A Guide to Econometrics</i> by Kennedy, <i>Using Econometrics </i>by Studenmund. I even threw in <i>Mastering 'Metrics </i>and <i>Mostly Harmless Econometrics</i> by Angrist and Pischke.<br /><br />While Wooldridge did discuss clustering and stratified sampling, most of the emphasis was placed on getting the correct standard errors and appropriate weighting. From my previous years of referencing these texts, as well as a cursory review again of the index and chapters of each one I could not find any treatment of power or sample size calculations.<br /><br />So I thought, maybe this is something covered in prerequisite courses. Going back to the undergraduate level in economics I recall very little about this. 
Checking a popular text, <i>Statistics for Business and Economics</i> by Anderson, Sweeney, Williams, Camm, and Cochran, I did find a basic example in relation to power and sample sizes for a t-test. What about a graduate level prerequisite for econometrics? In my first year of graduate school I took a graduate level course in mathematical statistics (this was a course doing business under a research methods title) that used DeGroot's text <i><a href="https://www.pearson.com/us/higher-education/program/De-Groot-Probability-and-Statistics-4th-Edition/PGM146802.html">Probability and Statistics.</a></i> There was definitely a lot about the concept of power in theory, but no emphasis on various calculations for sample size. The one textbook I own with treatment of this is <i>Principles and Procedures of Statistics, A Biometrical Approach </i>by Steel, Torrie, and Dickey. But that does not count because that was the text used in my <i>experimental design </i>course in graduate school, not part of a standard econometrics curriculum.<br /><br />I've come to the conclusion that power and sample size analysis may not be widely emphasized in graduate econometrics training across programs. It's not just something I missed in a lecture a decade ago. Similar to advanced specialized topics like spatial econometrics, details related to power and sample size analysis, survey design, stratified random sampling etc. are likely covered depending on one's specialty in the field and the program.<br /><br />However, it is evident that some economists do this kind of work.<br /><br />For instance, here is an <a href="https://ageconsearch.umn.edu/bitstream/229195/2/Thompson%20et%20al._SAEA2016.pdf">example </a>from a paper with food economist Jayson Lusk:<br /><br /><i>"However, there are many economic problems where sample size directly affects a benefit or loss function. 
In these cases, sample size is an endogenous variable that should be considered jointly with other choice variables in an optimization problem. In this article we introduce an economic approach to sample size determination utilizing a Bayesian decision theoretic framework."</i><br /><br />Healthcare economist <a href="https://theincidentaleconomist.com/wordpress/power-calculations-for-the-oregon-medicaid-study/">Austin Frakt</a> has also written about power calculations.<br /><br />So why do we care about power and sample size and what is 'power'?<br /><br />Jim Manzi, author of <i>Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society </i>offers the following analogy in an <a href="http://www.econtalk.org/jim-manzi-on-the-oregon-medicaid-study-experimental-evidence-and-causality/">EconTalk podcast</a>:<br /><br /><i>“Well, the power in a statistical experiment, and I often use this analogy, is sort of like the magnification power on the microscope you probably used in high school biology. It has on the side, 4x, 8x, 16x, which is how many times it can increase the apparent size of a physical object. And the metaphor I'd use is, if I try and use a child's microscope to carefully observe a section of a leaf looking for an insect that's a little smaller than an ant, and I don't observe the ant, I can reliably say: I don't see the insect, and therefore there is no bug there. If I use that exact same microscope to try and find on that exact same piece of leaf, not a bug but a tiny microbe that's, you know, smaller than a speck of dust, I'll look at it and I'll say: it's all kind of fuzzy, I see a lot of squiggly things; I think that little squiggle might be something or it might not. I don't see the microbe, but I can't reliably say that therefore there is no microbe there, because trying to zoom in closer and closer to look for something that small, all I see is a bunch of fuzz. 
So my failure to see the microbe is a statement about the precision of my instrument, not about whether there's really a microbe on the leaf.”</i><br /><br />So, if we have a sample that is ‘not sufficiently powered’ it is possible that we could fail to find a relationship between treatment and outcome, even if one actually exists. Put differently, power is the probability that a test rejects the null hypothesis when an effect of a given size actually exists. With low power, our estimated regression coefficient may not be statistically significant even when a relationship does exist. Increasing sample size is one primary way to increase power in an experiment. So the question becomes how large does ‘n’ have to be to have a sample sufficiently powered to detect the effect of a treatment on an outcome (at some stated level of significance)?<br /><br />So how do you do these calculations? If you can't find examples in your econometrics textbook (if you do find one let me know!) there are plenty of texts in the biostatistics genre that probably cover this. <i>Principles and Procedures of Statistics, A Biometrical Approach </i>by Steel, Torrie, and Dickey is one example that I started with. Cochran, W (1977). <i>Sampling Techniques,</i> 3rd ed. is another often cited source.<br /><br />See also: <a href="https://econometricsense.blogspot.com/2017/04/andrew-gelman-on-econtalk.html">Andrew Gelman on EconTalk discussing "what does not kill my statistical significance makes it stronger"</a><br /><br /><b>Performance of Machine Learning Models on Time Series Data</b><br />Published 2018-07-29<br /><br />In the past few years there has been an increased interest among economists in machine learning. 
For more discussion see <a href="https://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html">here</a>, <a href="http://econometricsense.blogspot.com/2012/10/economists-as-data-scientists.html">here,</a> <a href="http://econometricsense.blogspot.com/2018/05/statistical-inference-vs-causal.html">here,</a> <a href="http://econometricsense.blogspot.com/2018/02/deep-learning-vs-logistic-regression.html">here</a>, <a href="http://econometricsense.blogspot.com/2017/02/machine-learning-in-finance-and.html">here</a>, <a href="http://econometricsense.blogspot.com/2016/10/why-data-science-needs-economics.html">here</a>, <a href="http://econometricsense.blogspot.com/2016/08/the-state-of-applied-econometrics.html">here,</a> and <a href="http://econometricsense.blogspot.com/2016/03/machine-learning-and-economics.html">here</a>. See also Mindy Mallory's recent post <a href="http://blog.mindymallory.com/2018/02/how-does-machine-learning-fit-into-agricultural-economics/">here.</a><br /><br />While some folks like <a href="https://youtu.be/Yx6qXM_rfKQ">Susan Athey</a> are beginning to develop the theory to understand how machine learning can contribute to causal inference, machine learning has so far carved out its niche in the area of prediction. But what about time series analysis and forecasting?<br /><br />That is a question the authors of an interesting paper took up this past March (Statistical and Machine Learning forecasting methods: Concerns and ways forward). They took a good look at the performance of popular machine learning algorithms relative to traditional statistical time series approaches. 
The authors found that traditional approaches, including exponential smoothing and econometric time series approaches, outperformed algorithmic approaches from machine learning across a number of model specifications, algorithms, and time series data sources.<br /><br />Below are some interesting excerpts and takeaways from the paper:<br /><br />When I think of <a href="http://econometricsense.blogspot.com/search/label/time%20series">time series methods</a>, I think of things like cointegration, stationarity, autocorrelation, seasonality, auto-regressive conditional heteroskedasticity etc. (I recommend Mindy Mallory's posts on time series <a href="http://blog.mindymallory.com/2018/01/basic-time-series-analysis-the-game/">here</a>.)<br /><br />Hearing so much about the ability of some machine learning approaches (like deep learning) to <a href="https://www.quora.com/Does-deep-learning-reduce-the-importance-of-feature-engineering">mimic feature engineering</a>, I wondered how well algorithmic approaches would handle these issues in time series applications. The authors looked at some of the previous literature in relation to this:<br /><br /><i>"In contrast to sophisticated time series forecasting methods, where achieving stationarity in both the mean and variance is considered essential, the literature of ML is divided with some studies claiming that ML methods are capable of effectively modelling any type of data pattern and can therefore be applied to the original data [62]. Other studies however, have concluded the opposite, claiming that without appropriate preprocessing, ML methods may become unstable and yield suboptimal results [28]."</i><br /><br />One thing about this paper, as I read it, is that it does not take an adversarial or Luddite tone toward machine learning methods in favor of more traditional approaches. 
While they found challenges related to predictive accuracy, they seemed to proactively look deeper to understand why ML algorithms performed the way they did and how to make ML approaches better at time series.<br /><br />One of the challenges with ML, even with cross-validation, was overfitting and confusion of signals, patterns, and noise in the data:<br /><br /><i>"An additional concern could be the extent of randomness in the series and the ability of ML models to distinguish the patterns from the noise of the data, avoiding over-fitting....A possible reason for the improved accuracy of the ARIMA models is that their parameterization is done through the minimization of the AIC criterion, which avoids over-fitting by considering both goodness of fit and model complexity."</i><br /><br />They also point to instances where ML methods may offer advantages:<br /><br /><i>"even though M3 might be representative of the reality when it comes to business applications, the findings may be different if nonlinear components are present, or if the data is being dominated by other factors. In such cases, the highly flexible ML methods could offer significant advantage over statistical ones"</i><br /><br />It was interesting that basic exponential smoothing approaches outperformed much more complicated ML methods:<br /><br /><i>"the only thing exponential smoothing methods do is smoothen the most recent errors exponentially and then extrapolate the latest pattern in order to forecast. Given their ability to learn, ML methods should do better than simple benchmarks, like exponential smoothing."</i><br /><br />However, the authors note it is often the case that smoothing methods can offer advantages over more complex econometric time series methods as well (i.e. 
ARIMA, VAR, GARCH etc.).<br /><br />Toward the end of the paper the authors go on to discuss in detail the differences between the domains where we have seen a lot of success in machine learning (speech and image recognition, games, self-driving cars etc.) and time series and forecasting applications.<br /><br />In <a href="http://journals.plos.org/plosone/article/figure?id=10.1371/journal.pone.0194889.t010">table 10 </a>of the paper, they drill into some of these specific differences and discuss structural instabilities related to time series data, how the 'rules' change and how forecasts themselves can influence future values, and how this kind of noise might be hard for ML algorithms to capture.<br /><br />This paper is definitely worth going through again and one to keep in mind if you are about to embark on an applied forecasting project.<br /><br /><b>Reference: </b><br /><br />Makridakis S, Spiliotis E, Assimakopoulos V (2018) Statistical and Machine Learning forecasting methods: Concerns and ways forward. PLoS ONE 13(3): e0194889. https://doi.org/10.1371/journal.pone.0194889<br /><br />See also Paul Cuckoo's LinkedIn post on this paper: <a href="https://www.linkedin.com/pulse/traditional-statistical-methods-often-out-perform-machine-paul-cuckoo/">https://www.linkedin.com/pulse/traditional-statistical-methods-often-out-perform-machine-paul-cuckoo/ </a><br /><br /><b>The Credibility Revolution(s) in Econometrics and Epidemiology</b><br />Published 2018-07-15<br /><br />I've written before about the <a href="http://econometricsense.blogspot.com/2017/07/the-credibility-revolution-in.html">credibility revolution</a> in economics. It also seems that in parallel with econometrics, epidemiology has its own revolution to speak of. 
In <a href="https://blog.oup.com/2014/10/deconstruction-paradoxes-sociology-epidemiology/">The Deconstruction of Paradoxes in Epidemiology</a>, Miquel Porta writes:<br /><br /><i>"If a “revolution” in our field or area of knowledge was ongoing, would we feel it and recognize it? And if so, how?...The “revolution” is partly founded on complex mathematics, and concepts as “counterfactuals,” as well as on attractive “causal diagrams” like Directed Acyclic Graphs (DAGs). Causal diagrams are a simple way to encode our subject-matter knowledge, and our assumptions, about the qualitative causal structure of a problem. Causal diagrams also encode information about potential associations between the variables in the causal network. DAGs must be drawn following rules much more strict than the informal, heuristic graphs that we all use intuitively. Amazingly, but not surprisingly, the new approaches provide insights that are beyond most methods in current use......The possible existence of a “revolution” might also be assessed in recent and new terms as collider, M-bias, causal diagram, backdoor (biasing path), instrumental variable, negative controls, inverse probability weighting, identifiability, transportability, positivity, ignorability, collapsibility, exchangeable, g-estimation, marginal structural models, risk set, immortal time bias, Mendelian randomization, nonmonotonic, counterfactual outcome, potential outcome, sample space, or false discovery rate."</i><br /><br />There is a lot said there. Most economists will feel at home with much of this, including anything related to potential outcomes and counterfactuals, and with many of the methods listed at the end of the quote. 
However, what might seem to make the revolution in epidemiology different from econometrics (at least for some applied economists) is the emphasis on <a href="http://econometricsense.blogspot.com/2015/11/directed-acyclical-graphs-dags-and.html">directed acyclic graphs</a> (DAGs).<br /><br />Over at the Causal Analysis in Theory and Practice blog in a post titled <a href="http://causality.cs.ucla.edu/blog/index.php/2014/10/27/are-economists-smarter-than-epidemiologists-comments-on-imbenss-recent-paper/">"are economists smarter than epidemiologists (comments on imbens' recent paper)"</a> they discuss comments by Guido Imbens from a <a href="http://causality.cs.ucla.edu/blog/wp-content/uploads/2014/10/Imbens-rejoinder-2014.pdf">statistical science paper</a> (worth a read)<br /><br /><i>"In observational studies in social science, both these assumptions tend to be controversial. In this relatively simple setting, I do not see the causal graphs as adding much to either the understanding of the problem, or to the analyses."</i><br /><br />The blog post is quite critical of this stance:<br /><br /><i>"Can economists do in their heads what epidemiologists observe in their graphs? Can they, for instance, identify the testable implications of their own assumptions? Can they decide whether the IV assumptions (i.e., exogeneity and exclusion) are satisfied in their own models of reality? Of course the can’t; such decisions are intractable to the graph-less mind....Or, are problems in economics different from those in epidemiology? I have examined the structure of typical problems in the two fields, the number of variables involved, the types of data available, and the nature of the research questions. The problems are strikingly similar."</i><br /><br />Being trained in both biostatistics and econometrics, I encountered the credibility revolution and causal analysis mostly through seminars and talks on applied econometrics. 
<a href="http://jaysonlusk.com/blog/2016/5/12/does-diet-coke-cause-fat-babies">As economist Jayson Lusk puts it</a>:<br /><br /><i>"if you attend a research seminar in virtually any economics department these days, you're almost certain to hear questions like, "what is your identification strategy?" or "how did you deal with endogeneity or selection?" In short, the question is: how do we know the effects you're reporting are causal effects and not just correlations."</i><br /><br />The first applications of DAGs I encountered came either from economist <a href="http://marcfbellemare.com/wordpress/12051">Marc Bellemare</a>, in one of his papers on lagged explanatory variables, or from a Statistics in Medicine <a href="https://www.ncbi.nlm.nih.gov/pubmed/17886233">paper</a> by Davey Smith et al. featuring Mendelian randomization.<br /><br /><b>See also:</b><br /><br /><a href="http://econometricsense.blogspot.com/2014/04/how-is-it-that-structural-equation.html">How is it that SEMs subsume potential outcomes? </a><br /><a href="http://econometricsense.blogspot.com/2017/01/mediators-and-moderators_10.html">Mediators and moderators</a><br /><br /><b>Statistical Inference vs. Causal Inference vs. Machine Learning: A motivating example</b><br />Published 2018-05-24 (updated 2018-06-05)<br /><br />In his well known paper, Leo Breiman discusses the <a href="http://econometricsense.blogspot.com/2011/01/classical-statistics-vs-machine.html">'cultural' differences</a> between algorithmic (machine learning) approaches and traditional methods related to inferential statistics. 
<a href="http://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html">Recently,</a> I discussed how important it is to understand these kinds of distinctions when thinking about how current automated machine learning tools can be leveraged in the data science space.<br /><br />In his paper Leo Breiman states:<br /><br /><i>"Approaching problems by looking for a data model imposes an apriori straight jacket that restricts the ability of statisticians to deal with a wide range of statistical problems."</i><br /><br />On the other hand, <a href="https://www.youtube.com/watch?v=Yx6qXM_rfKQ&feature=share">Susan Athey's work</a> highlights the fact that no one has developed the asymptotic theory necessary to adequately address causal questions using methods from machine learning (i.e. how does a given machine learning algorithm fit into the context of the <a href="http://econometricsense.blogspot.com/2013/05/selection-bias-and-rubin-causal-model.html">Rubin Causal Model/potential outcomes framework</a>?)<br /><br />Dr. Athey is working to bridge some of this gap, but it's very complicated. I think there is also a lot that can be done just by understanding and communicating the differences between inferential and causal questions vs. machine learning/predictive modeling questions. When should each be used for a given business problem? What methods does this entail?<br /><br />In an <a href="https://soundcloud.com/datamadetomatter/whole-person-healthcare-is-here">MIT Data Made to Matter podcast,</a> economist Joseph Doyle discusses his paper investigating the relationship between more aggressive (and expensive) treatments by hospitals and improved outcomes for Medicare patients. 
Using this as an example, I hope to broadly illustrate some of these differences looking at this problem through all three lenses.<br /><br /><b>Statistical Inference</b><br /><br />Suppose we just want to know if there is a significant relationship between aggressive treatments 'A' and health outcomes (mortality) 'M.' We might estimate a regression equation (similar to one of the models in the paper) such as:<br /><br />M = b0 + b1*A + b2*X + e where X is a vector of relevant controls.<br /><br />We would be very careful about the nature of our data, correct functional form, and getting our standard errors correct to make valid inferences about our estimate 'b1' of the relationship between aggressive treatments A and mortality M. A lot of this is traditionally taught in econometrics, biostatistics, and epidemiology (things like heteroskedasticity, multicollinearity, distributional assumptions related to the error terms etc.)<br /><br /><b>Causal Inference</b><br /><br />Suppose we wanted to know if the estimate b1 in the equation above is causal. In Doyle's paper they discuss some of the challenges:<br /><br /><i>"A major issue that arises when comparing hospitals is that they may treat different types of patients. For example, greater treatment levels may be chosen for populations in worse health. At the individual level, higher spending is strongly associated with higher mortality rates, even after risk adjustment, which is consistent with more care provided to patients in (unobservably) worse health. At the hospital level, long-term investments in capital and labor may reflect the underlying health of the population as well. 
Differences in unobservable characteristics may therefore bias results toward finding no effect of greater spending."</i><br /><br />One of the points he is making is that even if we control for everything we typically measure in these studies (captured by X above), there are unobservable characteristics related to patients that bias our estimate of b1, in this case toward finding no effect. Recall that methods like regression and matching (<a href="http://econometricsense.blogspot.com/2017/07/regression-as-variance-based-weighted.html">which are two flavors of identification strategies based on selection on observables</a>) achieve identification by assuming that conditional on observed characteristics (X), selection bias disappears. We want to make conditional-on-X comparisons of Y (or M in the model above) that mimic as much as possible the experimental benchmark of random assignment (see more on matching estimators <a href="http://econometricsense.blogspot.com/2011/07/matching-estimators.html">here</a>).<br /><br />However, if there are important characteristics related to selection that we don't observe and can't include in X, then in order to make valid causal statements about our results, we need a method that identifies treatment effects within a selection on 'un'-observables framework (examples include <a href="http://econometricsense.blogspot.com/2012/12/difference-in-difference-estimators.html">difference-in-differences</a>, <a href="http://econometricsense.blogspot.com/2014/04/intuition-for-fixed-effects.html">fixed effects</a>, and <a href="http://econometricsense.blogspot.com/2017/07/instrumental-variables-and-late.html">instrumental variables</a>).<br /><br />In Doyle's paper, they used ambulance service as an instrument for hospital choice to make causal statements about A.<br /><br /><b>Machine Learning/Predictive Modeling</b><br /><br />Suppose we just want to predict mortality by hospital to support some policy or operational objective where the primary need is accurate predictions. 
A number of algorithmic methods might be exploited, including logistic regression, decision trees, random forests, neural networks etc. Based on the mixed findings in the literature, a machine learning algorithm may not exploit 'A' at all even though Doyle finds a significant causal effect based on his instrumental variables estimator. The point is, in many cases a black box algorithm that includes or excludes treatment intensity as a predictor doesn't really care about the significance of this relationship or its causal mechanism, as long as at the end of the day the algorithm predicts well out of sample and maintains reliability and usefulness in application over time.<br /><br /><b>Discussion</b><br /><br />If we wanted to know if the relationship between intensity of care 'A' and mortality 'M' was statistically significant or causal, we would not rely on machine learning methods, at least not on anything available off the shelf today, pending further work by researchers like Susan Athey. We would develop the appropriate causal or inferential model designed to answer the particular question at hand. 
In fact, as Susan Athey points out in a past <a href="https://www.quora.com/How-will-machine-learning-impact-economics">Quora commentary,</a> models used for causal inference could possibly give worse predictions:<br /><br /><i>"Techniques like instrumental variables seek to use only some of the information that is in the data – the “clean” or “exogenous” or “experiment-like” variation in price—sacrificing predictive accuracy in the current environment to learn about a more fundamental relationship that will help make decisions...This type of model has not received almost any attention in ML."</i><br /><br />The point is, for the data scientist caught in the middle of so much disruption related to tools like automated machine learning, as well as technologies producing and leveraging large amounts of data, it is important to focus on business understanding and map the appropriate method to the problem being solved. The ability to understand the differences in tools and methodologies related to statistical inference, causal inference, and machine learning, and to explain those differences to stakeholders, will be important to prevent 'straitjacket' thinking about solutions to complex problems.<br /><br /><b>References:</b><br /><br />Doyle, Joseph et al. “Measuring Returns to Hospital Care: Evidence from Ambulance Referral Patterns.” Journal of Political Economy 123.1 (2015): 170–214. PMC. Web. 11 July 2017.<br />https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351552/<br /><br />Matt Bogard. 
"A Guide to Quasi-Experimental Designs" (2013)<br />Available at: http://works.bepress.com/matt_bogard/24/<br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-19597586567585989092018-04-17T06:53:00.000-04:002018-04-17T10:16:26.672-04:00He who must not be named....or can we say 'causal'?Recall in the Harry Potter series, the wizard community refused to say the name of 'Voldemort' and it got to the point where they almost stopped teaching and practicing magic (at least officially as mandated by the Ministry of Magic). In the research community, by refusing to use the term 'causal' when and where appropriate, are we discouraging researchers from asking interesting questions and putting forth the effort required to implement the kind of rigorous causal inferential methods necessary to push forward the frontiers of science? Could we somehow be putting a damper on teaching and practicing economagic...I mean econometrics...you know the <a href="http://www.mostlyharmlesseconometrics.com/">mostly harmless</a> kind? Will the <a href="http://econometricsense.blogspot.com/2017/07/the-credibility-revolution-in.html">credibility revolution </a>be lost?<br /><br />In a recent May 2018 article in the American Journal of Public Health (by Miguel Hernan of the Departments of Epidemiology and Biostatistics, Harvard School of Public Health) there is an important discussion about the somewhat tiring mantra <i>'correlation is not causation'</i> and disservice to scientific advancement that it can lead to in absence of critical thinking about research objectives and designs. Some people might think this is ironic, since often the phrase is invoked as a means to point out fallacious conclusions that have been uncritically based on mere correlations found in the data. 
However, the pendulum can swing too far in the other direction, causing as much harm.<br /><br /><b style="font-style: italic;">I highly recommend reading this article! </b>It is available ungated and will be one of those you hold onto for a while. See the reference section below.<br /><br />Key to the discussion are important distinctions between questions of association, prediction, and causality. Below are some spoilers:<br /><br /><b>While it is wrong to assume causality based on association or correlation alone, refusing to recognize a causal approach in the analysis because of growing cultural 'norms' is not good either....and should stop:</b><br /><br /><i>"The resulting ambiguity impedes a frank discussion about methodology because the methods used to estimate causal effects are not the same as those used to estimate associations...We need to stop treating “causal” as a dirty word that respectable investigators do not say in public or put in print. It is true that observational studies cannot definitely prove causation, but this statement misses the point"</i><br /><br /><b>All that glitters isn't gold, as the author notes on randomized controlled trials:</b><br /><br /><i>"Interestingly, the same is true of randomized trials. All we can estimate from randomized trials data are associations; we just feel more confident giving a causal interpretation to the association between treatment assignment and outcome because of the expected lack of confounding that physical randomization entails. However, the association measures from randomized trials cannot be given a free pass. 
Although randomization eliminates systematic confounding, even a perfect randomized trial only provides probabilistic bounds on “random confounding”—as reflected in the confidence interval of the association measure—and many randomized trials are far from perfect."</i><br /><br /><b>There are important distinctions between analysis and methodological approaches when asking questions related to prediction and association vs causality. Saying a bit more, this is not just about model interpretation. We are familiar with discussions about challenges related to interpreting predictive models derived from complicated black box algorithms, but causality hinges on much more than just the ability to interpret the impact of features on an outcome. Also note that while we are seeing applications of AI and automated feature engineering and algorithm selection, models optimized to predict well may not explain well at all. In fact, a causal model may perform worse in out of sample predictions of the 'target' while giving the most rigorous estimate of causal effects:</b><br /><br /><i>"In associational or predictive models, we do not try to endow the parameter estimates with a causal interpretation because we are not trying to adjust for confounding of the effect of every variable in the model. Confounding is a causal concept that does not apply to associations...By contrast, in a causal analysis, we need to think carefully about what variables can be confounders so that the parameter estimates for treatment or exposure can be causally interpreted. Automatic variable selection procedures may work for prediction, but not necessarily for causal inference. 
Selection algorithms that do not incorporate sufficient subject matter knowledge may select variables that introduce bias in the effect estimate, and ignoring the causal structure of the problem may lead to apparent paradoxes."</i><br /><br /><b>It all comes down to a question of identification....or why AI has a long way to go in the causal space...or as Angrist and Pischke would put it....if applied econometrics were easy theorists would do it:</b><br /><br /><i>"Associational inference (prediction)or causal inference (counterfactual prediction)? The answer to this question has deep implications for (1) how we design the observational analysis to emulate a particular target trial and (2) how we choose confounding adjustment variables. Each causal question corresponds to a different target trial, may require adjustment for a different set of confounders, and is amenable to different types of sensitivity analyses. It then makes sense to publish separate articles for various causal questions based on the same data."</i><br /><br />I really liked how the author framed 'prediction' as being either associational (prospective) or counterfactual. Also, what a nice way to think about 'identification': it is about how we emulate a particular trial and handle confounding/selection bias/endogeneity.<br /><br /><b>Reference:</b><br /><br />Miguel A. Hernán, “The C-Word: Scientific Euphemisms Do Not Improve Causal Inference From Observational Data”, American Journal of Public Health 108, no. 5 (May 1, 2018): pp. 
616-619.<br /><br /><b>See also:</b><br /><br /><a href="http://econometricsense.blogspot.com/2018/03/will-there-be-credibility-revolution-in.html">Will there be a credibility revolution in data science and AI?</a><br /><br /><a href="http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html">To Explain or Predict?</a>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-27375590772657206992018-03-18T11:06:00.001-04:002018-03-19T20:31:25.606-04:00Will there be a credibility revolution in data science and AI? <i>Summary: Understanding where AI and automation are going to be the most disruptive to data scientists in the near term relates to understanding methodological differences between explaining and predicting, between machine learning and causal inference. It will require the ability to ask a different kind of question than machine learning algorithms are capable of answering off the shelf today.</i><br /><div><br /></div>There is a lot of enthusiasm about the disruptive role of automation and AI in data science. Products like <a href="https://www.h2o.ai/">H2O.ai </a>and <a href="https://www.datarobot.com/">DataRobot</a> offer tools to automate or fast track many aspects of the data science work stream. If this trajectory continues, what will the work of the future data scientist look like?<br /><br />Many have already pointed out the very difficult task of automating the <a href="https://www.superdatascience.com/podcast-power-soft-skills-data-science/">soft skills </a>possessed by data scientists. In a previous <a href="https://www.linkedin.com/pulse/what-traders-know-future-data-science-matt-bogard/">LinkedIn post</a> I discussed this in the trading space where automation and AI could create substantial disruptions for both data scientists and traders. 
Here I quoted Matthew Hoyle:<br /><br /><i>"Strategies have a short shelf life-what is valuable is the ability and energy to look at new and interesting things and put it all together with a sense of business development and desire to explore"</i><br /><br />My conclusion: <i>They are talking about bringing a portfolio of useful and practical skills together to do a better job than was possible before open source platforms and computing power became so proliferate. I think that is the future.</i><br /><br />So the future is about rebalancing the data scientist's portfolio of skills. However, in the near term I think the disruption from AI and automation in data science will do more than increase the emphasis on soft skills. In fact, there will remain a significant set of 'hard skills' that will see an increase in demand because they are difficult to automate.<br /><br />Understanding this will depend largely on making a distinction between <a href="http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html">explaining and predicting.</a> Much of what appears to be at the forefront of automation involves tasks supporting supervised and unsupervised machine learning algorithms as well as other prediction and forecasting tools like time series analysis.<br /><br />Once armed with predictions, businesses will start to ask questions about 'why'. This will transcend prediction or any of the visualizations of the patterns and relationships coming out of black box algorithms. They will want to know what decisions or factors are moving the needle on revenue or customer satisfaction and engagement or improved efficiencies. Essentially they will want to ask questions related to causality, which requires a completely different paradigm for data analysis than questions of prediction. And they will want scientifically formulated answers that are convincing vs. mere reports about rates of change or correlations. 
There is a significant difference between understanding what drivers correlate with or 'predict' the outcome of interest and what is actually driving the outcome. What they will be asking for is a <a href="http://econometricsense.blogspot.com/2017/07/the-credibility-revolution-in.html">credibility revolution </a>in data science.<br /><br />What do we mean by a credibility revolution?<br /><br />Economist <a href="http://jaysonlusk.com/blog/2016/5/12/does-diet-coke-cause-fat-babies">Jayson Lusk</a> puts it well:<br /><br /><i>"Fortunately economics (at least applied microeconomics) has undergone a bit of credibility revolution. If you attend a research seminar in virtually any economi(cs) department these days, you're almost certain to hear questions like, "what is your identification strategy?" or "how did you deal with endogeneity or selection?" In short, the question is: how do we know the effects you're reporting are causal effects and not just correlations."</i><br /><br />Healthcare Economist <a href="http://theincidentaleconomist.com/wordpress/what-took-con-econometrics/">Austin Frakt</a> has a similar take:<br /><br /><i>"A “research design” is a characterization of the logic that connects the data to the causal inferences the researcher asserts they support. It is essentially an argument as to why someone ought to believe the results. It addresses all reasonable concerns pertaining to such issues as selection bias, reverse causation, and omitted variables bias. In the case of a randomized controlled trial with no significant contamination of or attrition from treatment or control group there is little room for doubt about the causal effects of treatment so there’s hardly any argument necessary. But in the case of a natural experiment or an observational study causal inferences must be supported with substantial justification of how they are identified. 
Essentially one must explain how a random experiment effectively exists where no one explicitly created one."</i><br /><br />How are these questions and differences unlike your typical machine learning application? Susan Athey does a great job explaining in a Quora response about how causal inference is different from off the shelf machine learning methods (the kind being automated today):<br /><br /><i>"Sendhil Mullainathan (Harvard) and Jon Kleinberg with a number of coauthors have argued that there is a set of problems where off-the-shelf ML methods for prediction are the key part of important policy and decision problems. They use examples like deciding whether to do a hip replacement operation for an elderly patient; if you can predict based on their individual characteristics that they will die within a year, then you should not do the operation...Despite these fascinating examples, in general ML prediction models are built on a premise that is fundamentally at odds with a lot of social science work on causal inference. The foundation of supervised ML methods is that model selection (cross-validation) is carried out to optimize goodness of fit on a test sample. A model is good if and only if it predicts well. 
Yet, a cornerstone of introductory econometrics is that prediction is not causal inference.....Techniques like instrumental variables seek to use only some of the information that is in the data – the “clean” or “exogenous” or “experiment-like” variation in price—sacrificing predictive accuracy in the current environment to learn about a more fundamental relationship that will help make decisions...This type of model has not received almost any attention in ML."</i><br /><br />Developing an identification strategy, as Jayson Lusk discussed above, and all that goes along with that (finding natural experiments or valid instruments, or <a href="http://econometricsense.blogspot.com/2015/11/econometrics-multiple-testing-and.html">navigating the garden of forking paths</a> related to propensity score matching or a number of other <a href="http://econometricsense.blogspot.com/2013/09/causal-inference-and-quasi-experimental.html">quasi-experimental methods</a>) involves careful considerations and decisions to be made and defended in ways that would be very challenging to automate. Even when humans do this there is rarely a single best approach to these problems. They are far from routine. Just ask anyone who has been through peer review or given a talk at an economics seminar or conference.<br /><br />The kinds of skills required to work in this space would be similar to those of the econometrician or epidemiologist or any quantitative researcher who has been culturally immersed in the social norms and practices that have evolved out of the credibility revolution. As data science thought leader <a href="https://www.superdatascience.com/podcast-one-purpose-data-science-truth-analytics/">Eugene Dubossarsky puts it</a>:<br /><br /><i>“the most elite skills…the things that I find in the most elite data scientists are the sorts of things econometricians these days have…bayesian statistics…inferring causality” </i><br /><br />No one has a crystal ball. 
This is not to say that the current advances in automation are falling short on creating value. They should no doubt create value like any other form of capital complementing the labor and soft skills of the data scientist. And they could free up more resources to focus on more causal questions that previously may not have been answered. I discussed this complementarity previously in a <a href="http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html">related post</a>:<br /><br /><i> "correlations or 'flags' from big data might not 'identify' causal effects, but they are useful for prediction and might point us in directions where we can more rigorously investigate causal relationships if interested" </i><br /><br />However, if automation in this space is possible, it will require a different approach than what we have seen so far. We might look to the pioneering work that Susan Athey is doing to bring together machine learning and causal inference. It will require thinking in terms of potential outcomes, endogeneity, and counterfactuals, which requires the ability to ask a different kind of question than machine learning algorithms are capable of answering off the shelf today.<br /><br /><b>Additional References:</b><br /><br />From 'What If?' To 'What Next?' 
: Causal Inference and Machine Learning for Intelligent Decision Making <a href="https://sites.google.com/view/causalnips2017">https://sites.google.com/view/causalnips2017</a><br /><br />Susan Athey on Machine Learning, Big Data, and Causation <a href="http://www.econtalk.org/archives/2016/09/susan_athey_on.html">http://www.econtalk.org/archives/2016/09/susan_athey_on.html </a><br /><br />Machine Learning and Econometrics (Susan Athey, Guido Imbens) <a href="https://www.aeaweb.org/conference/cont-ed/2018-webcasts">https://www.aeaweb.org/conference/cont-ed/2018-webcasts </a><br /><br /><b>Related Posts:</b><br /><b><br /></b>Why Data Science Needs Economics<br /><a href="http://econometricsense.blogspot.com/2016/10/why-data-science-needs-economics.html">http://econometricsense.blogspot.com/2016/10/why-data-science-needs-economics.html</a><br /><br />To Explain or Predict<br /><a href="http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html">http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html</a><br /><br />Culture War: Classical Statistics vs. Machine Learning: <a href="http://econometricsense.blogspot.com/2011/01/classical-statistics-vs-machine.html">http://econometricsense.blogspot.com/2011/01/classical-statistics-vs-machine.html </a><br /><br />HARK! 
- flawed studies in nutrition call for credibility revolution -or- HARKing in nutrition research <a href="http://econometricsense.blogspot.com/2017/12/hark-flawed-studies-in-nutrition-call.html">http://econometricsense.blogspot.com/2017/12/hark-flawed-studies-in-nutrition-call.html</a><br /><br />Econometrics, Math, and Machine Learning<br /><a href="http://econometricsense.blogspot.com/2015/09/econometrics-math-and-machine.html">http://econometricsense.blogspot.com/2015/09/econometrics-math-and-machine.html</a><br /><br />Big Data: Don't Throw the Baby Out with the Bathwater<br /><a href="http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html">http://econometricsense.blogspot.com/2014/05/big-data-dont-throw-baby-out-with.html</a><br /><br />Big Data: Causality and Local Expertise Are Key in Agronomic Applications<br /><a href="http://econometricsense.blogspot.com/2014/05/big-data-think-global-act-local-when-it.html">http://econometricsense.blogspot.com/2014/05/big-data-think-global-act-local-when-it.html</a><br /><br />The Use of Knowledge in a Big Data Society II: Thick Data<br /><a href="https://www.linkedin.com/pulse/use-knowledge-big-data-society-ii-thick-matt-bogard/">https://www.linkedin.com/pulse/use-knowledge-big-data-society-ii-thick-matt-bogard/ </a><br /><br />The Use of Knowledge in a Big Data Society<br /><a href="https://www.linkedin.com/pulse/use-knowledge-big-data-society-matt-bogard/">https://www.linkedin.com/pulse/use-knowledge-big-data-society-matt-bogard/ </a><br /><br />Big Data, Deep Learning, and SQL<br /><a href="https://www.linkedin.com/pulse/deep-learning-regressionand-sql-matt-bogard/">https://www.linkedin.com/pulse/deep-learning-regressionand-sql-matt-bogard/</a><br /><br />Economists as Data Scientists<br /><a href="http://econometricsense.blogspot.com/2012/10/economists-as-data-scientists.html">http://econometricsense.blogspot.com/2012/10/economists-as-data-scientists.html </a>Matt 
Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-15314881679110651492018-02-13T07:25:00.000-05:002018-02-13T10:30:17.152-05:00Intuition for Random EffectsPreviously I wrote a <a href="http://econometricsense.blogspot.com/2014/04/intuition-for-fixed-effects.html">post</a> based on course notes from J. Blumenstock that attempted to provide some intuition for how fixed effects estimators can account for <a href="http://econometricsense.blogspot.com/2013/06/unobserved-heterogeneity-and-endogeneity.html">unobserved heterogeneity</a> (individual specific effects).<br /><br />Recently someone asked if I could provide a similarly motivating and intuitive example regarding random effects. Although I was not able to come up with a new example, I can definitely discuss random effects in the same context of the previous example. But first a little (less intuitive) background.<br /><br /><b>Background</b><br /><br />To recap, the purpose of both fixed and random effects estimators is to model treatment effects in the face of unobserved individual specific effects.<br /><br />y<sub>it</sub> = β x<sub>it</sub> + α<sub>i</sub> + u<sub>it</sub> (1)<br /><br />In the model above this is represented by α<sub>i</sub>. In terms of estimation, the difference between fixed and random effects depends on how we choose to model this term. 
In the context of fixed effects it can be captured through dummy variable estimation (this creates different intercepts or shifts capturing specific effects) or by transforming the data, subtracting group (fixed effects) means from individual observations within each group. In random effects models, individual specific effects are captured by a composite error term (α<sub>i</sub> + u<sub>it</sub>), which assumes that individual intercepts are drawn from a random distribution of possible intercepts. The random component of the error term, α<sub>i</sub>, captures the individual specific effects in a different way from fixed effects models.<br /><br />As noted in another post, <a href="http://econometricsense.blogspot.com/2011/01/mixed-fixed-and-random-effects-models.html">Fixed, Mixed, and Random Effects</a>, the random effects model is estimated using Generalized Least Squares (GLS):<br /><div class="MsoNormal"><br /></div><div class="MsoNormal">β<sub>GLS</sub> = (X’Ω<sup>-1</sup>X)<sup>-1</sup>(X’Ω<sup>-1</sup>Y) where Ω = I ⊗ Σ (2)</div><div class="MsoNormal"><br /></div><div class="MsoNormal">where Σ is the variance (covariance matrix) of the composite error α<sub>i</sub> + u<sub>it</sub>. If Σ is unknown, it is estimated, producing a feasible generalized least squares estimate β<sub>FGLS</sub>.</div><div class="MsoNormal"><br /></div><div class="MsoNormal"><b>Intuition for Random Effects</b><br /><br />In my post <a href="http://econometricsense.blogspot.com/2014/04/intuition-for-fixed-effects.html">Intuition for Fixed Effects</a> I noted:<br /><br /><i>"Essentially using a dummy variable in a regression for each city (or group, or type to generalize beyond this example) holds constant or 'fixes' the effects across cities that we can't directly measure or observe. Controlling for these differences removes the 'cross-sectional' variation related to unobserved heterogeneity (like tastes, preferences, other unobserved individual specific effects). The remaining variation, or 'within' variation can then be used to 'identify' the causal relationships we are interested in."</i><br /><br /><span style="font-family: "calibri";">Let's look at the toy data I used in that example. 
</span><br /><span style="font-family: "calibri";"><br /></span><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-JjBLoMc0ICc/U0qOUqZ99WI/AAAAAAAAAoQ/zMp9lDaqppYkhRlq7I0gFdJJ9Xrue574gCPcBGAYYCw/s1600/PANEL%2BDATA.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="194" data-original-width="731" height="84" src="https://2.bp.blogspot.com/-JjBLoMc0ICc/U0qOUqZ99WI/AAAAAAAAAoQ/zMp9lDaqppYkhRlq7I0gFdJJ9Xrue574gCPcBGAYYCw/s320/PANEL%2BDATA.png" width="320" /></a></div><span style="font-family: "calibri";"><br /></span><span style="font-family: "calibri";"><br /></span><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-gGRoWBIp4t0/WoLt1tatOQI/AAAAAAAAD_o/KiVUlbTErCAyYXwZyRGl4V5KxNN8Vc98ACLcBGAs/s1600/image001.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="245" data-original-width="353" height="222" src="https://4.bp.blogspot.com/-gGRoWBIp4t0/WoLt1tatOQI/AAAAAAAAD_o/KiVUlbTErCAyYXwZyRGl4V5KxNN8Vc98ACLcBGAs/s320/image001.png" width="320" /></a></div><span style="font-family: "calibri";"><br /></span><span style="font-family: "calibri";"><br /></span>The crude ellipses in the plots above (motivated by the example given in Kennedy, 2008) indicate the data for each city and the 'within' variation exploited by fixed effects models (<a href="http://econometricsense.blogspot.com/2014/04/intuition-for-fixed-effects.html">that allowed us to identify the expected price/quantity relationships in the previous post</a>). The differences between the ellipses represent 'between' variation. As Kennedy discusses, random effects models differ from fixed effects models in that they are able to exploit both 'within' and 'between' variation, producing an estimate that is a weighted average of both kinds of variation (via Σ in equation 2 above). 
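<br /><br />The within/between distinction can be sketched with simulated data. The numbers below are not the toy data from the plots above: the five cities, the assumed within-city slope of -1, and the city effects are hypothetical values picked so that the unobserved city effects are correlated with price.

```python
import random
from statistics import fmean


def slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def sum_sq(x):
    """Sum of squared deviations from the mean."""
    m = fmean(x)
    return sum((a - m) ** 2 for a in x)


random.seed(0)
true_b = -1.0  # hypothetical within-city effect of price on quantity
rows = []      # (city, price, quantity)
for city in range(5):
    alpha = 3.0 * city  # unobserved city effect, correlated with price below
    for _ in range(50):
        price = 2.0 * city + random.gauss(0, 1)
        qty = true_b * price + alpha + random.gauss(0, 0.5)
        rows.append((city, price, qty))

within_p, within_q, between_p, between_q = [], [], [], []
for city in range(5):
    p = [pr for c, pr, _ in rows if c == city]
    q = [qt for c, _, qt in rows if c == city]
    mp, mq = fmean(p), fmean(q)
    within_p += [v - mp for v in p]   # deviations from the city mean
    within_q += [v - mq for v in q]
    between_p += [mp] * len(p)        # city means, repeated per observation
    between_q += [mq] * len(q)

b_pooled = slope([p for _, p, _ in rows], [q for _, _, q in rows])
b_within = slope(within_p, within_q)     # the fixed effects ('within') slope
b_between = slope(between_p, between_q)  # slope across city means

# Pooled OLS recombines the two slopes using their raw sums of squares.
w, b = sum_sq(within_p), sum_sq(between_p)
b_mix = (w * b_within + b * b_between) / (w + b)

print(f"pooled {b_pooled:.2f}, within {b_within:.2f}, between {b_between:.2f}")
```

With city effects correlated with price, the within slope recovers the assumed negative relationship while the pooled slope comes out with the wrong sign. The last calculation checks that pooled OLS is just the within and between slopes combined using their raw sums of squares, with no extra down-weighting of the between part. A random effects (FGLS) estimator would re-weight that mix using the estimated error variances, which is not implemented here.<br /><br />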
OLS, on the other hand, exploits both kinds of variation as an unweighted average.<br /><br /><b>More Details </b><br /><br />As Kennedy discusses, both FE and RE can be viewed as running OLS on different transformations of the data.<br /><br />For fixed effects:<i> "this transformation consists of subtracting from each observation the average of the values within its ellipse"</i><br /><br />For random effects: <i>"the EGLS (or FGLS above) calculation is done by finding a transformation of the data that creates a spherical variance-covariance matrix and then performing OLS on the transformed data."</i><br /><br />As Kennedy notes, the increased information used by RE makes it a more efficient estimator, but correlation between 'x' and the error term creates bias; i.e., RE assumes that α<sub>i</sub> is uncorrelated with (orthogonal to) the regressors. Angrist and Pischke (2009) note (footnote, p. 223) that they prefer FE because the gains in efficiency are likely to be modest while the finite sample properties of RE may be worse. As noted on p. 243, an important assumption for identification in FE is that the most important sources of unobserved heterogeneity are time invariant (because only time-invariant information gets differenced out). Angrist and Pischke also have a nice discussion on pages 244-245 of the choice between FE and lagged dependent variable models.<br /><br /><b>References:</b><br /><br />A Guide to Econometrics. Peter Kennedy. 6th Edition. 2008<br />Mostly Harmless Econometrics. Angrist and Pischke. 
2009<br /><br /><span style="font-family: "calibri";">See also: <a href="http://marcfbellemare.com/wordpress/12335">‘Metrics Monday: Fixed Effects, Random Effects, and (Lack of) External Validity (Marc Bellemare).</a></span><br /><span style="font-family: "calibri";"><br /></span><span style="font-family: "calibri";">Marc notes: </span><br /><span style="font-family: "calibri";"><br /></span><i><span style="font-family: "calibri";">"Nowadays, in the wake of the Credibility Revolution, what we teach students is: “You should use RE when your variable of interest is orthogonal to the error term; if there is any doubt and you think your variable of interest is not orthogonal to the error term, use FE.” </span><span style="font-family: "calibri";">And since the variable can be argued to be orthogonal pretty much only in cases where it is randomly assigned in the context of an experiment, experimental work is pretty much the only time the RE estimator should be used."</span></i><br /><span style="font-family: "calibri";"><br /></span></div>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-55369393461702618002018-02-02T21:25:00.001-05:002018-02-03T18:04:21.978-05:00Deep Learning vs. Logistic Regression ROC vs Calibration Explaining vs. PredictingFrank Harrell writes <a href="http://www.fharrell.com/post/medml/">Is Medicine Mesmerized by Machine Learning? 
</a>Some time ago I wrote about predictive modeling and the differences between <a href="http://econometricsense.blogspot.com/2013/04/is-roc-curve-good-metric-for-model.html">what the ROC curve may tell us and how well a model 'calibarates.'</a><br /><br />There I quoted from the journal <i>Circulation</i>:<br /><br /><i>'When the goal of a predictive model is to categorize individuals into risk strata, the assessment of such models should be based on how well they achieve this aim...The use of a single, somewhat insensitive, measure of model fit such as the c statistic can erroneously eliminate important clinical risk predictors for consideration in scoring algorithms'</i><br /><br />Not too long ago Dr. Harrel shares the following tweet related to this:<br /><br /><i>I have seen hundreds of ROC curves in the past few years. I've yet to see one that provided any insight whatsoever. They reverse the roles of X and Y and invite dichotomization. Authors seem to think they're obligatory. Let's get rid of 'em.</i> <a href="https://twitter.com/f2harrell">@f2harrell </a>8:42 AM - 1 Jan 2018<br /><br />In his Statistical Thinking post above, Dr. Harrel writes:<br /><br /><i>"Like many applications of ML where few statistical principles are incorporated into the algorithm, the result is a failure to make accurate predictions on the absolute risk scale. The calibration curve is far from the line of identity as shown below...The gain in c-index from ML over simpler approaches has been more than offset by worse calibration accuracy than the other approaches achieved."</i><br /><br />i.e. depending on the goal, better ROC scores don't necessarily mean better models.<br /><br />But this post was about more than discrimination and calibration. 
It was discussing the logistic regression approach taken in <a href="http://www.amjmed.com/article/S0002-9343(09)00103-X/pdf">Exceptional Mortality Prediction by Risk Scores from Common Laboratory Tests</a> vs the deep learning approach used in <a href="https://arxiv.org/abs/1711.06402">Improving Palliative Care with Deep Learning.</a><br /><br /><i>"One additional point: the ML deep learning algorithm is a black box, not provided by Avati et al, and apparently not usable by others. And the algorithm is so complex (especially with its extreme usage of procedure codes) that one can’t be certain that it didn’t use proxies for private insurance coverage, raising a possible ethics flag. In general, any bias that exists in the health system may be represented in the EHR, and an EHR-wide ML algorithm has a chance of perpetuating that bias in future medical decisions. On a separate note, I would favor using comprehensive comorbidity indexes and severity of disease measures over doing a free-range exploration of ICD-9 codes."</i><br /><br />This kind of pushes back against the idea that deep neural nets can effectively bypass feature engineering, or at least raises cautions in specific contexts.<br /><br />Actually, he is not as critical of the authors of this paper as he is about what he considers undue accolades it has received.<br /><br />This ties back to my post on LinkedIn a couple weeks ago, <a href="https://www.linkedin.com/pulse/deep-learning-regressionand-sql-matt-bogard/">Deep Learning, Regression, and SQL. 
</a><br /><br /><b>See also:</b><br /><br /><a href="http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html">To Explain or Predict</a><br /><a href="http://econometricsense.blogspot.com/2014/05/big-data-think-global-act-local-when-it.html">Big Data: Causality and Local Expertise Are Key in Agronomic Applications</a><br /><br /><b>And: </b><br /><br /><a href="https://www.ibm.com/developerworks/community/blogs/jfp/entry/Feature_Engineering_For_Deep_Learning?lang=en">Feature Engineering for Deep Learning</a><br /><a href="http://smerity.com/articles/2016/architectures_are_the_new_feature_engineering.htm">In Deep Learning, Architecture Engineering is the New Feature Engineering</a><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-30128598736247501572017-12-31T15:48:00.004-05:002017-12-31T15:49:03.511-05:00HARK! - flawed studies in nutrition call for credibility revolution -or- HARKing in nutrition researchThere was a nice piece over at the Genetic Literacy Project I read just recently: <i>Why so many scientific studies are flawed and poorly understood</i>. (<a href="https://geneticliteracyproject.org/2017/12/13/viewpoint-many-scientific-studies-flawed-poorly-understood/">link</a>). They gave a fairly intuitive example of false positives in research using coin flips. I like this because I used the specific example of flipping a coin 5 times in a row to demonstrate basic probability concepts in some of the stats classes I used to teach. Their example might make a nice extension:<br /><br /><i>"In Table 1 we present ten 61-toss sequences. The sequences were computer generated using a fair 50:50 coin. We have marked where there are runs of five or more heads one after the other. In all but three of the sequences, there is a run of at least five heads. Thus, a sequence of five heads has a probability of 0.5<sup>5</sup>=0.03125 (i.e., less than 0.05) of occurring. 
Note that there are 57 opportunities in a sequence of 61 tosses for five consecutive heads to occur. We can conclude that although a sequence of five consecutive heads is relatively rare taken alone, it is not rare to see at least one sequence of five heads in 61 tosses of a coin."</i><br /><br />In other words, a 5 head run in a sequence of 61 tosses (as evidence against a null hypothesis of p(head) = .5, i.e. a fair coin) is their analogy for a false positive in research. In particular, they relate this to nutrition research, where it is popular to use large survey questionnaires that consist of a large number of questions:<br /><br /><i>"asking lots of questions and doing weak statistical testing is part of what is wrong with the self-reinforcing publish/grants business model. Just ask a lot of questions, get false-positives, and make a plausible story for the food causing a health effect with a p-value less than 0.05"</i><br /><br />It is their 'hypothesis' that this approach, in conjunction with a questionable practice referred to as 'HARKing' (hypothesizing after the results are known), is one reason we see so many conflicting headlines about what we should and should not eat, or about the benefits or harms of certain foods and diets. There is some damage done in terms of people's trust in science as a result. They conclude:<br /><br /><i>"Curiously, editors and peer-reviewers of research articles have not recognized and ended this statistical malpractice, so it will fall to government funding agencies to cut off support for studies with flawed design, and to universities to stop rewarding the publication of bad research. We are not optimistic."</i><br /><br />More on HARKing.....<br /><br />A good article related to HARKing is a paper written by Norbert L. Kerr. He describes HARKing as the practice of proposing one hypothesis (or set of hypotheses), changing the research question *after* the data are examined, 
and then presenting the results *as if* the new hypothesis were the original. He does distinguish this from a more intentional exercise in scientific induction, inferring some relation or principle post hoc from a pattern of data. This is more like exploratory data analysis.<br /><br />I discussed exploratory studies and issues related to multiple testing in a previous post: <a href="http://econometricsense.blogspot.com/2015/11/econometrics-multiple-testing-and.html">Econometrics, Multiple Testing, and Researcher Degrees of Freedom. </a><br /><br />To borrow a quote from this post: "<i>At the same time, we do not want demands of statistical purity to strait-jacket our science. The most valuable statistical analyses often arise only after an iterative process involving the data"</i> (see, e.g., Tukey, 1980, and Box, 1997).<br /><br />To say the least, careful consideration should be given to the tradeoffs in the way research is conducted and, as the post discusses in more detail, to the <i>garden of forking paths </i>involved.<br /><br />I am not sure to what extent the <a href="http://econometricsense.blogspot.com/2017/07/the-credibility-revolution-in.html">credibility revolution</a> has impacted nutrition studies, but the lessons apply here.<br /><br /><b>References:</b><br /><br />HARKing: Hypothesizing After the Results are Known<span style="white-space: pre;"> </span><br />Norbert L. Kerr<br />Personality and Social Psychology Review<br />Vol 2, Issue 3, pp. 196 - 217<br />First Published August 1, 1998Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-57843006034924578562017-08-24T18:56:00.000-04:002017-08-24T19:58:29.809-04:00Granger Causality<i>"Granger causality is a standard linear technique for determining whether one time series is useful in forecasting another." </i>(Irwin and Sanders, 2011).<br /><br />A series 'Granger' causes another series if it consistently predicts it. 
If series X Granger causes Y, while we can't be certain that this relationship is causal in any rigorous way, we might be fairly certain that Y doesn't cause X.<br /><br />Example:<br /><br />Y<sub>t</sub> = B<sub>0</sub> + B<sub>1</sub>*Y<sub>t-1</sub> + ... + B<sub>p</sub>*Y<sub>t-p</sub> + A<sub>1</sub>*X<sub>t-1</sub> + ... + A<sub>p</sub>*X<sub>t-p</sub> + E<sub>t</sub><br /><br />If we reject the hypothesis that all the 'A' coefficients jointly = 0, then 'X' Granger causes 'Y'.<br /><br />X<sub>t</sub> = B<sub>0</sub> + B<sub>1</sub>*X<sub>t-1</sub> + ... + B<sub>p</sub>*X<sub>t-p</sub> + A<sub>1</sub>*Y<sub>t-1</sub> + ... + A<sub>p</sub>*Y<sub>t-p</sub> + E<sub>t</sub><br /><br />If we reject the hypothesis that all the 'A' coefficients jointly = 0, then 'Y' Granger causes 'X'.<br /><br /><b>Applications:</b><br /><br />Below are some applications where Granger causality methods were used to test the impacts of index funds on commodity market price and volatility.<br /><br />The Impact of Index Funds in Commodity Futures Markets: A Systems Approach<br />DWIGHT R. SANDERS AND SCOTT H. IRWIN<br />The Journal of Alternative Investments<br />Summer 2011, Vol. 14, No. 1: pp. 40-49<br /><br />Irwin, S. H. and D. R. Sanders (2010), “The Impact of Index and Swap Funds on Commodity Futures Markets: Preliminary Results”, OECD Food, Agriculture and Fisheries Working Papers, No. 27, OECD Publishing. doi: 10.1787/5kmd40wl1t5f-en<br /><br />Index Trading and Agricultural Commodity Prices:<br />A Panel Granger Causality Analysis<br />Gunther Capelle-Blancard and Dramane Coulibaly<br />CEPII, WP No 2011 – 28<br />December<br /><br /><b>References:</b><br /><br />Using Econometrics: A Practical Guide (6th Edition) A.H. Studenmund. 2011<br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-66521274250437838972017-08-07T06:27:00.000-04:002018-12-21T13:31:04.945-05:00Confidence Intervals: Fad or Fashion <div style="margin-bottom: 0in;">Confidence intervals seem to be the fad among some in pop stats/data science/analytics. 
Whenever there is mention of p-hacking, or the ills of publication standards, or the pitfalls of null hypothesis significance testing, CIs almost always seem to be the popular solution.<br /><br /></div><div style="margin-bottom: 0in;">There are some attractive features of CIs. <a href="https://www.dgps.de/fachgruppen/methoden/mpr-online/issue7/art2/brandstaetter.pdf">This paper</a> provides some alternative views of CIs, discusses some strengths and weaknesses, and ultimately proposes that they are on balance superior to p-values and hypothesis testing. CIs can bring more information to the table in terms of effect sizes for a given sample; however, some of the statements made in this article need to be read with caution. I just wonder how much of the fascination with CIs is the result of confusing a <a href="http://econometricsense.blogspot.com/2015/01/overconfident-confidence-intervals.html">Bayesian interpretation with a frequentist application</a> or just sloppy misinterpretation. I completely disagree that they are more straightforward for students (compared to interpreting hypothesis tests and p-values, as the article claims).</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><a href="http://davegiles.blogspot.com/2011/08/overly-confident-future-nobel-laureate.html">Dave Giles</a> gives a very good review starting with the very basics of what is a parameter vs. an estimator vs. an estimate, sampling distributions etc. 
After reviewing the concepts key to understanding CIs he points out two very common interpretations of CIs that are clearly wrong:</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><i>1) There's a 95% probability that the true value of the regression coefficient lies in the interval [a,b].</i></div><div style="margin-bottom: 0in;"><i>2) This interval includes the true value of the regression coefficient 95% of the time.</i></div><div style="margin-bottom: 0in;"><i><br /></i></div><div style="margin-bottom: 0in;"><i>"we really should talk about the (random) intervals "covering" the (fixed) value of the parameter. If, as some people do, we talk about the parameter "falling in the interval", it sounds as if it's the parameter that's random and the interval that's fixed. Not so!"</i></div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">In <i>Robust misinterpretation of confidence intervals,</i> the authors take on the idea that confidence intervals offer a panacea for interpretation issues related to null hypothesis significance testing (NHST):<br /><br /><i>"Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual...Our findings suggest that many researchers do not know the correct interpretation of a CI....As is the case with p-values, CIs do not allow one to make probability statements about parameters or hypotheses."</i><br /><i><br /></i>The authors present evidence about this misunderstanding by presenting subjects with a number of false statements regarding confidence intervals (including the two above pointed out by Dave Giles) and noting the frequency of incorrect affirmations about their truth.<br /><br />In <i>Mastering 'Metrics,</i> Angrist and Pischke give a great interpretation of confidence intervals that, in my opinion, doesn't lend itself as easily to abusive probability interpretations:</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><i>"By describing a set of parameter values consistent with our data, confidence intervals provide a compact summary of the information these data contain about the population from which they were sampled"</i></div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">Both hypothesis testing and confidence intervals are statements about the compatibility of our observable sample data with population characteristics of interest. <a href="http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108">The ASA released a set of clarifications on statements on p-values. </a>Number 2 states that <i>"P-values do not measure the probability that the studied hypothesis is true."</i> Nor does a confidence interval (see Ranstam, 2012).<br /><br />Venturing into the risky practice of making imperfect analogies, take this loosely from the perspective of criminal investigations. We might think of confidence intervals as narrowing the range of suspects based on observed evidence, without providing specific probabilities related to the guilt or innocence of any particular suspect. Better evidence narrows the list, just as better evidence in our sample data (less noise) will narrow the confidence interval.</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">I see no harm in CIs and more good if they draw more attention to practical/clinical significance of effect sizes. But I think the temptation to incorrectly represent CIs can be just as strong as the temptation to speak boldly of 'significant' findings following an exercise in p-hacking or in the face of meaningless effect sizes. 
Maybe some sins are greater than others and proponents feel more comfortable with misinterpretations/overinterpretations of CIs than they do with misinterpretations/overinterpretations of p-values.</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">Or as <a href="http://wmbriggs.com/post/11862/">Briggs concludes </a>about this issue:</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><i>"Since no frequentist can interpret a confidence interval in any but in a logical probability or Bayesian way, it would be best to admit it and abandon frequentism"</i></div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;"><b>See also: </b><br /><a href="http://andrewgelman.com/2014/12/11/fallacy-placing-confidence-confidence-intervals/">Andrew Gelman: The Fallacy of Placing Confidence in Confidence Intervals.</a><br /><a href="http://noahpinionblog.blogspot.com/2015/08/the-backlash-to-backlash-against-p.html">Noah Smith: The Backlash to the Backlash Against P-values</a><br /><b><br /></b><b>References:</b></div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">Eduard Brandstätter, Confidence Intervals as an Alternative to Significance Testing, Methods of Psychological Research Online, 1999, Vol. 4, No. 2, Johannes Kepler Universität Linz</div><div style="margin-bottom: 0in;"><br /></div><div style="margin-bottom: 0in;">J. Ranstam, Why the P-value culture is bad and confidence intervals a better alternative, Osteoarthritis and Cartilage, Volume 20, Issue 8, 2012, Pages 805-808, ISSN 1063-4584, http://dx.doi.org/10.1016/j.joca.2012.04.001 (http://www.sciencedirect.com/science/article/pii/S1063458412007789)<br /><br />Robust misinterpretation of confidence intervals<br />Rink Hoekstra & Richard D. Morey & Jeffrey N. 
Rouder &<br />Eric-Jan Wagenmakers Psychon Bull Rev<br />DOI 10.3758/s13423-013-0572-3 2014</div>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-47325601579728890672017-07-21T19:07:00.002-04:002017-07-21T19:18:04.684-04:00Regression as a variance based weighted average treatment effectIn <a href="http://www.mostlyharmlesseconometrics.com/">Mostly Harmless Econometrics</a> Angrist and Pischke discuss regression in the context of matching. Specifically, they show that regression provides a variance based weighted average of covariate specific differences in outcomes between treatment and control groups. Matching gives us an average difference in treatment and control outcomes weighted by the empirical distribution of covariates. (see more here). I wanted to roughly sketch this logic out below.<br />
<br /><b>Matching</b><br />
<br /> δATE = E[y1i | Xi,Di=1] - E[y0i | Xi,Di=0] = ATE <br /><br />This gives us the difference in mean outcomes between treatment and control, assuming (y1i,y0i ⊥ Di), i.e. in a randomized controlled experiment potential outcomes are independent of treatment status.<br />
<br />We represent the matching estimator empirically by:<br /><br /> Σ δx P(Xi=x), where δx is the difference in mean outcome values between treatment and control units at a particular value of X, i.e. the difference in outcome for a particular combination of covariates. This assumes conditional independence (y1,y0 ⊥ Di|xi), so identification is achieved through a selection on observables framework.<br />
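<br />As a rough sketch of this weighting (an illustration, not code from Angrist and Pischke), the snippet below applies Σ δx P(Xi=x) to the small contrived data set from the R code at the end of this post. The names raw_diff, common, delta_x, and p_x are illustrative choices, and P(Xi=x) is assumed here to be the empirical share of matched observations at each value of x.<br /><br />

```r
# Demo data, copied from the R code at the end of this post
x <- c(4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9)
d <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0)
y <- c(6,7,8,8,9,11,12,13,14,2,3,4,5,6,7,8,9,10)

# Raw (unmatched) difference in mean outcomes, treated minus control
raw_diff <- mean(y[d == 1]) - mean(y[d == 0])

# Exact matching on x: keep covariate values seen in both groups
common  <- intersect(x[d == 1], x[d == 0])
matched <- x %in% common

# delta_x: treated-minus-control difference in mean outcome within each cell
delta_x <- sapply(common, function(v) {
  mean(y[d == 1 & x == v]) - mean(y[d == 0 & x == v])
})

# P(X = x): empirical share of matched observations in each cell (assumption)
p_x <- sapply(common, function(v) mean(x[matched] == v))

# Matching estimator: cell differences weighted by the covariate distribution
matching_est <- sum(delta_x * p_x)

round(c(raw = raw_diff, matching = matching_est), 2)  # raw 3.78, matching 0.67
```

<br />With one treated and one control unit at each matched value of x, every cell gets the same weight, so the estimate here is just the simple average of the cell differences.<br />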
<br />
Average differences δx are weighted by the distribution of covariates via the term P(Xi=x).<br /><br /><b>Regression</b><br /><br />We can represent a regression parameter using the basic formula taught to most undergraduates:<br /><br />Single Variable: β = cov(y,D)/v(D)<br />Multivariable: βk = cov(y,D*)/v(D*) <br /><br />where D* is the residual from a regression of D on all other covariates. More generally, E(X’X)<sup>-1</sup>E(X’y) is a vector whose kth element is cov(y,x*)/v(x*), where x* is the residual from a regression of that particular ‘x’ on all other covariates.<br /><br />We can then represent the estimated treatment effect from regression as:<br /><br /> δR = cov(y,D*)/v(D*) = E[(Di-E[Di|Xi])E[yi|Di,Xi]] / E[(Di-E[Di|Xi])^2] assuming (y1,y0 ⊥ Di|xi)<br /><br />Again regression and matching rely on similar identification strategies based on selection on observables/conditional independence.<br /><br />Let E[yi | Di,Xi] = E[yi | Di=0,Xi] + δx Di<br /><br />Then with more algebra we get: δR = cov(y,D*)/v(D*) = E[σ^2D(Xi) δx] / E[σ^2D(Xi)]<br /><br />where σ^2D(Xi) = E[(Di-E[Di|Xi])^2|Xi] is the conditional variance of treatment D given X.<br /><br />While the algebra is cumbersome and notation heavy, we can see that the way most people are familiar with viewing a regression estimate, cov(y,D*)/v(D*), is equivalent to the term (using expectations) E[σ^2D(Xi) δx] / E[σ^2D(Xi)], and we can see that this term contains the product of the conditional variance of D and our covariate specific differences between treatment and controls δx.<br /><br />Hence, regression gives us a variance based weighted average treatment effect, whereas matching provides a distribution weighted average treatment effect.<br /><br />So what does this mean in practical terms? Angrist and Pischke explain that regression puts more weight on covariate cells where the conditional variance of treatment status is the greatest, that is, where there are equal numbers of treated and control units. They state that the difference between the two estimators matters little when δx varies little across covariate combinations.<br /><br />In his post <a href="http://hrisblattman.com/2010/10/27/the-cardinal-sin-of-matching/">The cardinal sin of matching</a>, Chris Blattman puts it this way:<br /><br /><i>"For causal inference, the most important difference between regression and matching is what observations count the most. 
A regression tries to minimize the squared errors, so observations on the margins get a lot of weight. Matching puts the emphasis on observations that have similar X’s, and so those observations on the margin might get no weight at all....Matching might make sense if there are observations in your data that have no business being compared to one another, and in that way produce a better estimate" </i><br /><br />Below is a very simple contrived example. Suppose our data looks like this:<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-On_psQKxI-g/WXJZtY5szZI/AAAAAAAACgQ/CKBXoEkkJrIKWhZRdfqDtgBXYaoP048aQCLcBGAs/s1600/Regression%2Bvs%2BMatching.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="496" data-original-width="475" height="320" src="https://1.bp.blogspot.com/-On_psQKxI-g/WXJZtY5szZI/AAAAAAAACgQ/CKBXoEkkJrIKWhZRdfqDtgBXYaoP048aQCLcBGAs/s320/Regression%2Bvs%2BMatching.png" width="306" /></a></div>We can see that those in the treatment group tend to have higher outcome values so a straight comparison between treatment and controls will <a href="http://econometricsense.blogspot.com/2013/05/selection-bias-and-rubin-causal-model.html">overestimate treatment effects due to selection bias:</a><br /><br /> <span style="mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">E[Y<sub>i</sub>|d<sub>i</sub>=1] - E[Y<sub>i</sub>|d<sub>i</sub>=0] =E[Y<sub>1i</sub>-Y<sub>0i</sub>]<span style="mso-spacerun: yes;"> </span>+{E[Y<sub>0i</sub>|d<sub>i</sub>=1] - E[Y<sub>0i</sub>|d<sub>i</sub>=0]} </span><br /><br /> However, if we estimate differences based on an exact matching scheme, we get a much smaller estimate of .67. If we run a regression using all of the data we get .75. 
If we consider the raw difference in means between treatment and controls, 3.78, to be biased upward, then both matching and regression have substantially reduced it, and depending on the application the difference between .67 and .75 may not be of great consequence. Of course, if we run the regression including only matched observations, we get exactly the same result as matching. (see R code below). This is not so different from the method of <a href="http://econometricsense.blogspot.com/2015/03/using-r-matchit-package-for-propensity.html">trimming based on propensity scores </a>suggested in Angrist and Pischke.<br /><br /><br />Both methods rely on the same assumptions for identification, so no one can argue superiority of one method over the other with regard to identification of causal effects.<br /><br />Matching has the advantage of being nonparametric, alleviating concerns about functional form. However, there are <a href="http://econometricsense.blogspot.com/2015/01/considerations-in-propensity-score.html">lots of considerations</a> to work through in matching (e.g. 1:1, 1:many, <a href="http://econometricsense.blogspot.com/2015/03/propensity-score-matching-optimal.html">optimal caliper width</a>, variance/bias tradeoff and kernel selection etc.). While all of these possibilities might lead to better estimates, I wonder if they don't sometimes lead to a <a href="http://econometricsense.blogspot.com/2015/11/econometrics-multiple-testing-and.html">garden of forking paths. </a><br /><br /><b>See also: </b><br /><br />For a neater set of notes related to this post, see:<br /><b> </b><br />Matt Bogard. "Regression and Matching (3).pdf" <em>Econometrics, Statistics, Financial Data Modeling</em> (2017). 
Available at: http://works.bepress.com/matt_bogard/37/ <b> </b><br /><br /><a href="http://econometricsense.blogspot.com/2015/03/using-r-matchit-package-for-propensity.html">Using R MatchIt for Propensity Score Matching</a><br /><br /><b>R Code:</b><br /><br /># generate demo data<br /><div class="MsoNormal">x <- c(4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9)</div><div class="MsoNormal">d <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0)</div><div class="MsoNormal">y <- c(6,7,8,8,9,11,12,13,14,2,3,4,5,6,7,8,9,10)</div><div class="MsoNormal"><br /></div><div class="MsoNormal">mean(y[d==1]) - mean(y[d==0]) # raw difference in means (3.78)</div><div class="MsoNormal">summary(lm(y~x+d)) # regression controlling for x, all data (.75)</div><div class="MsoNormal">m <- x %in% intersect(x[d==1], x[d==0]) # flag exactly matched observations</div><div class="MsoNormal">summary(lm(y~x+d, subset = m)) # regression on matched observations only (.67)</div>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-28968690898171212512017-07-12T21:05:00.000-04:002017-07-13T09:48:15.194-04:00Instrumental Variables and LATEOften in program evaluation we are interested in estimating the average treatment effect (ATE). This is in theory the effect of treatment on a randomly selected person from the population. This can be estimated in the context of a randomized controlled trial (RCT) by a comparison of means between treated and untreated participants.<br /><br />However, sometimes in a randomized experiment, some members selected for treatment may not actually receive treatment (if participation is voluntary, <a href="http://econometricsense.blogspot.com/2014/01/the-oregon-medicaid-experiment-applied.html">for example the Medicaid expansion in Oregon</a>). In this case, sometimes researchers will compare differences in outcome between those selected for treatment vs those assigned to control groups. 
This analysis, as assigned or as randomized, is referred to as an intent-to-treat analysis (ITT). With perfect compliance, ITT = ATE.<br /><br /><a href="http://econometricsense.blogspot.com/2017/06/instrumental-variables-vs-intent-to.html">As discussed previously,</a> using treatment assignment as an instrumental variable (IV) is another approach to estimating treatment effects. The resulting estimate is referred to as a local average treatment effect (LATE).<br /><br /><b>What is LATE and how does it give us an unbiased estimate of causal effects?</b><br /><br />In simplest terms, LATE is the ATE for the sub-population of compliers in an RCT (or other natural experiment where an instrument is used).<br /><br />In a randomized controlled trial you can characterize participants as follows: (<a href="http://egap.org/methods-guides/10-things-you-need-know-about-local-average-treatment-effect">see this reference from egap.org</a> for a really great primer on this)<br /><br /><b>Never Takers: </b>those that refuse treatment regardless of treatment/control assignment.<br /><br /><b>Always Takers: </b>those that get the treatment even if they are assigned to the control group.<br /><br /><b>Defiers: </b>those that get the treatment when assigned to the control group and do not receive treatment when assigned to the treatment group. (these people violate an IV assumption referred to as monotonicity)<br /><br /><b>Compliers:</b> those that comply, i.e. receive treatment if assigned to the treatment group but do not receive treatment when assigned to the control group. <br /><br />The outcome for never takers is the same regardless of treatment assignment and in effect cancels out in an IV analysis. As discussed by <a href="http://press.princeton.edu/titles/10363.html">Angrist and Pischke in Mastering 'Metrics</a>, the always takers are prime suspects for creating bias in non-compliance scenarios. 
These folks are typically the more motivated participants and likely would have higher potential outcomes or potentially have a greater benefit from treatment than other participants. The compliers are characterized as participants that receive treatment only as a result of random assignment. The estimated treatment effect for these folks is often very desirable and in an IV framework can give us an unbiased causal estimate of the treatment effect. This is what is referred to as a local average treatment effect or LATE.<br /><br /><b>How do we estimate LATE with IVs?</b><br /><br />One way LATE estimates are often described is as dividing the ITT effect by the share of compliers. This can also be done in a regression context. Let D be an indicator equal to 1 if treatment is received vs. 0, and let Z be our indicator (0,1) for the original randomization i.e. our instrumental variable. We first regress:<br /><br />D = β<sub>0</sub> + β<sub>1</sub> Z + e<span style="mso-tab-count: 1;"> </span><br /><br /><span style="mso-tab-count: 1;">This captures all of the variation in our treatment that is related to our instrument Z, or random assignment. This is<i> 'quasi-experimental'</i> variation. It is also an estimate of the rate of compliance. </span>β<sub>1</sub> only picks up the variation in treatment D that is related to Z and leaves all of the variation and unobservable factors related to self selection (i.e. bias) in the residual term.<span style="mso-spacerun: yes;"> </span>You can think of this as the filtering process. We can represent this as: COV(D,Z)/V(Z). 
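<br /><br />To make the 'ITT divided by the share of compliers' description concrete, here is a small noise-free sketch. Everything in it is hypothetical: the mix of compliers, never takers, and always takers (60/30/10 per arm), the baseline outcomes, and the treatment effects are made-up numbers chosen so the arithmetic is easy to follow, not values from any study.<br /><br />

```r
# Hypothetical trial: 100 units per arm, same mix of types in each arm
type <- rep(rep(c("complier", "never", "always"), times = c(60, 30, 10)), times = 2)
Z    <- rep(c(0, 1), each = 100)   # random assignment (the instrument)

# Treatment actually received: compliers follow Z, the others ignore it
D <- ifelse(type == "always", 1, ifelse(type == "never", 0, Z))

# Made-up potential outcomes: baseline and treatment effect differ by type
y0     <- c(complier = 10, never = 8, always = 14)
effect <- c(complier = 2,  never = 0, always = 5)
Y <- unname(y0[type] + effect[type] * D)

first_stage <- unname(coef(lm(D ~ Z))["Z"])  # COV(D,Z)/V(Z), the share of compliers
itt         <- unname(coef(lm(Y ~ Z))["Z"])  # COV(Y,Z)/V(Z), the intent-to-treat effect
late        <- itt / first_stage             # ITT scaled up by the compliance rate
naive       <- mean(Y[D == 1]) - mean(Y[D == 0])  # treated vs untreated, ignoring Z

round(c(first_stage = first_stage, itt = itt, late = late, naive = naive), 2)
# first_stage 0.60, itt 1.20, late 2.00, naive 4.75
```

<br />In this constructed example the naive treated-vs-untreated comparison (4.75) is inflated by the always takers, while the ratio recovers the effect assumed for compliers (2).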
<br /><br />Then, to relate changes in Z to changes in our target Y we estimate β<sub>2</sub> or COV(Y,Z)/V(Z).<br /><br /><div class="MsoNormal"></div><div class="MsoNormal">Y = β<sub>0</sub> +β<sub>2</sub> Z + e <span style="mso-tab-count: 1;"> </span><br /></div><div class="MsoNormal"></div><div class="MsoNormal">Our instrumental variable estimator then becomes:<br /></div><div class="MsoNormal"></div><div class="MsoNormal">β<sub>IV</sub> = β<sub>2</sub> / β<sub>1</sub><span style="mso-spacerun: yes;"> </span>or (Z’Z)<sup>-1</sup>Z’Y / (Z’Z)<sup>-1</sup>Z’D or COV(Y,Z)/COV(D,Z) <span style="mso-tab-count: 1;"></span></div><br />The last term gives us the total proportion of <i style="mso-bidi-font-style: normal;">‘quasi-experimental variation’</i> in D related to Y.<span style="mso-spacerun: yes;"> We can also view this through a 2SLS modeling strategy:</span><br /><br /><br /><div class="MsoNormal"><span style="font-family: "Times New Roman";">Stage 1: Regress D on Z to get D* or </span>D = β<sub>0</sub> + β<sub>1</sub> Z + e<span style="mso-tab-count: 1;"> </span></div><div class="MsoNormal"><br /></div><div class="MsoNormal"><span style="font-family: "Times New Roman";">Stage 2: Regress Y on D* or </span>Y = β<sub>0</sub> +β<sub>IV</sub> D* + e <span style="mso-tab-count: 1;"><br /></span></div><br /> As described in <a href="http://www.mostlyharmlesseconometrics.com/">Mostly Harmless Econometrics,</a> <span style="font-family: "Times New Roman";"><i>"Intuitively, conditional on covariates, 2SLS retains only the variation in s </i>[D in our example above] <i>that is generated by quasi-experimental variation- that is generated by the instrument z" </i></span><br /><br />Regardless of how you want to interpret β<sub>IV</sub>, we can see that it teases out only that variation in our treatment D that is unrelated to selection bias and relates it to Y giving us an estimate for the treatment effect of D that is less biased.<br /><br />The causal path can be 
represented as:<br /><br />Z → D → Y <span style="mso-tab-count: 1;"> </span><br /><span style="mso-tab-count: 1;"><br /></span><span style="mso-tab-count: 1;"><a href="http://econometricsense.blogspot.com/2015/11/instrumental-explanations-of.html">There are lots of other ways to think about how to interpret IVs.</a> Ultimately they provide us with an estimate of the LATE, which can be interpreted as an average causal effect of treatment for those participants in a study whose enrollment status is determined completely by Z (the treatment assignment), i.e. the compliers. This is often a very relevant effect of interest. </span><br /><br /><span style="mso-tab-count: 1;">Marc Bellemare has some really good posts related to this; see <a href="http://marcfbellemare.com/wordpress/7174">here</a>, <a href="http://marcfbellemare.com/wordpress/7182">here, </a>and <a href="http://marcfbellemare.com/wordpress/7231">here.</a></span><br /><br /><span style="mso-tab-count: 1;"><br /></span>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-90227917604117082692017-07-11T18:04:00.000-04:002017-07-11T18:04:01.685-04:00The Credibility Revolution in EconometricsPreviously I wrote about how <a href="http://econometricsense.blogspot.com/2017/07/the-value-of-graduate-educationand.html">graduate training (and experience) can provide a foundation for understanding statistics, experimental design, and interpretation of research.</a> I think this is common across many master's and doctoral level programs. But some programs approach this a little differently than others. Because of the <a href="http://www.nber.org/papers/w15794">credibility revolution</a> in economics, there is a special concern for identification and robustness. 
And even within the discipline, there is concern that this has not been given enough emphasis in modern textbooks and curricula (see <a href="https://www.weforum.org/agenda/2015/05/why-econometrics-teaching-needs-an-overhaul/">here </a>and <a href="http://www.nber.org/papers/w23144?utm_campaign=ntw&utm_medium=email&utm_source=ntw">here</a>). However, this may not be well understood or appreciated by those outside the discipline. <br /><br /><b>What is the credibility revolution and what does it mean in terms of how we do research?</b><br /><br />I like to look at this through the lens of applied economists working in the field:<br /><br />Economist <a href="http://jaysonlusk.com/blog/2016/5/12/does-diet-coke-cause-fat-babies">Jayson Lusk</a> puts it well:<br /><br /><i>"Fortunately economics (at least applied microeconomics) has undergone a bit of credibility revolution. If you attend a research seminar in virtually any economist department these days, you're almost certain to hear questions like, "what is your identification strategy?" or "how did you deal with endogeneity or selection?" In short, the question is: how do we know the effects you're reporting are causal effects and not just correlations."</i><br /><br />Healthcare Economist <a href="http://theincidentaleconomist.com/wordpress/what-took-con-econometrics/">Austin Frakt has a similar take:</a><br /><br /><i>"A “research design” is a characterization of the logic that connects the data to the causal inferences the researcher asserts they support. It is essentially an argument as to why someone ought to believe the results. It addresses all reasonable concerns pertaining to such issues as selection bias, reverse causation, and omitted variables bias. In the case of a randomized controlled trial with no significant contamination of or attrition from treatment or control group there is little room for doubt about the causal effects of treatment so there’s hardly any argument necessary. 
But in the case of a natural experiment or an observational study causal inferences must be supported with substantial justification of how they are identified. Essentially one must explain how a random experiment effectively exists where no one explicitly created one."</i><br /><br /> How do we get substantial justification? Angrist and Pischke give a good example in their text <a href="http://www.mostlyharmlesseconometrics.com/">Mostly Harmless Econometrics </a>in their discussion of fixed effects and lagged dependent variables:<br /><br /><i>"One answer, as always is to check the robustness of your findings using alternative identifying assumptions. That means you would like to find broadly similar results using plausible alternative models." </i><br /><br />To someone trained in the physical or experimental sciences, this might 'appear' to look like data mining. But <a href="http://marcfbellemare.com/wordpress/11833">Marc Bellemare makes a strong case that it is not!</a><br /><br /><i>"Unlike experimental data, which often allow for a simple comparison of means between treatment and control groups, observational data require one to slice the data in many different ways to make sure that a given finding is not spurious, and that the researchers have not cherry-picked their findings and reported the one specification in which what they wanted to find turned out to be there. As such, all those tables of robustness checks are there to do the exact opposite of data mining."</i><br /><br />That's what the credibility revolution is all about. <br /><br /><b>See also: </b><br /><br /><a href="http://marcfbellemare.com/wordpress/10966">Do Both! 
</a>(by Marc Bellemare)<br /><a href="http://econometricsense.blogspot.com/2014/11/applied-econometrics.html">Applied Econometrics</a><br /><a href="http://econometricsense.blogspot.com/2015/11/econometrics-multiple-testing-and.html">Econometrics, Multiple Testing, and Researcher Degrees of Freedom</a><br /><br /><br /><br /><br /><br /><br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-22313281642850014762017-07-10T11:57:00.000-04:002017-07-11T13:05:38.483-04:00The Value of Graduate Education....and ExperienceSome time ago I wrote a piece titled <a href="http://econometricsense.blogspot.com/2010/09/why-study-appliedagricultural-economics.html">"Why Study Agricultural and Applied Economics."</a> While this was somewhat geared toward graduate study, degrees in these areas provide a great combination of quantitative and analytical skills at the undergraduate level suitable for a number of roles in industry, especially when combined with programming like R, SAS, or Python. (just think Nate Silver). Another example would be the number of financial analysts and risk management and modeling roles held by graduates holding bachelor's degrees in economics and finance or related fields. Not everyone needs to be a PhD holding rocket scientist to do complex analytical work in applied fields.<br /><br />However, what are some arguments for graduate study? I bring this up because sometimes I wonder, given my role in the private sector could I have had a similar trajectory if I just skipped the time, money and energy spent in graduate school and went straight to writing code?<br /><br />Perhaps. 
But recently I was listening to a <a href="http://www.talkingbiotechpodcast.com/088-food-evolution-the-movie/">Talking Biotech podcast with Kevin Folta discussing the movie Food Evolution.</a> Toward the end they discussed some critiques of the film, and a common critique of research in general is bias due to conflicts of interest. Kevin states:<br /><br /><i>"I've trained for 30 years to be able to understand statistics and experimental design and interpretation...I'll decide based on the quality of the data and the experimental design....that's what we do."</i><br /><br />Besides taking on the criticisms of science, this emphasized two important points.<br /><i> </i><br /><b>1)</b> <b>Graduate study teaches you to understand statistics and experimental design and interpretation. </b>At the undergraduate level I learned some basics that were quite useful in terms of empirical work. In graduate school I learned what is analogous to a new language. The additional properties of estimators, proofs, and theorems taught in graduate statistics courses suddenly made the things I learned before make better sense. This background helped me to translate and interpret other people's work and learn from it, and to learn new methodologies or extend others. But it was the seminars and applied research that made it come to life: learning to 'do science' through statistics and experimental design, and interpretation, as Kevin says. <br /><br /><b>2) Graduate study is an extendable framework.</b> Learning and doing statistics is a career-long process. <a href="http://econometricsense.blogspot.com/2014/11/applied-econometrics.html">This recognizes the gulf between textbook and applied econometrics.</a><br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-47997498235384052352017-06-11T21:17:00.001-04:002017-07-12T20:11:39.604-04:00Instrumental Variables vs. 
Intent to Treat<i> "ITT analysis includes every subject who is randomized according to randomized treatment assignment. It ignores noncompliance, protocol deviations, withdrawal, and anything that happens after randomization. ITT analysis is usually described as “once randomized, always analyzed”.<br /><br />"ITT analysis avoids overoptimistic estimates of the efficacy of an intervention resulting from the removal of non-compliers by accepting that noncompliance and protocol deviations are likely to occur in actual clinical practice" </i> - Gupta, 2011<br /><br /> In Mastering Metrics, Angrist and Pischke describe intent-to-treat analysis: <br /><br /><i>"In randomized trials with imperfect compliance, when treatment assignment differs from treatment delivered, effects of random assignment...are called intention-to-treat (ITT) effects. An ITT analysis captures the causal effect of being assigned to treatment."</i><br /><br />While treatment assignment is random, non-compliance is not! Therefore, if instead of using intent-to-treat comparisons we compared those actually treated to those untreated, we would get biased results, because this is essentially making uncontrolled comparisons between treated and untreated subjects. <br /><br />Angrist and Pischke describe how instrumental variables can be used in this context:<br /><br /> <i>“The instrumental variables (IV) method harnesses partial or incomplete random assignment, whether naturally occurring or generated by researchers”</i><br /><br /><i> "Instrumental variable methods allow us to capture the causal effect of treatment on the treated in spite of the nonrandom compliance decisions made by participants in experiments....Use of randomly assigned intent to treat as an instrumental variable for treatment delivered eliminates this source of selection bias."</i><br /><br />In <i>Intent-to-Treat vs. 
Non-Intent-to-Treat Analyses under Treatment Non-Adherence in Mental Health Randomized Trials </i>there is a nice discussion of ITT and IV methods with applications related to clinical research. Below is a nice treatment of IV in this context:<br /><br /><i>“Instrumental variables are assumed to emulate randomization variables, unrelated to unmeasured confounders influencing the outcome. In the case of randomized trials, the same randomized treatment assignment variable used in defining treatment groups in the ITT analysis is instead used as the instrumental variable in IV analyses. In particular, the instrumental variable is used to obtain for each patient a predicted probability of receiving the experimental treatment. Under the assumptions of the IV approach, these predicted probabilities of receipt of treatment are unrelated to unmeasured confounders in contrast to the vulnerability of the actually observed receipt of treatment to hidden bias. Therefore, these predicted treatment probabilities replace the observed receipt of treatment or treatment adherence in the AT model to yield an estimate of the as-received treatment effect protected against hidden bias when all of the IV assumptions hold.”</i><br /><br />A great example of IV and ITT applied to health care can be found in Finkelstein et al. (2013 & 2014). See <a href="http://econometricsense.blogspot.com/2014/01/the-oregon-medicaid-experiment-applied.html">The Oregon Medicaid Experiment, Applied Econometrics, and Causal Inference.</a><br /><br />Over at the <a href="http://theincidentaleconomist.com/wordpress/methods-intention-to-treat/">Incidental Economist, there was a nice discussion</a> of ITT in the context of medical research that does a good job of explaining the rationale as well as when departures from ITT make more sense (such as safety and non-inferiority trials). 
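<br /><br />The earlier point that comparing those actually treated to those untreated gives biased results can be illustrated with a small Python sketch. Everything here is hypothetical: the RECORDS values are invented so that treatment raises compliers' outcomes by 2, always takers have high outcomes no matter what, and never takers have low ones. The helper group_mean is my own name, not from any library.

```python
# Hypothetical records: (assigned, received, outcome, count). Illustrative only.
RECORDS = [
    (1, 1, 12.0, 30),  # compliers assigned to treatment
    (0, 0, 10.0, 30),  # compliers assigned to control
    (1, 1, 16.0, 10),  # always takers, treatment arm
    (0, 1, 16.0, 10),  # always takers, control arm
    (1, 0,  8.0, 10),  # never takers, treatment arm
    (0, 0,  8.0, 10),  # never takers, control arm
]


def group_mean(records, column, value):
    """Count-weighted mean outcome over records where record[column] == value."""
    total = sum(r[2] * r[3] for r in records if r[column] == value)
    n = sum(r[3] for r in records if r[column] == value)
    return total / n


# Intent-to-treat: compare by random assignment (column 0).
itt_effect = group_mean(RECORDS, 0, 1) - group_mean(RECORDS, 0, 0)

# As-treated: compare by treatment actually received (column 1).
# This mixes the treatment effect with self-selection into treatment.
as_treated = group_mean(RECORDS, 1, 1) - group_mean(RECORDS, 1, 0)

print(f"ITT effect = {itt_effect:.2f}, as-treated difference = {as_treated:.2f}")
```

In this invented example the as-treated difference (4.4) is more than double the effect of 2 built in for compliers, because always takers inflate the treated group's mean and never takers pull down the untreated group's mean. The ITT comparison (1.2) stays a valid estimate of the effect of assignment.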
<br /><br /><b>See also: </b><br /><a href="http://econometricsense.blogspot.com/2015/11/instrumental-explanations-of.html">Instrumental Explanations of Instrumental Variables</a><br /><br /><a href="http://econometricsense.blogspot.com/2013/06/an-toy-instrumental-variable-application.html">A Toy IV Application</a><br /><br /><a href="http://econometricsense.blogspot.com/search/label/instrumental%20variables">Other IV Related Posts </a><br /><br /><b>References: </b><br /><br />Angrist, J. D., & Pischke, J.-S. (2015). Mastering ’Metrics: The Path from Cause to Effect.<br /><br />Gupta, S. K. (2011). Intention-to-treat concept: A review. Perspectives in Clinical Research, 2(3), 109–112. http://doi.org/10.4103/2229-3485.83221<br /><br />Ten Have, T. R., Normand, S.-L. T., Marcus, S. M., Brown, C. H., Lavori, P., & Duan, N. (2008). Intent-to-Treat vs. Non-Intent-to-Treat Analyses under Treatment Non-Adherence in Mental Health Randomized Trials. Psychiatric Annals, 38(12), 772–783. http://doi.org/10.3928/00485713-20081201-10<br /><br /><span class="userContent">"The Oregon Experiment--Effects of Medicaid on Clinical Outcomes," by Katherine Baicker, et al. New England Journal of Medicine, 2013; 368:1713-1722. http://www.nejm.org/doi/full/10.1056/NEJMsa1212321 </span><br /><span class="userContent"><br /></span><span class="userContent">Medicaid Increases Emergency-Department Use: Evidence from Oregon's Health Insurance Experiment. Sarah L. Taubman, Heidi L. Allen, Bill J. Wright, Katherine Baicker, and Amy N. Finkelstein. Science 1246183. Published online 2 January 2014 [DOI:10.1126/science.1246183] </span><br /><br /><span class="userContent">Detry MA, Lewis RJ. The Intention-to-Treat Principle: <span class="subtitle">How to Assess the True Effect of Choosing a Medical Treatment</span>. <i>JAMA.</i> 2014;312(1):85-86. 
doi:10.1001/jama.2014.7523 </span><br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-87241060599608890252017-06-06T20:36:00.001-04:002018-12-21T12:30:29.916-05:00Professional Science Master's Degree Programs in Biotechnology and ManagementAs an undergraduate I always had an interest in biotechnology and molecular genetics. However, lab work did not particularly appeal to me. I also recognized early on that science does not occur in a vacuum; it's subject to social, political, economic, and financial forces. This drew me to the field of economics, specifically public choice theory.<br /><br />When it came time for graduate school I was still torn. I really wasn't interested in an MBA and didn't really have the background to work in a lab or do field work in genetic research. I really liked economics. The combination of mathematically precise theories (microeconomics/game theory) and empirically sound methods (econometrics) provided a powerful framework for applied problem solving.<br /><br />I had two advisers make recommendations that got me thinking outside the box. One suggested ultimately I would find a niche that combined both economics and genetics. The other suggested I look at programs like the Bioscience Management program that was being offered at the time at George Mason University (now Bioinformatics Management). While there were not a lot of programs like that being offered at the time, the Agriculture Department at Western Kentucky University provided enough flexibility in their master's program to include courses in<span class="background-details"> biostatistics, genetics, and applied economics. I was able to work on research projects analyzing consumer perceptions of biotechnology and biotech trait resistance management using tools from econometrics, game theory, and population genetics. 
Additionally I took courses in applied economics and finance from both the Department of Agriculture and College of Business where I was exposed to tools related to investment analysis, options pricing, and analysis and valuation of biotech companies as well as the impacts of technological change and biotechnology on food and economic development.</span><br /><br /><span class="background-details">With this combination of quantitative training and applied work I have been able to leverage SAS, R, and Python to solve a number of challenging problems throughout a number of professional analytics and consulting roles. </span><br /><span class="background-details"><br /></span><span class="background-details">Today there are a larger number of professional science masters programs similar to the programs I contemplated over 10 years ago. </span><br /><span class="background-details"><br /></span><span class="background-details">According to <a href="https://www.professionalsciencemasters.org/about">National Professional Science Master’s Association</a>:</span><br /><span class="background-details"><br /></span><i><span class="background-details">"Professional Science Master's (PSMs) are designed for students who are seeking a graduate degree in science or mathematics and understand the need for developing workplace skills valued by top employers. 
A perfect fit for professionals because it allows you to pursue advanced training and excel in science or math without a Ph.D., while simultaneously developing highly-valued business skills....</span></i><i><span class="background-details">PSM programs consist of two years of coursework along with a professional component that includes business, communications and/or regulatory affairs."</span></i><br /><br /><span class="background-details">In 2012 there was an <a href="http://www.sciencemag.org/careers/2012/03/does-professional-science-masters-degree-pay">article in Science </a>detailing these degrees and some data related to salaries which seemed attractive. According to the article the first program was officially offered in 1997, reaching 140 programs by 2009 with over 247 at the time of printing.</span><br /><span class="background-details"><br /></span><span class="background-details">This commentary from the article corroborates how I feel about my experience:</span><br /><span class="background-details"><br /></span><i><span class="background-details">“There is a tendency for students to buy into the line that if you don't get a Ph.D., you're not a serious professional, that you're wasting your mind,” she says. After spending a decade talking with PSM students and graduates, she is certain that’s not true. “There is so much potential for growth and satisfaction with a PSM degree. You can become a person you didn’t even know you wanted to be.”</span></i><br /><span class="background-details"><br /></span><span class="background-details">Below are some programs that look interesting to me and that students interested in this option should check out (<a href="https://www.professionalsciencemasters.org/program-locator">there is a program locator you can find here</a>). Similar to my master's, many of these programs are a mash-up of biology/biotech and applied economics and business degrees. 
</span><br /><span class="background-details"><br /></span><span class="background-details">George Mason University- <a href="http://ssb.gmu.edu/academics/Professional-Science-Masters-in-Bioinformatics-Management.cfm">PSM Bioinformatics Management</a></span><br /><br /><span class="background-details">University of Illinois - <a href="http://psm.illinois.edu/agricultural-production">Agricultural Production </a></span><br /><span class="background-details"><br /></span><span class="background-details">Cornell- <a href="https://dyson.cornell.edu/programs/graduate/mps.html">MPS Agriculture and Life Sciences </a></span><br /><br /><span class="background-details">Washington State University - <a href="https://online.wsu.edu/grad/professionalScience.aspx">PSM Molecular Biosciences</a></span><br /><span class="background-details"><br /></span><span class="background-details">Middle Tennesee State University - <a href="http://www.mtsu.edu/programs/biotechnology-ms/">PSM Biotechnology</a></span><br /><span class="background-details"><br /></span><span class="background-details">California State - <a href="http://ext.csuci.edu/programs/ms-biotech-mba-dual-degree/index.htm">MS Biotechnology/MBA </a></span><br /><br /><span class="background-details">Johns Hopkins - <a href="http://advanced.jhu.edu/academics/dual-degree-programs/biotechnology-mba/">MBA/MS Biotechnology</a></span><br /><span class="background-details"><br /></span><span class="background-details">Rice - <a href="https://profms.rice.edu/bioscience-health-policy/overview">PSM Bioscience and Health Policy</a></span><br /><span class="background-details"><br /></span><span class="background-details">North Carolina State University - <a href="https://mba.ncsu.edu/academics/concentrations/biosciences-management/">MBA (Biosciences Mgt Concentration)</a></span><br /><span class="background-details"></span><br /><span class="background-details">Purdue/Kelley - <a 
href="http://agribusiness.purdue.edu/ms-mba-plan-of-study">MS-MBA</a> (not a heavy science emphasis, but a very cool degree regardless, from great schools)</span><br /><br /><b><span class="background-details">See also: </span></b><br /><a href="http://econometricsense.blogspot.com/2015/07/analytical-translators.html"><span class="background-details">Analytical Translators</span></a><br /><span class="background-details"><a href="http://econometricsense.blogspot.com/2010/09/why-study-appliedagricultural-economics.html">Why Study Agricultural/Applied Economics</a></span>Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-52791475258853448632017-06-05T20:40:00.003-04:002017-06-05T20:45:56.919-04:00Game Theory with Python- TalkPython PodcastEpisode 104 of the TalkPython podcast discussed game theory.<br /><iframe frameborder="no" height="166" scrolling="no" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/314210830&color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false" width="100%"></iframe> <br />Here are a few slices:<br /><br /><i>"Our guests this week, Vince Knight, Marc Harper, and Owen Campbell are here to discuss their Python project built to study and simulate one of the central problems in game theory, "The Prisoner's Dilemma" </i><br /><br /><i>"Yeah, so one of the things is how people end up cooperating. If we're all incentivized not to cooperate with each other yet we look around, we see all these situations where people are cooperating, so can we devise strategies that when we play this game repeatedly that coerce or convince our partners that they're better off cooperating with us than defecting against us......Okay, excellent. Give us a sense for some of the, you have some clever names for the different strategies or players, right? Strategy and player is kind of the same thing. 
You've got the basic ones. The cooperator and the defector, but what else?Probably the most famous one is the tit for tat strategy. Because in Axelrod's original tournament, one of the interesting results that came out with his work was that this strategy was one of the most successful."</i><br /><br />And then they get into incorporating machine learning:<br /><br /><i>"We've extended that method of taking a strategy based on some kind of machine learning algorithm, training it against the other strategies and then adding the fact of the tournaments to see about those. Right now, those are amongst the best players in the library, in terms of performance."</i><br /><br />See my <a href="http://econometricsense.blogspot.com/2017/06/game-theory-basic-introduction.html">previous post</a> for some concepts and examples from game theory that were discussed in this podcast. You can find more references from this podcast including papers, code etc. <a href="https://talkpython.fm/episodes/show/104/game-theory-in-python">here.</a><br /> Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0tag:blogger.com,1999:blog-2474498300859593807.post-87906637884622997612017-06-05T20:26:00.001-04:002017-06-05T21:25:06.637-04:00Game Theory- A Basic Introduction<div class="page" title="Page 2"><div class="layoutArea"><div class="column"><span style="font-family: "times new roman"; font-size: 12.000000pt;">When someone else’s choi</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">ces impact you, it helps to have some way to anticipate their behavior. Game Theory provides the tools for doing so (Nicholson, 2002). Game Theory is a mathematical technique developed to study choice under conditions of strategic interaction (Zupan, 1998). It allows for the analysis of interdependent situations. 
</span><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;"></span><span style="font-family: "times new roman"; font-size: 12.000000pt;"><br /></span> <span style="font-family: "times new roman"; font-size: 12.000000pt;">In game theory, a </span><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">game </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">is a decision-making situation with interdependent behavior between two or more individuals (Harris,1999). The individuals involved in making the decisions are the </span><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">players</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">. The set of possible choices made by the players are </span><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">strategies</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">. The outcomes of choices and strategies played are </span><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">payoffs</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">. Payoffs are often stated as levels of utility, income, profits, or some other stated objective particular to the game. A general assumption in game theory is that players seek the highest payoff attainable, preferring more utility to less (Nicholson, 2002). </span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">When a decision maker takes into account how other players will respond to his choices, a utility maximizing strategy may be found. It may allow one to predict in advance the actions, responses, and counter responses of others and then choose optimal strategies (Harris, 1999). 
Such optimal strategies that leave players with no incentive to change their behavior are </span><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">equilibrium strategies</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">. </span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">Games can be characterized by players, strategies, and payoffs. Below is one way to visualize a game. </span></div></div></div><br />Example: Overgrazing Game<br /><br /><table border="1" cellpadding="4"><tr><th></th><th>RANCHER 2: Conserve</th><th>RANCHER 2: Overgraze</th></tr><tr><th>RANCHER 1: Conserve</th><td>(20, 20)</td><td>(0, 30)</td></tr><tr><th>RANCHER 1: Overgraze</th><td>(30, 0)</td><td>(10, 10)</td></tr></table><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">In this game, the players are rancher '1' and rancher '2'. They can play one of two strategies, to conserve or overgraze a commonly shared or 'public' pasture. Suppose rancher 1 chooses a strategy (picks a row). Their payoff is depicted by the first number in each cell. Rancher 2 will choose a strategy in return (picking a column). Rancher 2’s payoff is indicated by the second number in each cell. </span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">In this case, the best strategy for rancher 2 (no matter what rancher 1 chooses to do) is to overgraze, because the payoff for rancher 2 (the 2nd number in each cell) associated with overgrazing is always higher. Likewise, no matter what rancher 2 chooses to do, the best strategy for rancher 1 is to overgraze, because the first number in each cell (the payoff for rancher 1) associated with overgrazing is always higher. Both players have a dominant strategy to overgraze. This represents an equilibrium strategy of {overgraze, overgraze}. 
</span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">This game is an example of a prisoner’s dilemma, and its outcome is a </span><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">Nash Equilibrium. </span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">In a Nash equilibrium each player’s choice is the best choice possible taking into consideration the choice of the other players </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">(Zupan, 1998). This concept was generalized by the mathematician John Nash in 1950 in his paper “Equilibrium Points in n-Person Games.” </span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">It’s easy to see that if both players would conserve, they could both be made better off, because the strategy {conserve, conserve} yields payoffs (20,20), which are much higher than the Nash Equilibrium strategy’s payoffs of (10,10). 
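</span><br /><br />The reasoning above can be checked mechanically. Below is a minimal Python sketch that encodes the overgrazing payoff matrix and searches for dominant strategies and Nash equilibria by brute force. The function names (payoff, dominant_strategy, nash_equilibria) are my own for illustration and do not come from any game theory library.

```python
from itertools import product

STRATEGIES = ("Conserve", "Overgraze")

# PAYOFFS[(rancher 1's strategy, rancher 2's strategy)] = (payoff 1, payoff 2)
PAYOFFS = {
    ("Conserve", "Conserve"): (20, 20),
    ("Conserve", "Overgraze"): (0, 30),
    ("Overgraze", "Conserve"): (30, 0),
    ("Overgraze", "Overgraze"): (10, 10),
}


def payoff(player, own, other):
    """Payoff to `player` (0 = rancher 1, 1 = rancher 2) playing `own` against `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]


def dominant_strategy(player):
    """Return the strategy that is strictly best against every opposing choice, or None."""
    for s in STRATEGIES:
        if all(payoff(player, s, o) > payoff(player, alt, o)
               for o in STRATEGIES for alt in STRATEGIES if alt != s):
            return s
    return None


def nash_equilibria():
    """Strategy profiles where neither rancher gains by deviating alone."""
    equilibria = []
    for s1, s2 in product(STRATEGIES, repeat=2):
        best1 = all(payoff(0, s1, s2) >= payoff(0, alt, s2) for alt in STRATEGIES)
        best2 = all(payoff(1, s2, s1) >= payoff(1, alt, s1) for alt in STRATEGIES)
        if best1 and best2:
            equilibria.append((s1, s2))
    return equilibria


print("Dominant strategies:", dominant_strategy(0), dominant_strategy(1))
print("Nash equilibria:", nash_equilibria())
```

For this matrix the search finds overgrazing as the dominant strategy for both ranchers and {overgraze, overgraze} as the only Nash equilibrium, even though {conserve, conserve} pays both ranchers more.<span style="font-family: "times new roman"; font-size: 12.000000pt;">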
</span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">Just as competitive market forces elicit cooperation by coordinating behavior through price mechanisms, so too must players in a game find some means of coordinating their behavior if they wish to escape the sub-optimal Nash Equilibrium. <b> </b></span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;"><b>Some Additional Concepts</b> </span><span style="font-family: "times new roman"; font-size: 12.000000pt;"><u> </u></span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;"><u>Multiple Period Games-</u> </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">Multiple period games are games that are played more than once, or over more than one time period. If we imagine playing the prisoner’s dilemma game multiple times, we have a multi-period game. If games are played perpetually they are referred to as </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">infinite games </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">(Harris, 1999). </span><u><span style="font-family: "times new roman"; font-size: 12.000000pt;"> </span></u><br /><br /><u><span style="font-family: "times new roman"; font-size: 12.000000pt;">Punishment Schemes </span></u><b><span style="font-family: "times new roman"; font-size: 12.000000pt;">-</span></b><span style="font-family: "times new roman"; font-size: 12.000000pt;"> </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">Punishment schemes are used to elicit cooperation or to enforce agreements. 
</span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;">In the game presented above, suppose both players wanted to cooperate to conserve grazing resources. If it turned out that rancher 2 cheated, then in the next period rancher 1 would refuse to cooperate. If the game is played repeatedly, rancher 2 would learn that if he sticks to the deal both players would be better off. In this way punishment schemes in multi-period games can elicit cooperation, allowing an escape from a Nash Equilibrium. This may not be possible in the single period games that we looked at before.</span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;"><u>Tit-for-Tat </u>- </span><span style="font-family: "times new roman"; font-size: 12.000000pt;"><span style="font-family: "times new roman,bold"; font-size: 12.000000pt;">Tit-for-tat </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">punishment mechanisms are schemes in which if one player fails to cooperate, the other player will refuse to cooperate in the next period. </span> </span><br /><br /><u><span style="font-family: "times new roman"; font-size: 12.000000pt;">Trigger Strategy</span></u><span style="font-family: "times new roman"; font-size: 12.000000pt;"><u> </u>- In </span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">infinitely </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">repeated games a trigger strategy involves a promise to play the optimal strategy as long as the other players comply (Nicholson, 2002). 
</span><u><span style="font-family: "times new roman"; font-size: 12.000000pt;"> </span></u><br /><br /><u><span style="font-family: "times new roman"; font-size: 12.000000pt;">Grim Trigger Strategy</span></u><span style="font-family: "times new roman"; font-size: 12.000000pt;"><u> </u>- This is a trigger strategy that involves punishment for many periods if the other player does not cooperate. In other words if one player defects when he should cooperate, the other player(s) will not offer the chance to cooperate again for a long time. As a result both players will be confined to a N.E. for many periods or perpetually (Harris, 1999). <u> </u></span><br /><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;"><u>Trembling Hand Trigger Strategy-</u> This is a trigger strategy that allows for mistakes. Suppose in the first instance player 1 does not </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">realize that player 2 is willing to cooperate. Instead of player 1 resorting to a long period of punishment as in the </span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">grim trigger strategy</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">, player 1 allows player 2 a second chance to cooperate. It may be the case that instead of playing the </span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">grim trigger strategy</span><span style="font-family: "times new roman"; font-size: 12.000000pt;">, player 1 may invoke a single period </span><span style="font-family: "times new roman,italic"; font-size: 12.000000pt;">tit-for-tat </span><span style="font-family: "times new roman"; font-size: 12.000000pt;">punishment scheme in hopes to elicit cooperation in later periods. 
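<br /><br />The punishment schemes described above can be compared with a small simulation. This is only a sketch in Python under stated assumptions: the stage-game payoffs are those of the overgrazing game, there is no discounting, and <i>cheat_once</i> is a hypothetical opponent invented for illustration who overgrazes in a single period and otherwise conserves.

```python
# Stage-game payoffs from the overgrazing game: (rancher 1 payoff, rancher 2 payoff).
PAYOFFS = {
    ("conserve", "conserve"): (20, 20),
    ("conserve", "overgraze"): (0, 30),
    ("overgraze", "conserve"): (30, 0),
    ("overgraze", "overgraze"): (10, 10),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "conserve" if not opponent_history else opponent_history[-1]


def grim_trigger(opponent_history):
    """Cooperate until the opponent overgrazes once, then overgraze forever."""
    return "overgraze" if "overgraze" in opponent_history else "conserve"


def cheat_once(period_to_cheat):
    """Hypothetical opponent: conserves except in one chosen period (0-indexed)."""
    def strategy(opponent_history):
        # The length of the history equals the index of the current period.
        return "overgraze" if len(opponent_history) == period_to_cheat else "conserve"
    return strategy


def play(strategy_1, strategy_2, periods):
    """Play the stage game repeatedly and return each rancher's total payoff."""
    history_1, history_2 = [], []
    total_1 = total_2 = 0
    for _ in range(periods):
        move_1 = strategy_1(history_2)  # each rancher sees only the other's past moves
        move_2 = strategy_2(history_1)
        payoff_1, payoff_2 = PAYOFFS[(move_1, move_2)]
        total_1 += payoff_1
        total_2 += payoff_2
        history_1.append(move_1)
        history_2.append(move_2)
    return total_1, total_2


print("tit-for-tat vs. one-time cheater: ", play(tit_for_tat, cheat_once(2), periods=10))
print("grim trigger vs. one-time cheater:", play(grim_trigger, cheat_once(2), periods=10))
print("tit-for-tat vs. tit-for-tat:      ", play(tit_for_tat, tit_for_tat, periods=10))
```

Over ten periods, mutual cooperation yields 200 for each rancher. The one-time cheater ends with 190 against tit-for-tat (one period of punishment) and only 70 against the grim trigger (punishment in every remaining period), which illustrates why cheating does not pay when the game is repeated.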
</span><br /><br /><u><span style="font-family: "times new roman"; font-size: 12.000000pt;">Folk Theorems</span></u><span style="font-family: "times new roman"; font-size: 12.000000pt;"> - Folk theorems result from the conclusion that players can escape the outcome of a Nash Equilibrium if games are played repeatedly, or are infinite period games (Nicholson,2002).</span><br /><span style="font-family: "times new roman"; font-size: 12.000000pt;"> In general, folk theorems state that players will find it in their best interest to maintain trigger strategies in infinitely repeated games.</span><b><span style="font-family: "times new roman"; font-size: 12.000000pt;"> </span></b><br /><br /><b><span style="font-family: "times new roman"; font-size: 12.000000pt;">See also:</span></b><br />Matt Bogard. "An Econometric and Game Theoretic Analysis of Producer and Consumer Preferences Toward Agricultural Biotechnology" <i>Western Kentucky University</i> (2005) Available at: <a href="http://works.bepress.com/matt_bogard/31/">http://works.bepress.com/matt_bogard/31/</a><br /><br />Matt Bogard. "An Introduction to Game Theory: Applications in Environmental Economics and Public Choice with Mathematical Appendix" (2012) Available at: <a href="http://works.bepress.com/matt_bogard/22/">http://works.bepress.com/matt_bogard/22/ </a><br /><br />Matt Bogard. "Game Theory, A Foundation for Agricultural Economics" (2004) Available at: <a href="http://works.bepress.com/matt_bogard/32/">http://works.bepress.com/matt_bogard/32/</a><b> </b><br /><br /><b>References:</b><br /><br />Nicholson, Walter R. “Microeconomic Theory: Basic Principles and Extensions.” Southwestern Thomson Learning. U.S.A. (2002).<br /><br />Browning, Edward K. and Mark A. Zupan. “Microeconomic Theory and Applications.” 6th Edition. Addison-Wesley Longman Inc. Reading, MA. (1999)<br /><br />Harris, Frederick H. et al. “Managerial Economics: Applications, Strategy, and Tactics.” Southwestern College Publishing. 
Cincinnati, OH. (1999).<b><br /> </b><br />Posted by Matt Bogard<br /><br /><b>In Praise of The Citizen Data Scientist</b> (posted 2017-06-03)<br /><br />I read a really good article over at Data Science Central titled <a href="http://www.datasciencecentral.com/profiles/blogs/the-data-science-delusion">"The Data Science Delusion."</a> Here is an interesting slice:<br /><i><br /></i><i>"This democratization of algorithms and platforms, paradoxically, has a downside: the signaling properties of such skills have more or less been lost. Where earlier you needed to read and understand a technical paper or a book to implement a model, now you can just use an off-the-shelf model as a black-box. While this phenomenon affects many disciplines, the vague and multidisciplinary definition of data science certainly exacerbates the problem."</i><br /><br />It is true there is some loss of signal. However, companies may need to look for new signals as technological change progresses and new forms of capital complement labor.<i> </i>It's this new labor-complementing role of capital (in the form of open source statistical computing packages and computing power) that is creating demand for those who can leverage these tools competently, without knowing all <i> <a href="http://econometricsense.blogspot.com/2017/04/what-do-you-really-need-to-know-to-be.html">"the nitty-gritty mathematical academic formulas to everything about support vector machines or Kernels and stuff like that to apply it properly and get results."</a></i><br /><br />Sure, as a result there are a lot of analytics programs popping up out there to take advantage of these advances, but it's also the reason programs like applied economics are becoming so popular. 
In fact, in promoting its program, Johns Hopkins University almost seems to echo some of the sentiment in the quotes above, but puts a positive spin on it:<br /><br /><i>"Economic analysis is no longer relegated to academicians and a small number of PhD-trained specialists. Instead, economics has become an increasingly ubiquitous as well as rapidly changing line of inquiry that requires people who are skilled in analyzing and interpreting economic data, and then using it to effect decisions about national and global markets and policy, involving everything from health care to fiscal policy, from foreign aid to the environment, and from financial risk to real risk." </i><br /><br />I admit that for a while I was a little disappointed my alma mater did not embrace the data science/analytics degree trend, offer more courses in applied programming, or incorporate languages like R into more courses. Now, while I think these things are great, I realize the more important data science skills are related to the analytical thinking and firm theoretical, statistical, and quantitative foundations that programs in economics and finance already offer at the undergraduate and master's level. While formal data science training might be the way of the future, I would venture to say that the vast majority of today's 'data scientists' were academically trained in a quantitative discipline like the above and self-trained (perhaps via Coursera etc.) on the skills and tools most people think of when they think of data science. As I have said before, sometimes you don't need someone with a PhD in computer science or astrophysics. 
<a href="http://econometricsense.blogspot.com/2016/10/the-future-data-scientist.html">Sometimes you really just need a good MBA that understands regression and the basics of a left join.</a><br /><br />The DSC article above concludes with a little jab at data science, one I agree with wholeheartedly:<br /><br /><i>"Great data science work is being done in various places by people who go by other names (analyst, software engineer, product head, or just plain old scientist). It is not necessary to be a card-carrying data scientist to do good data science work. Blasphemy it may be to say so, but only time will tell whether the label itself has value, or is only helping create a delusion." </i><br /><br /><b>See also:</b><br /><br /><a href="http://econometricsense.blogspot.com/2017/04/what-do-you-really-need-to-know-to-be.html">What you really need to know to be a data scientist</a><br /><a href="http://econometricsense.blogspot.com/2017/04/super-data-science-podcast-credit.html">Super Data Science podcast - credit scoring</a><br /><a href="http://www.kdnuggets.com/2017/03/think-like-data-scientist-become-one.html">How to think like a data scientist to become one</a><br /><a href="http://www.kdnuggets.com/2017/03/what-makes-great-data-scientist.html">What makes a great data scientist </a><br /><a href="http://econometricsense.blogspot.com/2016/10/the-future-data-scientist.html">Are data scientists going extinct</a><br /><a href="http://econometricsense.blogspot.com/2017/04/more-on-data-science-from-actual-data.html">More on data science from actual data scientists </a><br /><br />Posted by Matt Bogard<br /><br /><b>Multicollinearity.....just a bad joke?</b> (posted 2017-05-30, updated 2017-05-31)<br /><br /><div class="separator" style="clear: both; text-align: center;"><a 
href="https://4.bp.blogspot.com/-VbgjaCABEak/WS3NHzb-UbI/AAAAAAAACUw/nOrAfc7cErgWFA-0FF3t4lamThznBr-TgCLcB/s1600/tumblr_n6i5qv3HcT1rzm4u4o1_1280.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="778" data-original-width="1200" height="207" src="https://4.bp.blogspot.com/-VbgjaCABEak/WS3NHzb-UbI/AAAAAAAACUw/nOrAfc7cErgWFA-0FF3t4lamThznBr-TgCLcB/s320/tumblr_n6i5qv3HcT1rzm4u4o1_1280.jpeg" width="320" /></a></div>Link/Credit: <a href="https://www.pinterest.com/pin/96686723226973447/">https://www.pinterest.com/pin/96686723226973447/ </a> <br /><br /><i>"The worth of an econometrics textbook tends to be inversely related to the technical material devoted to multicollinearity" </i>- Williams, R. Economic Record 68, 80-1. (1992). via Kennedy, A Guide to Econometrics (6th edition).<br /><br /><br />If you have never read Arthur S. Goldberger's treatment of multicollinearity in his well-known text <i>A Course in Econometrics</i>, you are missing some of the best reading in econometrics you will ever find. A few years ago Dave Giles gave a nice preview here: <a href="http://davegiles.blogspot.com/2011/09/micronumerosity.html">http://davegiles.blogspot.com/2011/09/micronumerosity.html</a><br /><br />Basically, Goldberger provides a lengthy discussion in his textbook about 'micronumerosity,' a term he makes up to parody multicollinearity, the excessive amount of attention it is given in textbooks, and the resources spent by practitioners attempting to 'detect' it (see Dave Giles' post). It's more entertaining than the meme I found above.<br /><br />For a quick review, multicollinearity can be characterized in multivariable regression as a situation where there is correlation between explanatory variables. For instance, if we are estimating:<br /><br /> y = b0 + b1x1 + b2x2 + b3x3 + e<br /><br />and x2 and x3 are highly correlated, the amount of independent variation in each variable is reduced. 
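<br /><br />To make the loss of independent variation concrete, here is a rough sketch in Python. It is not from Goldberger or the posts cited here; it relies on the standard variance inflation factor, 1 / (1 - R<sup>2</sup>), and on the simplifying assumption that x1 is uncorrelated with x2 and x3, so that R<sup>2</sup> from regressing x2 on the other regressors is approximately corr(x2, x3)<sup>2</sup>. The function names and the simulated data are illustrative only.

```python
import math
import random


def sample_correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def se_inflation(rho, n=1000, seed=0):
    """Approximate factor by which SE(b2) grows because x2 is correlated with x3.

    Assumes x1 is uncorrelated with x2 and x3, so the R^2 in the variance
    inflation factor 1 / (1 - R^2) is approximated by corr(x2, x3)^2.
    """
    rng = random.Random(seed)
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    # Construct x3 so that its population correlation with x2 is rho.
    x3 = [rho * v + math.sqrt(1 - rho**2) * rng.gauss(0, 1) for v in x2]
    r = sample_correlation(x2, x3)
    vif = 1 / (1 - r**2)
    return math.sqrt(vif)  # standard errors scale with the square root of the variance


for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"corr(x2, x3) ~ {rho:.2f}: SE(b2) inflated by a factor of about {se_inflation(rho):.2f}")
```

The point of the sketch is Goldberger's: what matters is how large the resulting standard errors are, and a larger sample shrinks them just as surely as less correlated regressors would.<br /><br />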
With less information available to estimate the effects b2 and b3, these estimates become less precise and their standard errors are larger than they would otherwise be.<br /><br />As Goldberger advises, we should not spend a lot of resources trying to apply various 'tests' for multicollinearity, but focus more on whether its consequences really matter:<br /><br /><i>"Researchers should not be concerned with whether or not there really is collinearity. They may well be concerned with whether the variances of the coefficient estimates are too large-for whatever reason-to provide useful estimates of the regression coefficients" </i>(Goldberger, 1991).<br /><br />Below are some other posts I have previously written on the topic, addressing multicollinearity in the context of predictive vs. inferential modeling.<br /><br />From my discussion of multicollinearity in <a href="http://econometricsense.blogspot.com/2015/06/linear-literalism-fundamentalist.html">Linear Literalism and Fundamentalist Econometrics</a>: <br /><br /><i>"Multicollinearity has a very different impact if your goal is prediction from when your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model."-</i> Statist. Sci. 
Volume 25, Number 3 (2010), 289-310.<br /><br />See also: <br /><br /><a href="http://econometricsense.blogspot.com/2013/01/paul-allison-on-multicollinearity.html">Paul Allison on Multicollinearity - when not to worry</a><br /><br /><a href="http://econometricsense.blogspot.com/2011/01/ridge-regression.html">Ridge Regression</a><br /><br />Posted by Matt Bogard<br /><br /><b>More on Data Science from Actual Data Scientists</b> (posted 2017-04-10)<br /><br />Previously I wrote a post titled: <a href="http://econometricsense.blogspot.com/2017/04/what-do-you-really-need-to-know-to-be.html">What do you really need to know to be a data scientist. Data science lovers and haters.</a> In that post I made the general argument that this is a broad space and there is a lot of contention about the level of technical skill and tools that one must master to consider themselves a 'real' data scientist vs. getting labeled a 'fake' data scientist or 'poser' or whatever. But to me, it's all about leveraging data to solve problems, and most of that work is about cleaning and prepping data. It's process. In an older KDNuggets article, economist/data scientist <a href="http://www.kdnuggets.com/2012/08/exclusive-scott-nicholson-interview-economics-weather-linkedin-healthcare.html">Scott Nicholson makes a similar point:</a><br /><br /><i>GP: What advice you have for aspiring data scientists?</i><br /><i><br /></i><i>SN: Focus less on algorithms and fancy technology & more on identifying questions, and extracting/cleaning/verifying data. People often ask me how to get started, and I usually recommend that they start with a question and follow through with the end-to-end process before they think about implementing state-of-the-art technology or algorithms. 
Grab some data, clean it, visualize it, and run a regression or some k-means before you do anything else. That basic set of skills surprisingly is something that a lot of people are just not good at but it is crucial.</i><br /><i><br /></i><i>GP: Your opinion on the hype around Big Data - how much is real?</i><br /><i><br /></i><i>SN: Overhyped. Big data is more of a sudden realization of all of the things that we can do with the data than it is about the data themselves. Of course also it is true that there is just more data accessible for analysis and that then starts a powerful and virtuous spiral. For most companies more data is a curse as they can barely figure out what to do with what they had in 2005.</i><br /><i><br /></i>So getting your foot in the door in a data science field doesn't mean mastering Hive or Hadoop apparently. And, this does not sound like PhD level rocket science at this point either. Karolis Urbonas, Head of Business Intelligence at Amazon has recently written a couple of similarly themed pieces also at KDNuggets:<br /><br /><a href="http://www.kdnuggets.com/2017/03/think-like-data-scientist-become-one.html">How to think like a data scientist to become one</a><br /><br /><i>"I still think there’s too much chaos around the craft and much less clarity, especially for people thinking of switching careers. Don’t get me wrong – there are a lot of very complex branches of data science – like AI, robotics, computer vision, voice recognition etc. – which require very deep technical and mathematical expertise, and potentially a PhD… or two. But if you are interested in getting into a data science role that was called a business / data analyst just a few years ago – here are the four rules that have helped me get into and are still helping me survive in the data science."</i><br /><br />He emphasizes basic data analysis, statistics, and coding to get started. The emphasis again is not on specific tools, degrees etc. 
but more on the process and ability to use data to solve problems. Note in the comments there is some push back on the level of expertise required, but Karolis actually addressed that when he mentioned very narrow and specific roles in AI, robotics, etc. Here he's giving advice for getting started in the broad diversity of roles in data science outside these narrow tracks. The issue is some people in data science want to narrow the scope to the exclusion of much of the work done by business analysts, researchers, engineers and consultants creating much of the value in this space (<a href="http://econometricsense.blogspot.com/2017/04/what-do-you-really-need-to-know-to-be.html">again see my previous post</a>).<br /><br /><a href="http://www.kdnuggets.com/2017/03/what-makes-great-data-scientist.html">What makes a great data scientist?</a><br /><br /><i>"A data scientist is an umbrella term that describes people whose main responsibility is leveraging data to help other people (or machines) making more informed decisions….Over the years that I have worked with data and analytics I have found that this has almost nothing to do with technical skills. Yes, you read it right. Technical knowledge is a must-have if you want to get hired but that’s just the basic absolutely minimal requirement. The features that make one a great data scientist are mostly non-technical."</i><br /><b><i><br /></i></b><b><i>1. Great data scientist is obsessed with solving problems, not new tools.</i></b><br /><b><i><br /></i></b><i>"This one is so fundamental, it is hard to believe it’s so simple. Every occupation has this curse – people tend to focus on tools, processes or – more generally – emphasize the form over the content. A very good example is the on-going discussion whether R or Python is better for data science and which one will win the beauty contest. Or another one – frequentist vs. Bayesian statistics and why one will become obsolete. 
Or my favorite – SQL is dead, all data will be stored on NoSQL databases."</i><br /><br />Posted by Matt Bogard<br /><br /><b>What do you really need to know to be a data scientist? Data Science Lovers and Haters</b> (posted 2017-04-08, updated 2017-04-09)<br /><br /><a href="http://econometricsense.blogspot.com/2017/04/super-data-science-podcast-credit.html">Previously I discussed the Super Data Science podcast and credit modeling </a>in terms of the modeling strategy and models used. The discussion also covered data science in general, and one part of the conversation I thought was well worth discussing in more detail. It really gets to the question of what it takes to be a data scientist. There is a ton of energy spent on this in places like LinkedIn and other forums. I think the answer comes in two forms. From the 'lovers' of data science, it's all about "what kind of advice can I give people to help and encourage them to create value in this space?" To the 'haters' it's more like "now that I have established myself in this space, what kind of criteria should we have to keep people out and prevent them from creating value?" But before we get to that, here is some great dialogue from Kirill discussing a trap that data scientists or aspiring data scientists fall into:<br /><br />Kirill: <i>"I think there’s a level of acumen that people should have, especially going into data science role. And then if you’re a manager you might take a step back from that. You might not need that much detail…If you’re doing the algorithms, that acumen might be enough. You don’t need to know the nitty-gritty mathematical academic formulas to everything about support vector machines or Kernels and stuff like that to apply it properly and get results. 
On the other hand, if you find that you do need that stuff you can go and spend some additional time learning. A lot of people fall into the trap. They try to learn everything in a lot of depth, whereas I think the space of data science is so broad you can’t just learn everything in huge depths. It’s better to learn everything to an acceptable level of acumen and then deepen your knowledge in the spaces that you need."</i><br /><br />Greg: <i>"if you don’t want to get into that detail, I totally get it. You can be totally fine without it. I have never once in my career had somebody ask me what are the formulas behind the algorithm….there’s a lot of jobs out there for people that don’t know them."</i><br /><br />I admit I used to fall into this trap. In fact this blog is a direct result. Early in my career I had the mindset if you can't prove it you can't use it. I really didn't feel confident about an algorithm or method until I understood it 'on paper' and could at least code my own version in SAS IML or R. A number of posts here were based on this work and mindset. Then, a very well known and accomplished developer/computational scientist that frequently helped me gave the good advice that with this mindset you might never get any work done. Or only a fraction of work.<br /><br />Given the amount of discussion you might see on LinkedIn or the so called data science community about real or fake data scientists (lots of haters out there) in the <a href="https://talkpython.fm/episodes/show/56/data-science-from-scratch">Talk Python to Me podcast</a> author Joel Grus (of <a href="http://shop.oreilly.com/product/0636920033400.do">Data Science from Scratch</a>) provides what I think is the most honest discussion of what data science is and what data scientists do:<br /><br /><i>"there are just as many jobs called data science as there are data scientists"</i><br /><br />That is kind of paraphrasing and kind of out of context and yes very general. 
Almost defining a word using the word in the definition. But it is very, very TRUE. That is because the field is largely undefined. To attempt to define it is futile and I think would be the antithesis of data science itself. I will warn though that there are plenty of data science haters out there who would quibble with what Greg and Joel have said above.<br /><br />These are people who want to impose something more strict. Some minimum threshold. Common threads indicate some fear of a poser or fake data scientist fooling some company into hiring them or incompetently pointing and clicking their way through an analysis without knowing what is going on and calling themselves a data scientist. While I understand that concern, it's one extreme. It can easily morph into a straw man argument for a more political agenda at the other extreme. That might lead to a listing of minimal requirements to be a <i>real </i>data scientist, some laundry list of requirements (think big data technologies, degrees and the like). Economists know all about this and we see it in the form of licensing and rent-seeking in a number of professions and industries. Broadly speaking, it's a waste of resources. Absolutely, in this broad space economists would also recognize merit in signaling through certification, certain degree programs or course work, or other methods of credentialing. But there is a big difference between competitive signaling and non-competitive rent-seeking behaviors.<br /><br />At its inception, data science was all about disruption. As described in the Johns Hopkins applied economics program description:<br /><br /><i>“Economic analysis is no longer relegated to academicians and a small number of PhD-trained specialists. 
Instead, economics has become an increasingly ubiquitous as well as rapidly changing line of inquiry that requires people who are skilled in analyzing and interpreting economic data, and then using it to effect decisions ………Advances in computing and the greater availability of timely data through the Internet have created an arena which demands skilled statistical analysis, guided by economic reasoning and modeling.”</i><br /><br />This parallels data science. Suddenly you no longer need a PhD in statistics or a software engineering background or an academic's level of acumen to create value-added analysis (although those are all excellent backgrounds for doing some advanced work in data science, no doubt). It's that <a href="http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram">basic combination of subject matter expertise, some knowledge of statistics and machine learning, and ability to write code or use software to solve problems.</a> That's it. It's disruptive and the haters hate it. They simultaneously embrace the disruption and want to rein it in and fence out the competition. I hate it for the haters, but you don't need to be able to code your own estimators or train a neural net from scratch to use it. 
And there is probably as much or more value-creating professional space out there for someone who can clean a data set and provide a set of cross tabs as there is for someone who knows how to set up a Hadoop cluster.<br /><br />Below are a couple of really great KDNuggets articles in this regard written by Karolis Urbonas, Head of Business Intelligence at Amazon:<br /><br /><a href="http://www.kdnuggets.com/2017/03/think-like-data-scientist-become-one.html">How to think like a data scientist to become one</a><br /><br /><a href="http://www.kdnuggets.com/2017/03/what-makes-great-data-scientist.html">What makes a great data scientist?</a><br /><br />Posted by Matt Bogard<br /><br /><b>Super Data Science Podcast Credit Scoring Models</b> (posted 2017-04-08)<br /><br />I recently discovered the Super Data Science podcast hosted by Kirill Eremenko. What I like about this podcast series is that it is applied data science. You can talk all day about theory, theorems, proofs, and mathematical details and assumptions. Even if you could master every technical detail underlying 'data science' you have only scratched the surface. What distinguishes data science from the academic disciplines of statistics, computer science, or machine learning is application to solve a problem for business or society. It's not theory for theory's sake. There are huge gaps between theory and application that can easily stump a team of PhDs or experienced practitioners (see also <a href="http://econometricsense.blogspot.com/2014/11/applied-econometrics.html">applied econometrics</a>). 
Podcasts like this can help bridge the gap.<br /><br /><a href="https://www.superdatascience.com/sds-014-credit-scoring-models-the-law-of-large-numbers-and-model-building-with-greg-poppe/">Episode 014</a> featured Greg Poppe who is Sr Vice President for risk management at an auto lending firm. They discussed how data science is leveraged in loan approvals and rate setting among other things.<br /><br />The general modeling approach that Greg discussed is very similar to work that I have done before in student risk modeling in higher education (see <a href="http://econometricsense.blogspot.com/2013/04/using-advanced-analytics-to-recruit.html">here</a> and <a href="http://econometricsense.blogspot.com/2013/12/sas-global-forum-papers.html">here</a>).<br /><br /><i>"So think of it like -- you know, I would have a hard time telling you with any high degree of certainty, “This loan will pay. This loan will pay. But this loan won’t.” However, if you give me a portfolio of a hundred loans, I should be able to say “15 aren’t going to pay. I don’t know which 15, but 15 won’t.” And then if you give me another portfolio that’s say riskier, I should be able to measure that risk and say “This is a riskier pool. 25 aren’t going to pay. And again, I don’t know which 25, but I’m estimating 25.” And that’s how we measure our accuracy. So it’s not so much on a loan-by-loan basis. It’s “If we just select a random sample, how many did not pay, and what was our expectation of that?” And if they’re very close, we consider our models to be accurate."</i><br /><br />A toy example in R that seems very similar can be found here (<a href="http://econometricsense.blogspot.com/2011/03/predictive-modeling-and-custom.html">Predictive Modeling and Custom Reporting in R</a>).<br /><br />So at a basic level they are just using predictive models to get a score and using cutoffs to determine different pools of risk and making approvals, declines, and setting interest rates based on this. 
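<br /><br />The portfolio-level accuracy check Greg describes can be sketched in a few lines of Python. This is an illustration only, not the lender's actual method: the pools, the predicted probabilities of non-payment (0.15 and 0.25, chosen to match the "15 of 100" and "25 of 100" figures in the quote), and the simulated outcomes are all hypothetical.

```python
import random


def simulate_pool(default_probabilities, seed):
    """Simulate repayment outcomes: 1 = did not pay, 0 = paid (hypothetical data)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for p in default_probabilities]


def calibration_report(default_probabilities, outcomes):
    """Compare the expected number of non-paying loans with the observed number."""
    expected = sum(default_probabilities)  # sum of the model's predicted probabilities
    observed = sum(outcomes)
    return {"loans": len(outcomes), "expected": expected, "observed": observed}


# Two hypothetical pools of 100 loans, each loan scored with a probability of not paying.
standard_pool = [0.15] * 100
riskier_pool = [0.25] * 100

for name, pool in [("standard", standard_pool), ("riskier", riskier_pool)]:
    outcomes = simulate_pool(pool, seed=42)
    print(name, calibration_report(pool, outcomes))
```

The model cannot say which individual loans will fail, but if the expected and observed counts are close pool by pool, the scores are doing their job.<br /><br />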
He doesn't discuss the specifics of the model testing, but to me the key here sounds a lot like calibration (see <a href="http://econometricsense.blogspot.com/2013/04/is-roc-curve-good-metric-for-model.html">Is the ROC curve a good metric for model calibration?</a>). In terms of the types of models they use for this, it gets very interesting. As Kirill says, the whole podcast is worth listening to for this very point. For their credit scoring models they use regression, even though they could get improved performance from other algorithms like decision trees or ensembles. Why?<br /><br /><i>"so primarily in the credit decisioning models, we use regression models. And the reason why -- well, there’s quite a few. One is it’s very computationally easy. It’s easy to explain, it’s easy for people to understand but it’s also not a black box in the sense that a lot of models can be, and what we need to do is we need to provide a continuity to a dealership because they can adjust the parameters of the application and that will adjust the risk accordingly…..If we were to go with a CART model or any other decision tree model, if the first break point or the first cut point in that model is down payment and they go from one side to the other, it can throw it down a completely separate set of decision logic and they can get very strange approvals. From a data science perspective and from an analytics perspective, that may be more accurate but it’s not sellable, it’s not marketable to the dealership."</i><br /><br />Yes, a huge gap just filled, and well worth repeating. It's interesting that in a different scenario you could go the other way around. For instance, in my work in higher education student risk modeling we went with decision trees instead of regression, but based on a similar line of reasoning. Our end users, however, were not going to be tweaking parameters, but getting sign-off and buy-in required that they understand more about what the model was doing. 
The explicit nature of the splits and decision logic of the trees was easier for people without statistical training to explain and understand than regression models or neural networks.<br /><br />If you have been a practitioner for a while you might think of course every data scientist knows there is a tradeoff between accuracy, complexity, and functional practicality. I agree, but it still can't be emphasized enough. And more time should be spent on applied examples like this vs the waste we see in social media discussions about who is or isn't a fake data scientist. The real data scientists are too busy working in the gaps between theory and practice to care. To be continued....<br /><br /><br /><br /><br /><br /><br /><br />Matt Bogardhttp://www.blogger.com/profile/10510725993509264716noreply@blogger.com0