Saturday, October 20, 2018

Power and Sample Size Analysis in Applied Econometrics

In applied work in econometrics I've done a limited amount of power and sample size analysis. Recently I was thinking about a conversation from an episode of the EconTalk podcast with Russ Roberts and John Ioannidis where the topic of power came up:

“though I was trained as a Ph.D., got a Ph.D. in economics at the U. of Chicago, I never heard that phrase, 'power,' applied to a statistical analysis. What we did--and I think what most economists, many economists, still do, is: we had a data set; we had something we wanted to discover and test or examine or explore, depending on the nature of the problem.”

That rings familiar to me. In eight years of attending talks and seminars in applied economics, what stands out are discussions of identification, endogeneity, standard errors etc. Not power or sample size. So I went back and looked at all of my copies of econometrics textbooks. These are well known and have been commonly used by masters and PhD graduate students in economics. Econometric Analysis by Greene, Econometric Analysis of Cross Section and Panel Data by Wooldridge,  A Course in Econometrics by Goldberger, A Guide to Econometrics by Kennedy, Using Econometrics by Studenmund. I even threw in Mastering 'Metrics and Mostly Harmless Econometrics by Angrist and Pischke.

While Wooldridge did discuss clustering and stratified sampling, most of the emphasis was placed on getting the correct standard errors and appropriate weighting. From my previous years of referencing these texts, as well as a cursory review again of the index and chapters of each one I could not find any treatment of power or sample size calculations.

So I thought, maybe this is something covered in prerequisite courses. Going back to the undergraduate level in economics I recall very little about this. Checking a popular text, Statistics for Business and Economics by Anderson, Sweeney, Williams, Camm, and Cochran I did find a basic example in relation to power and sample sizes for a t-test.  What about a graduate level pre-requisite for econometrics? In my first year of graduate school I took a graduate level course in mathematical statistics (this was a course doing business under a research methods title) that used Degroot's text Probability and Statistics. Definitely a lot about the concept of power in theory, but no emphasis on various calculations for sample size. The one textbook I own with treatment of this is Principles and Procedures of Statistics, A Biometrical Approach by Steel, Torrie, and Dickey. But that does not count because that was the text used in my experimental design course in graduate school. Not part of a standard econometrics curriculum.

I've come to the conclusion that power and sample size analysis may not be widely emphasized in graduate econometrics training across the board in all programs. It's not something missed in a lecture a decade ago. Similar to advanced specialized topics like spatial econometrics, details related to power and sample size analysis, survey design, stratified random sampling etc. are likely covered depending on one's specialty in the field and the program.

However,  it is evident that some economists do this kind of work.

For instance, here is an example from a paper with food economist Jayson Lusk:

"However, there are many economic problems where sample size directly affects a benefit or loss function. In these cases, sample size is an endogenous variable that should be considered jointly with other choice variables in an optimization problem. In this article we introduce an economic approach to sample size determination utilizing a Bayesian decision theoretic framework."

As well as healthcare economist Austin Frakt. 

So why do we care about power and sample size and what is 'power'?

Jim Manzi, Author of Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society offers the following analogy in an Econ Talk podcast:

“Well, the power in a statistical experiment, and I often use this analogy, is sort of like the magnification power on the microscope you probably used in high school biology. It has on the side, 4x, 8x, 16x, which is how many times it can increase the apparent size of a physical object. And the metaphor I'd use is, if I try and use a child's microscope to carefully observe a section of a leaf looking for an insect that's a little smaller than an ant, and I don't observe the ant, I can reliably say: I don't see the insect, and therefore there is no bug there. If I use that exact same microscope to try and find on that exact same piece of leaf, not a bug but a tiny microbe that's, you know, smaller than a speck of dust, I'll look at it and I'll say: it's all kind of fuzzy, I see a lot of squiggly things; I think that little squiggle might be something or it might not. I don't see the microbe, but I can't reliably say that therefore there is no microbe there, because trying to zoom in closer and closer to look for something that small, all I see is a bunch of fuzz. So my failure to see the microbe is a statement about the precision of my instrument, not about whether there's really a microbe on the leaf.”

So, if we have a sample that is ‘not sufficiently powered’ it is possible that we could fail to find a relationship between treatment and outcome, even if one actually exists. Equivalently, our estimated regression coefficient may not be statistically significant when a relationship actually does exist. Increasing sample size is one primary way to increase power in an experiment. So the question becomes how large does ‘n’ have to be to have a sample sufficiently powered to detect the effect of a treatment on an outcome (at some stated level of significance)?

So how do you do these calculations? If you can't find examples in your econometrics textbook (if you do find one let me know!) there are plenty of texts in the biostatistics genre that probably cover this. Principles and Procedures of Statistics, A Biometrical Approach by Steel, Torrie, and Dickey is one example that I started with. Cochran, W (1977). Sampling. Techniques, 3rd ed. is another often cited source.

See also: Andrew Gelman on Econtalk discussing "what does not kill my statistical significance makes it stronger"