Saturday, October 20, 2018

Power and Sample Size Analysis in Applied Econometrics

In applied econometrics work I've done a limited amount of power and sample size analysis. Having graduate training in both econometrics and biostatistics, I came to an interesting realization: the topic seemed scantly covered in my econometrics courses, and across numerous papers and applications over the years I seldom recall encountering much treatment of power and sample size calculations in the literature (OK, for the most part...).

Thinking about this recently, I recalled a conversation on the EconTalk podcast between Russ Roberts and John Ioannidis where this came up:

“though I was trained as a Ph.D., got a Ph.D. in economics at the U. of Chicago, I never heard that phrase, 'power,' applied to a statistical analysis. What we did--and I think what most economists, many economists, still do, is: we had a data set; we had something we wanted to discover and test or examine or explore, depending on the nature of the problem.”

That sounds familiar to me. In eight years of attending talks and seminars in applied economics, what stands out are discussions of identification, endogeneity, standard errors, etc., not power or sample size. So I went back and looked at all of my copies of econometrics textbooks. These are well known and commonly used by master's and PhD students in economics: Econometric Analysis by Greene, Econometric Analysis of Cross Section and Panel Data by Wooldridge, A Course in Econometrics by Goldberger, A Guide to Econometrics by Kennedy, and Using Econometrics by Studenmund. I even threw in Mastering 'Metrics and Mostly Harmless Econometrics by Angrist and Pischke.

While Wooldridge does discuss clustering and stratified sampling, most of the emphasis is on getting the correct standard errors and appropriate weighting. Based on my years of reading the remaining texts, plus a cursory review of their indexes and chapters, I could not find any treatment of power or sample size calculations.

So I thought maybe this is something covered in prerequisite courses. Going back to the undergraduate level in economics, I recall very little about it. Checking a popular text, Statistics for Business and Economics by Anderson, Sweeney, Williams, Camm, and Cochran, I did find a basic example of power and sample size for a t-test. What about a graduate-level prerequisite for econometrics? In my first year of graduate school I took a graduate course in mathematical statistics (doing business under a research methods title) that used DeGroot's text Probability and Statistics. It covered a lot about the concept of power in theory, but placed no emphasis on sample size calculations. The one textbook I own that treats this is Principles and Procedures of Statistics, A Biometrical Approach by Steel, Torrie, and Dickey. But that does not count, because it was the text for my Biometry and Statistics course in graduate school, not part of a standard econometrics curriculum.

I've come to the conclusion that power and sample size analysis may not be widely emphasized in graduate econometrics training across all programs. It's not just something I missed in a lecture a decade ago. As with advanced specialized topics like spatial econometrics, the details of power and sample size analysis, survey design, stratified random sampling, etc. are likely covered depending on one's specialty within the field and the program.

However, some economists must do this kind of work.

For instance, here is an example from a paper with food economist Jayson Lusk:

"However, there are many economic problems where sample size directly affects a benefit or loss function. In these cases, sample size is an endogenous variable that should be considered jointly with other choice variables in an optimization problem. In this article we introduce an economic approach to sample size determination utilizing a Bayesian decision theoretic framework."

Healthcare economist Austin Frakt is another example.

So why do we care about sample size? In the end it boils down to statistical power. In the simplest terms, statistical power is the probability of detecting an effect in a statistical analysis when that effect actually exists; formally, it is the probability of rejecting the null hypothesis when it is false, or one minus the Type II error rate. This may be equivalent to finding some statistical relationship between two variables of interest, perhaps a treatment and an outcome. More specifically, it may be thought of as the probability of estimating a statistically significant regression coefficient relating the treatment to the outcome.
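To make that definition concrete, here is a minimal Monte Carlo sketch in Python (my own illustration, not taken from any of the texts above). It assumes a simple two-arm experiment with equal group sizes, normally distributed outcomes, and a hypothetical true treatment effect, and it estimates power as the share of simulated samples in which the treatment coefficient is significant at the 5% level. With a single binary treatment dummy, the OLS slope equals the difference in group means, so the sketch computes that directly. The function name simulate_power and its parameters are hypothetical, and it uses a large-sample normal critical value rather than the exact t distribution.

```python
import random
import statistics
from statistics import NormalDist


def simulate_power(n_per_group, effect, sigma=1.0, alpha=0.05, reps=2000, seed=1):
    """Estimate power as the fraction of simulated experiments in which the
    treatment coefficient is statistically significant.

    Assumptions (illustrative only): two arms of equal size, normal outcomes
    with common standard deviation `sigma`, and a true effect of `effect`.
    """
    rng = random.Random(seed)
    # Large-sample (normal) two-sided critical value, e.g. about 1.96 for alpha = 0.05
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, sigma) for _ in range(n_per_group)]
        # With one treatment dummy, the OLS coefficient is the difference in means
        beta_hat = statistics.fmean(treated) - statistics.fmean(control)
        # Standard error of the difference in means, using each group's sample variance
        se = (statistics.variance(treated) / n_per_group
              + statistics.variance(control) / n_per_group) ** 0.5
        if abs(beta_hat / se) > z_crit:
            rejections += 1
    return rejections / reps
```

Calling simulate_power(64, 0.5) should return roughly 0.8, while simulate_power(64, 0.0) should hover near the 5% false-rejection rate, since with no true effect every "detection" is a Type I error.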

Jim Manzi, author of Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society, offers the following analogy in an EconTalk episode:

“Well, the power in a statistical experiment, and I often use this analogy, is sort of like the magnification power on the microscope you probably used in high school biology. It has on the side, 4x, 8x, 16x, which is how many times it can increase the apparent size of a physical object. And the metaphor I'd use is, if I try and use a child's microscope to carefully observe a section of a leaf looking for an insect that's a little smaller than an ant, and I don't observe the ant, I can reliably say: I don't see the insect, and therefore there is no bug there. If I use that exact same microscope to try and find on that exact same piece of leaf, not a bug but a tiny microbe that's, you know, smaller than a speck of dust, I'll look at it and I'll say: it's all kind of fuzzy, I see a lot of squiggly things; I think that little squiggle might be something or it might not. I don't see the microbe, but I can't reliably say that therefore there is no microbe there, because trying to zoom in closer and closer to look for something that small, all I see is a bunch of fuzz. So my failure to see the microbe is a statement about the precision of my instrument, not about whether there's really a microbe on the leaf.”

So, if we have a sample that is ‘not sufficiently powered,’ we could fail to find a relationship between treatment and outcome even if one actually exists. Equivalently, our estimated regression coefficient may not be statistically significant when a relationship actually does exist. Power also depends on the size of the true effect, the variability of the outcome, and the significance level, but increasing sample size is the primary way to increase power in an experiment. So the question becomes: how large does ‘n’ have to be for a sample to be sufficiently powered to detect the effect of a treatment on an outcome?
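The relationship between n and power can also be seen without simulation. Below is a sketch of the standard normal-approximation power calculation for a two-sided test comparing two group means, assuming equal group sizes and a known common standard deviation. The helper name approx_power is hypothetical, and the effect of half a standard deviation is purely illustrative.

```python
from statistics import NormalDist


def approx_power(n_per_group, effect, sigma=1.0, alpha=0.05):
    """Normal-approximation power of a two-sided test that two group means differ.

    Assumes equal group sizes, a known common standard deviation `sigma`,
    and a true difference in means equal to `effect`.
    """
    z = NormalDist()
    se = sigma * (2.0 / n_per_group) ** 0.5  # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / se  # how far the true effect sits from zero, in SE units
    # Probability the test statistic falls in either tail of the rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Power for a half-standard-deviation effect at several per-group sample sizes
for n in (10, 25, 50, 100, 200):
    print(n, round(approx_power(n, effect=0.5), 3))
```

Running the loop shows power climbing toward 1 as the per-group sample size grows; under these assumptions, about 64 observations per group gives roughly 80% power. With an effect of zero the function returns exactly the significance level, which is a handy sanity check.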

So how do you do these calculations? If you can't find examples in your econometrics textbook (if you do find one, let me know!), there are plenty of texts in the biostatistics genre that likely cover this. Principles and Procedures of Statistics, A Biometrical Approach by Steel, Torrie, and Dickey is one I started with. Cochran, W. (1977). Sampling Techniques, 3rd ed. is another often-cited source.
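As a starting point before opening those books, one common calculation is the normal-approximation sample size for detecting a difference δ between two group means with common standard deviation σ: n per group = 2σ²(z_(1-α/2) + z_(1-β))² / δ², where α is the significance level and 1 - β is the desired power. The sketch below simply codes that formula (n_per_group is a hypothetical name). It assumes equal allocation, a two-sided test, and normally distributed outcomes, and it ignores complications such as clustering, covariates, or attrition.

```python
import math
from statistics import NormalDist


def n_per_group(effect, sigma=1.0, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided, two-sample comparison of means.

    Normal-approximation formula:
        n = 2 * sigma**2 * (z_{1-alpha/2} + z_{power})**2 / effect**2,
    rounded up to the next whole observation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    n = 2 * sigma**2 * (z_alpha + z_beta) ** 2 / effect**2
    return math.ceil(n)
```

For example, n_per_group(0.5) returns 63, i.e. 63 observations per arm to detect a half-standard-deviation effect with 80% power at the 5% level. Halving the effect to 0.25 roughly quadruples the requirement, which illustrates why small effects demand large samples. Exact t-based calculations typically come out slightly larger for small samples.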