Friday, December 21, 2018

Thinking About Confidence Intervals: Horseshoes and Hand Grenades

In a previous post, Confidence Intervals: Fad or Fashion, I wrote about Dave Giles' post on interpreting confidence intervals. A primary focus of both discussions was how confidence intervals are often misinterpreted. For instance, the two statements below are common mischaracterizations of CIs:

1) There's a 95% probability that the true value of the regression coefficient lies in the interval [a,b].
2) This interval includes the true value of the regression coefficient 95% of the time.

You can read the previous post or Dave's post for more details. But in re-reading Dave's post recently, one statement had me thinking:

"So, the first interpretation I gave for the confidence interval in the opening paragraph above is clearly wrong. The correct probability there is not 95% - it's either zero or 100%! The second interpretation is also wrong. "This interval" doesn't include the true value 95% of the time. Instead, 95% of such intervals will cover the true value."

I like the way he put that: '95% of such intervals,' distinguishing the procedure from a particular observed/calculated confidence interval. I think someone trained to think about CIs in the incorrect probabilistic way may have trouble getting at this. So how might we think about CIs in a way that is still useful, but doesn't get us tripped up with incorrect probability statements?

My favorite statistics text is DeGroot's Probability and Statistics. In the 4th edition, the authors are very careful about explaining confidence intervals:

"Once we compute the observed values of a and b, the observed interval (a,b) is not so easy to interpret....Before observing the data we can be 95% confident that the random interval (A,B) will contain mu, but after observing the data, the safest interpretation is that (a,b) is simply the observed value of the random interval (A,B)"

While DeGroot is careful, this still may not be very intuitive. However, in Principles and Procedures of Statistics: A Biometrical Approach (Steel, Torrie, and Dickey), the authors present a more intuitive explanation:

"since mu will either be or not be in the interval, that is P=0 or 1, the probability will actually be a measure of confidence we placed in the procedure that led to the statement. This is like throwing a ring at a fixed post; the ring doesn't land in the same position or even catch on the post every time. However we are able to say that we can circle the post 9 times out of 10, or whatever the value should be for the measure of our confidence in our proficiency."

The ring-tossing analogy seems to work pretty well. I'll customize it by using horseshoes instead. Yes, 95 out of 100 times you might throw a ringer (in the game of horseshoes, that is when the horseshoe circles the peg or stake when you toss it). You know this before you toss. And to use Dave Giles' language, *before* calculating a confidence interval we know that 95% of such intervals will cover the population parameter of interest. And after we toss the shoe, it either circles the peg or it doesn't; that is a 1 or a 0 in terms of probability. Similarly, *after* computing a confidence interval, the true mean or population parameter of interest is either covered or not, with a probability of 0 or 100%.

This isn't perfect, but thinking of confidence intervals this way at least keeps us honest about making probability statements.

Going back to my previous post, I still like the description of confidence intervals Angrist and Pischke provide in Mastering 'Metrics, that is 'describing a set of parameter values consistent with our data.'

For instance if we run the regression:

y = b0 + b1X + e  to estimate y = B0 + B1X + e

and get our parameter estimate b1 with a 95% confidence interval like (1.2, 1.8), we can say that our sample data are consistent with any population whose B1 takes a value that falls in the interval. That implies there are a number of populations our data would be consistent with. Narrower intervals imply very similar populations, very similar values of B1, and speak to more precision in our estimate of B1.
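As a sketch of that interpretation, we can simulate data from a population with an assumed slope (B1 = 1.5 here, a value I chose for illustration), fit the regression by ordinary least squares, and form a 95% confidence interval for the slope. Any population whose slope falls inside the resulting interval is one our sample data would be consistent with.

```python
import math
import random

# Illustrative sketch: simulate y = B0 + B1*x + e from an assumed population
# (B0 = 1, B1 = 1.5, normal errors), then fit by OLS and build a 95% CI
# for the slope b1.
random.seed(7)
B0, B1, N = 1.0, 1.5, 30
x = [random.uniform(0, 10) for _ in range(N)]
y = [B0 + B1 * xi + random.gauss(0, 1) for xi in x]

# OLS slope and intercept for simple regression
xbar, ybar = sum(x) / N, sum(y) / N
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# residual variance and standard error of the slope
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(sse / (N - 2) / sxx)

T = 2.048  # t critical value for 95%, df = 28
lo, hi = b1 - T * se_b1, b1 + T * se_b1
print(f"b1 = {b1:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Running this once gives one observed interval, which either covers the assumed B1 or it doesn't; the "set of parameter values consistent with our data" is everything between its endpoints.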

I really can't make an analogy for hand grenades. It just gave me a title with a ring to it.

See also:
Interpreting Confidence Intervals
Bayesian Statistics Confidence Intervals and Regularization
Overconfident Confidence Intervals