Tuesday, January 13, 2015

Overconfident Confidence Intervals

In an interesting post, "Playing Dumb on Statistical Significance," there is a discussion of Naomi Oreskes's January 4 NYT piece, "Playing Dumb on Climate Change." The dialogue centers on her reference to confidence intervals and on a possibly overbearing burden of proof that researchers apply when it comes to statistical significance.

From the article:

"Although the confidence interval is related to the pre-specified Type I error rate, alpha, and so a conventional alpha of 5% does lead to a coefficient of confidence of 95%, Oreskes has misstated the confidence interval to be a burden of proof consisting of a 95% posterior probability. The “relationship” is either true or not; the p-value or confidence interval provides a probability for the sample statistic, or one more extreme, on the assumption that the null hypothesis is correct. The 95% probability of confidence intervals derives from the long-term frequency that 95% of all confidence intervals, based upon samples of the same size, will contain the true parameter of interest."

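The long-run frequency reading in the last sentence of that quote can be checked with a small simulation. This is only a sketch under assumptions of my own: normally distributed data, a known standard deviation, and made-up values for the true mean, the sample size, and the number of repetitions. None of these numbers come from the article.

```python
import random
from statistics import NormalDist


def coverage(true_mean=2.65, sigma=1.0, n=30, trials=10_000, level=0.95, seed=0):
    """Fraction of simulated confidence intervals that contain true_mean.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # sigma is treated as known, so the half-width is the same in every trial.
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = sum(sample) / n
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(coverage())
```

The printed share should land close to 0.95. The 95% describes the procedure over many repeated samples; any single interval either contains the true value or it does not.
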
As mentioned in the piece, Oreskes's writing does have a Bayesian ring to it, and this whole story and critique makes me think of Kennedy's chapter on "The Bayesian Approach" in his book "A Guide to Econometrics". I believe that people often interpret frequentist confidence intervals from a Bayesian perspective. If I understand any of this at all, and I admit my knowledge of Bayesian econometrics is limited, then I think I have been a guilty offender at times as well. The chapter even states that Bayesians tout their methods because Bayesian thinking is how people actually think, which is why they so often misinterpret frequentist confidence intervals.

In Bayesian analysis, a posterior probability distribution is produced, and from it a posterior probability interval that 'chops off' 2.5% from each tail, leaving an area or probability of 95%. From a Bayesian perspective, it is correct for the researcher to claim or believe that there is a 95% probability that the true value of the parameter they are estimating falls within the interval. This is how many people interpret confidence intervals, which are quite different from Bayesian posterior probability intervals. An illustration is given from Kennedy:

"How do you think about an unknown parameter? When you are told that the interval between 2.6 and 2.7 is a 95% confidence interval, how do you think about this? Do you think, I am willing to bet $95 to your $5 that the true value of 'beta' lies in this interval [note this sounds a lot like Oreskes as if you read the article above]? Or do you think, if I were to estimate this interval over and over again using data with different error terms, then 95% of the time this interval will cover the true value of 'beta'"?

"Are you a Bayesian or a frequentist?"

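For contrast, here is a rough sketch of the Bayesian interval described earlier, the one that chops 2.5% off each tail of the posterior. It assumes the simplest textbook case, a normal mean with known standard deviation and a normal prior. The prior, the sample mean of 2.65, the standard deviation of 0.14, and the sample size of 30 are all invented so that the result sits near Kennedy's 2.6 to 2.7 example; they are not taken from his book.

```python
from statistics import NormalDist


def posterior_interval(sample_mean, sigma, n, prior_mean=0.0, prior_sd=10.0, level=0.95):
    """Equal-tailed posterior interval for a normal mean with known sigma.

    Uses the standard normal-normal conjugate update. The prior defaults
    are hypothetical and deliberately vague.
    """
    prior_precision = 1 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * sample_mean)
    posterior = NormalDist(post_mean, post_var ** 0.5)
    # Chop off the same probability from each tail of the posterior.
    tail = (1 - level) / 2
    return posterior.inv_cdf(tail), posterior.inv_cdf(1 - tail)


if __name__ == "__main__":
    low, high = posterior_interval(sample_mean=2.65, sigma=0.14, n=30)
    print(low, high)
```

With a prior this vague, the interval comes out very close to what the frequentist formula would give for the same data. The numbers nearly coincide, but only the Bayesian version licenses the $95 to $5 bet about this particular interval.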