Monday, August 7, 2017

Confidence Intervals: Fad or Fashion

Confidence intervals seem to be the fad among some in pop stats/data science/analytics. Whenever there is mention of p-hacking, or the ills of publication standards, or the pitfalls of null hypothesis significance testing, CIs almost always seem to be the popular solution.

There are some attractive features of CIs. This paper provides some alternative views of CIs, discusses some strengths and weaknesses, and ultimately proposes that they are, on balance, superior to p-values and hypothesis testing. CIs can bring more information to the table in terms of effect sizes for a given sample; however, some of the statements made in the article need to be read with caution. I just wonder how much of the fascination with CIs is the result of confusing a Bayesian interpretation with a frequentist application, or just sloppy misinterpretation. I completely disagree that they are more straightforward for students to interpret than hypothesis tests and p-values, as the article claims.

Dave Giles gives a very good review, starting with the very basics of what a parameter is vs. an estimator vs. an estimate, sampling distributions, etc. After reviewing the concepts key to understanding CIs, he points out two very common interpretations of CIs that are clearly wrong:

1) There's a 95% probability that the true value of the regression coefficient lies in the interval [a,b].
2) This interval includes the true value of the regression coefficient 95% of the time.

"we really should talk about the (random) intervals "covering" the (fixed) value of the parameter. If, as some people do, we talk about the parameter "falling in the interval", it sounds as if it's the parameter that's random and the interval that's fixed. Not so!"

In Robust misinterpretation of confidence intervals, the authors take on the idea that confidence intervals offer a panacea for interpretation issues related to null hypothesis significance testing (NHST):

"Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual...Our findings suggest that many researchers do not know the correct interpretation of a CI....As is the case with p-values, CIs do not allow one to make probability statements about parameters or hypotheses."

The authors document this misunderstanding by presenting subjects with a number of false statements about confidence intervals (including the two above, pointed out by Dave Giles) and noting how frequently subjects incorrectly affirmed them as true.

In Osteoarthritis and Cartilage, Ranstam (2012) writes:

"In spite of frequent discussions of misuse and misunderstanding of probability values (P-values) they still appear in most scientific publications, and the disadvantages of erroneous and simplistic P-value interpretations grow with the number of scientific publications."

They raise a number of issues related to both p-values and confidence intervals (multiplicity of testing, the focus on effect sizes, etc.) and point out some informative differences between using p-values vs. using standard errors to produce 'error bars.' However, in trying to clarify the advantages of confidence intervals, they step really close to what might be considered an erroneous and simplistic interpretation:

"the great advantage with confidence intervals is that they do show what effects are likely to exist in the population. Values excluded from the confidence interval are thus not likely to exist in the population. "

Maybe I am being picky, but if we are going to be picky about interpreting p-values, then the same goes for CIs. It sounds a lot like they are talking about 'a parameter falling into an interval' or the 'probability of a parameter falling into an interval,' as Dave cautions against. They seem careful enough in their language, using the term 'likely' vs. making strong probability statements, so maybe they are making a more heuristic interpretation that, while useful, may not be the most correct.

In Mastering 'Metrics, Angrist and Pischke give a great interpretation of confidence intervals that doesn't, in my opinion, lend itself as easily to abusive probability interpretations:

"By describing a set of parameter values consistent with our data, confidence intervals provide a compact summary of the information these data contain about the population from which they were sampled"

I think the Osteoarthritis and Cartilage paper could have stated its case better if it had said:

"The great advantage of confidence intervals is that they describe what effects in the population are consistent with our sample data. Our sample data is not consistent with population effects excluded from the confidence interval."

Both hypothesis testing and confidence intervals are statements about the compatibility of our observable sample data with population characteristics of interest. The ASA released a set of clarifying statements on p-values. Number 2 states that "P-values do not measure the probability that the studied hypothesis is true." Nor does a confidence interval (again see Ranstam, 2012).
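One way to make this "compatibility" reading concrete (my own illustration, using a one-sample t-test, not anything from the papers above) is the duality between tests and intervals: a hypothesized value lies outside the 95% CI exactly when the two-sided test of that value yields p < 0.05.

```python
# Duality sketch: the 95% CI is the set of hypothesized values
# that a two-sided t-test at the 5% level would NOT reject.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(5.0, 2.0, size=30)  # made-up data for illustration
n = sample.size
xbar = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
lo, hi = xbar - t_crit * se, xbar + t_crit * se

# Scan hypothesized means around the estimate and compare test vs. interval.
for m in np.linspace(xbar - 3 * se, xbar + 3 * se, 13):
    p = stats.ttest_1samp(sample, popmean=m).pvalue
    inside = lo <= m <= hi
    assert inside == (p >= 0.05)  # inside the CI <=> not rejected at 5%
```

Seen this way, the interval summarizes which population values our sample is compatible with, without assigning any of them a probability.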

Venturing into the risky practice of making imperfect analogies, take this loosely from the perspective of criminal investigations. We might think of confidence intervals as narrowing the range of suspects based on observed evidence, without providing specific probabilities related to the guilt or innocence of any particular suspect. Better evidence narrows the list, just as better evidence in our sample data (less noise) will narrow the confidence interval.
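The "better evidence narrows the list" part of the analogy can be sketched numerically (an illustrative toy with made-up parameters): holding everything else fixed, more data or less noise yields a narrower 95% interval.

```python
# Toy sketch: CI width shrinks with larger n (more evidence)
# and with smaller sigma (less noise). Parameters are invented.
import numpy as np
from scipy import stats

def ci_width(sigma, n, rng):
    """Width of a 95% t-interval for the mean of one simulated sample."""
    sample = rng.normal(0.0, sigma, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.975, df=n - 1)
    return 2 * t_crit * se

rng = np.random.default_rng(1)
w_baseline = ci_width(sigma=2.0, n=20, rng=rng)     # small, noisy sample
w_more_data = ci_width(sigma=2.0, n=2000, rng=rng)  # 100x the data
w_less_noise = ci_width(sigma=0.2, n=20, rng=rng)   # 1/10th the noise

print(w_baseline, w_more_data, w_less_noise)
```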

I see no harm in CIs, and more good if they draw more attention to the practical/clinical significance of effect sizes. But I think the temptation to incorrectly represent CIs can be just as strong as the temptation to speak boldly of 'significant' findings following an exercise in p-hacking or in the face of meaningless effect sizes. Maybe some sins are greater than others, and proponents feel more comfortable with misinterpretations/overinterpretations of CIs than they do with misinterpretations/overinterpretations of p-values.

Or as Briggs concludes about this issue:

"Since no frequentist can interpret a confidence interval in any but in a logical probability or Bayesian way, it would be best to admit it and abandon frequentism"

Brandstätter, E. (1999). Confidence intervals as an alternative to significance testing. Methods of Psychological Research Online, 4(2). PABST SCIENCE PUBLISHERS. Johannes Kepler Universität Linz.

Ranstam, J. (2012). Why the P-value culture is bad and confidence intervals a better alternative. Osteoarthritis and Cartilage, 20(8), 805-808. ISSN 1063-4584.

Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review. DOI 10.3758/s13423-013-0572-3.
