Confidence intervals (CIs) seem to be the fad among some in pop stats/data science/analytics. Whenever there is mention of
p-hacking, or the ills of publication standards, or the pitfalls of
null hypothesis significance testing, CIs almost always seem to be
the proposed solution.

There are some attractive features of
CIs. This paper provides some alternative views of CIs, discusses
some strengths and weaknesses, and ultimately proposes that they are
on balance superior to p-values and hypothesis testing. CIs can bring
more information to the table in terms of effect sizes for a given
sample; however, some of the statements made in
this article need to be read with caution. I wonder how much of the fascination with CIs is the result of confusing
a Bayesian interpretation with a frequentist application, or of just
sloppy misinterpretation. I completely disagree that they are more
straightforward for students than hypothesis
tests and p-values, as the article claims.

Dave Giles gives a very good review
starting with the very basics of what is a parameter vs. an estimator
vs. an estimate, sampling distributions etc. After reviewing the
concepts key to understanding CIs he points out two very common
interpretations of CIs that are clearly wrong:

*1) There's a 95% probability that the true value of the regression coefficient lies in the interval [a,b].*

*2) This interval includes the true value of the regression coefficient 95% of the time.*

*"we really should talk about the (random) intervals "covering" the (fixed) value of the parameter. If, as some people do, we talk about the parameter "falling in the interval", it sounds as if it's the parameter that's random and the interval that's fixed. Not so!"*
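To make the "covering" language concrete, here is a minimal simulation sketch. It is my own illustration, not something from Dave's post: it assumes normally distributed data with a known standard deviation (so a simple z-interval applies), and the function name and parameter values are hypothetical. The true mean stays fixed while the interval moves from sample to sample; the 95% describes how often the procedure produces an interval that covers the fixed value, not the probability that any one computed interval contains it.

```python
import random
from statistics import NormalDist

def coverage_rate(true_mean=10.0, sigma=2.0, n=30, reps=10_000, level=0.95, seed=1):
    """Fraction of simulated z-intervals that cover the fixed true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # roughly 1.96 for a 95% level
    half_width = z * sigma / n ** 0.5          # known-sigma interval (an assumption)
    covered = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # The interval is random (it moves with xbar); true_mean never changes.
        if xbar - half_width <= true_mean <= xbar + half_width:
            covered += 1
    return covered / reps

rate = coverage_rate()
print(f"share of intervals covering the true mean: {rate:.3f}")
```

Each individual interval in the loop either covers the true mean or it does not; only the long-run share across repeated samples should sit near the nominal level.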

In *Robust misinterpretation of confidence intervals*, the authors take on the idea that confidence intervals offer a panacea for interpretation issues related to null hypothesis significance testing (NHST):

*"Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual...Our findings suggest that many researchers do not know the correct interpretation of a CI....As is the case with p-values, CIs do not allow one to make probability statements about parameters or hypotheses."*

The authors present evidence of this misunderstanding by presenting subjects with a number of false statements regarding confidence intervals (including the two above pointed out by Dave Giles) and noting how frequently subjects incorrectly affirmed them as true.

In *Osteoarthritis and Cartilage*, the authors write:

*"In spite of frequent discussions of misuse and misunderstanding of probability values (P-values) they still appear in most scientific publications, and the disadvantages of erroneous and simplistic P-value interpretations grow with the number of scientific publications."*

They raise a number of issues related
to both p-values and confidence intervals (multiplicity of testing,
the focus on effect sizes, etc.) and they point out some informative
differences between using p-values vs. using standard errors to
produce 'error bars.' However, in trying to clarify the advantages of
confidence intervals they step really close to what might be considered an
erroneous and simplistic interpretation:

*"the great advantage with confidence intervals is that they do show what effects are likely to exist in the population. Values excluded from the confidence interval are thus not likely to exist in the population. "*

Maybe I am being picky, but if we are
going to be picky about interpreting p-values then the same goes for
CIs. It sounds a lot like they are talking about 'a parameter
falling into an interval' or the 'probability of a parameter falling
into an interval' as Dave cautions against. They seem careful enough
in their language using the term 'likely' vs. making strong
probability statements, so maybe they are making a more heuristic
interpretation that while useful may not be the most correct.

In *Mastering 'Metrics*, Angrist and Pischke give a great interpretation of confidence intervals that, in my opinion, doesn't lend itself as easily to abusive probability interpretations:

*"By describing a set of parameter values consistent with our data, confidence intervals provide a compact summary of the information these data contain about the population from which they were sampled"*

I think the authors in *Osteoarthritis and Cartilage* could have stated their case better if they had said:

*"The great advantage of confidence intervals is that they describe what effects in the population are consistent with our sample data. Our sample data is not consistent with population effects excluded from the confidence interval."*
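One way to see what "consistent with our sample data" means is the link between intervals and tests. The sketch below is my own illustration under simplifying assumptions (a z-test with known standard deviation, and made-up sample numbers): it tests a grid of candidate population means and checks that the candidates not rejected at the 5% level are the same ones that fall inside the 95% interval.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, level=0.95):
    """Known-sigma confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half

def p_value(xbar, mu0, sigma, n):
    """Two-sided z-test p-value for H0: population mean equals mu0."""
    z = abs(xbar - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(z))

# Hypothetical sample summary, chosen only for illustration.
xbar, sigma, n = 5.2, 2.0, 40
lo, hi = z_interval(xbar, sigma, n)

candidates = [4.0 + 0.1 * k for k in range(25)]  # candidate means from 4.0 to 6.4
not_rejected = [m for m in candidates if p_value(xbar, m, sigma, n) >= 0.05]
inside = [m for m in candidates if lo <= m <= hi]

print(f"95% interval: [{lo:.3f}, {hi:.3f}]")
print("not rejected and inside the interval agree:", not_rejected == inside)
```

Nothing here assigns a probability to any candidate value. The interval simply collects the population values that this sample would not reject.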

Both hypothesis testing and confidence
intervals are statements about the compatibility of our observable
sample data with population characteristics of interest. The ASA released a set of clarifications on p-values. Number 2
states that *"P-values do not measure the probability that the studied hypothesis is true."* Nor does a confidence interval (again see Ranstam, 2012).

Venturing into the risky practice of making imperfect analogies, take this loosely from the perspective of criminal investigations. We might think of confidence intervals as narrowing the range of suspects based on observed evidence, without providing specific probabilities related to the guilt or innocence of any particular suspect. Better evidence narrows the list, just as better evidence in our sample data (less noise) will narrow the confidence interval.
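The "less noise narrows the interval" part of the analogy can be checked with a little arithmetic. This is a rough sketch assuming the same known-sigma z-interval as a stand-in for whatever interval is actually in use; the sample size and noise levels are hypothetical.

```python
from statistics import NormalDist

def interval_width(sigma, n, level=0.95):
    """Full width of a known-sigma z-interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sigma / n ** 0.5

n = 50  # hypothetical sample size, held fixed
widths = {sigma: interval_width(sigma, n) for sigma in (4.0, 2.0, 1.0)}
for sigma, width in widths.items():
    print(f"noise (sigma) = {sigma}: interval width = {width:.3f}")
```

Halving the noise halves the width in this setup, so the list of "suspects" shrinks, but no remaining value gets a probability attached to it.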

I see no harm in CIs, and more good if
they draw more attention to the practical/clinical significance of effect
sizes. But I think the temptation to incorrectly represent CIs can be just as strong as the temptation to speak
boldly of 'significant' findings following an exercise in p-hacking
or in the face of meaningless effect sizes. Maybe some sins are
greater than others and proponents feel more comfortable with
misinterpretations/overinterpretations of CIs than they do with
misinterpretations/overinterpretations of p-values.

Or as Briggs concludes about this
issue:

*"Since no frequentist can interpret a confidence interval in any but in a logical probability or Bayesian way, it would be best to admit it and abandon frequentism"*

**See also:**

Andrew Gelman: The Fallacy of Placing Confidence in Confidence Intervals.

Noah Smith: The Backlash to the Backlash Against P-values

**References:**

Eduard Brandstätter, Confidence Intervals as an Alternative to Significance Testing, Methods of Psychological Research Online, 1999, Vol. 4, No. 2, Johannes Kepler Universität Linz. © 1999 Pabst Science Publishers

J. Ranstam, Why the P-value culture is bad and confidence intervals a better alternative, Osteoarthritis and Cartilage, Volume 20, Issue 8, 2012, Pages 805-808, ISSN 1063-4584, http://dx.doi.org/10.1016/j.joca.2012.04.001 (http://www.sciencedirect.com/science/article/pii/S1063458412007789)

Rink Hoekstra, Richard D. Morey, Jeffrey N. Rouder, and Eric-Jan Wagenmakers, Robust misinterpretation of confidence intervals, Psychon Bull Rev, 2014, DOI 10.3758/s13423-013-0572-3
