Confidence intervals seem to be the fad among some in pop stats/data science/analytics. Whenever there is mention of
p-hacking, or the ills of publication standards, or the pitfalls of
null hypothesis significance testing, CIs almost always seem to be
the popular solution.
There are some attractive features of
CIs. This paper provides some alternative views of CIs, discusses
some strengths and weaknesses, and ultimately proposes that they are
on balance superior to p-values and hypothesis testing. CIs can bring
more information to the table in terms of effect sizes for a given
sample; however, some of the statements made in
this article need to be read with caution. I wonder how much of the fascination with CIs is the result of confusing
a Bayesian interpretation with a frequentist application, or of just
sloppy misinterpretation. I completely disagree with the article's claim that they are more
straightforward for students than hypothesis
tests and p-values.
Dave Giles gives a very good review,
starting with the very basics of what is a parameter vs. an estimator
vs. an estimate, sampling distributions, etc. After reviewing the
concepts key to understanding CIs, he points out two very common
interpretations of CIs that are clearly wrong:
1) There's a 95% probability that the true
value of the regression coefficient lies in the interval [a,b].
2) This interval includes the true value
of the regression coefficient 95% of the time.
As Giles puts it:
"we really should talk about the
(random) intervals "covering" the (fixed) value of the
parameter. If, as some people do, we talk about the parameter
"falling in the interval", it sounds as if it's the
parameter that's random and the interval that's fixed. Not so!"
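To make the "covering" language concrete, here is a minimal simulation sketch. It is my own illustration, not something taken from Giles's post, and it assumes a normal population with a known standard deviation so that a simple z-interval applies. The names z_interval and coverage, and all of the parameter values, are made up for the example. The parameter stays fixed while the interval endpoints move from sample to sample, and roughly 95% of those random intervals cover it.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, level=0.95):
    """Interval for a mean when sigma is known (a simplifying assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = fmean(sample)
    return centre - half_width, centre + half_width


def coverage(true_mean=2.0, sigma=1.0, n=30, repetitions=10_000, seed=0):
    """Share of repeated-sample intervals that cover the fixed true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # A fresh sample gives a fresh (random) interval each time.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        # The parameter never changes; only the interval does.
        hits += low <= true_mean <= high
    return hits / repetitions


rate = coverage()
print(f"share of intervals covering the true mean: {rate:.3f}")
```

The 95% describes the long-run behavior of the procedure. Any single computed interval either covers the true value or it does not, which is why neither statement above holds for one specific [a,b].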
In Robust misinterpretation of confidence intervals, the authors take on the idea that confidence intervals offer a panacea for interpretation issues related to null hypothesis significance testing (NHST):
"Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual...Our findings suggest that many researchers do not know the correct interpretation of a CI....As is the case with p-values, CIs do not allow one to make probability statements about parameters or hypotheses."
The authors document this misunderstanding by presenting subjects with a number of false statements about confidence intervals (including the two above pointed out by Dave Giles) and recording how often the subjects endorsed them as true.
In Mastering 'Metrics, Angrist and Pischke give a great interpretation of confidence intervals that, in my opinion, doesn't lend itself as easily to abusive probability interpretations:
"By describing a set of parameter
values consistent with our data, confidence intervals provide a
compact summary of the information these data contain about the
population from which they were sampled"
Both hypothesis testing and confidence
intervals are statements about the compatibility of our observable
sample data with population characteristics of interest. The ASA released a set of clarifying statements on p-values. Number 2
states that "P-values do not measure the probability that the
studied hypothesis is true." Nor does a confidence interval (again see Ranstam, 2012).
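One way to see that tests and intervals are two views of the same compatibility statement: a 95% interval can be built by collecting every hypothesized value that a two-sided test at the 5% level fails to reject. The sketch below is only an illustration under simplifying assumptions (made-up data, a known standard deviation, a z-test). The names p_value, sample, sigma, and candidates are hypothetical and do not come from any of the sources cited here.

```python
from statistics import NormalDist, fmean


def p_value(sample, mu0, sigma):
    """Two-sided z-test p-value for H0: mean == mu0, with sigma assumed known."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


sample = [2.1, 1.7, 2.9, 2.4, 1.9, 2.6, 2.2, 1.8]  # made-up data
sigma = 0.5  # assumed known, purely for illustration

# The usual 95% z-interval.
z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit * sigma / len(sample) ** 0.5
ci = (fmean(sample) - half_width, fmean(sample) + half_width)

# Every candidate mean on a fine grid that the test does NOT reject at 5%.
candidates = [i / 1000 for i in range(1000, 3501)]
not_rejected = [mu0 for mu0 in candidates if p_value(sample, mu0, sigma) >= 0.05]

print(f"interval:            ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"non-rejected values: ({min(not_rejected):.3f}, {max(not_rejected):.3f})")
```

Up to the grid spacing, the two ranges agree. Neither one assigns a probability to any particular value of the parameter.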
Venturing into the risky practice of making imperfect analogies, take this loosely from the perspective of criminal investigations. We might think of confidence intervals as narrowing the range of suspects based on observed evidence, without providing specific probabilities related to the guilt or innocence of any particular suspect. Better evidence narrows the list, just as better evidence in our sample data (less noise) will narrow the confidence interval.
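In the simple known-variance case, the width of an interval for a mean is 2 * z * sigma / sqrt(n), so the "better evidence narrows the list" part of the analogy can be checked with a few lines of arithmetic. This is a rough sketch with hypothetical values of sigma and n, not a general result for every kind of interval.

```python
from statistics import NormalDist


def interval_width(sigma, n, level=0.95):
    """Full width of a known-sigma z-interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sigma / n ** 0.5


# Hypothetical scenarios: noisy data, less noisy data, then more data.
for sigma, n in [(2.0, 25), (1.0, 25), (1.0, 100)]:
    print(f"sigma={sigma}, n={n}: width={interval_width(sigma, n):.3f}")
```

Halving the noise halves the width, while it takes four times the sample size to get the same effect.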
I see no harm in CIs, and more good if
they draw more attention to the practical/clinical significance of effect
sizes. But I think the temptation to misrepresent CIs can be just as strong as the temptation to speak
boldly of 'significant' findings following an exercise in p-hacking
or in the face of meaningless effect sizes. Maybe some sins are
greater than others and proponents feel more comfortable with
misinterpretations/overinterpretations of CIs than they do with
misinterpretations/overinterpretations of p-values.
Or as Briggs concludes about this
issue:
"Since no frequentist can
interpret a confidence interval in any but in a logical probability
or Bayesian way, it would be best to admit it and abandon
frequentism"
See also:
Andrew Gelman: The Fallacy of Placing Confidence in Confidence Intervals.
Noah Smith: The Backlash to the Backlash Against P-values
References:
Eduard Brandstätter, Confidence Intervals as an Alternative to
Significance Testing, Methods of Psychological Research Online,
1999, Vol. 4, No. 2, Johannes Kepler Universität Linz
J. Ranstam, Why the P-value culture is
bad and confidence intervals a better alternative, Osteoarthritis and
Cartilage, Volume 20, Issue 8, 2012, Pages 805-808, ISSN 1063-4584,
http://dx.doi.org/10.1016/j.joca.2012.04.001 (http://www.sciencedirect.com/science/article/pii/S1063458412007789)
Rink Hoekstra, Richard D. Morey, Jeffrey N. Rouder, &
Eric-Jan Wagenmakers, Robust misinterpretation of confidence intervals,
Psychon Bull Rev, 2014, DOI 10.3758/s13423-013-0572-3