From: Handbook of Biological Statistics http://www.biostathandbook.com/confidence.html
"There is a myth that when two means have confidence intervals that overlap, the means are not significantly different (at the P<0.05 level)… (Schenker and Gentleman 2001, Payton et al. 2003); it is easy for two sets of numbers to have overlapping confidence intervals, yet still be significantly different by a two-sample t–test; conversely… Don't try compare two means by visually comparing their confidence intervals, just use the correct statistical test."
A really cogent note related to this from the Cornell Statistical Consulting Unit can be found here.
"Generally, when comparing two parameter estimates, it is always true that if the confidence intervals do not overlap, then the statistics will be statistically significantly different. However, the converse is not true. That is, it is erroneous to determine the statistical significance of the difference between two statistics based on overlapping confidence intervals."
More details using basic math here.
A 2005 article in Psychological Methods indicates that a large number of researchers don't interpret confidence intervals correctly. In an interesting American Psychologist article (2005), researchers determined that under a number of broadly applicable conditions 95% confidence intervals can overlap by as much as 25% for groups that are actually significantly different at the 5% level, and that with zero overlap statistical significance is actually at the 1% level (p~.01).
Dave Giles also brings up a Journal of Insect Science paper by Payton et al. in a related discussion of using CIs to determine statistical significance:
http://davegiles.blogspot.com/2017/01/hypothesis-testing-using-non.html?_sm_au_=iVVFv75rHPD1TSHs
"Here's a well-known result that bears on this use of the confidence intervals. Recall that we're effectively testing H0: μ1 = μ2, against HA: μ1 ≠ μ2. If we construct the two 95% confidence intervals, and they fail to overlap, then this does not imply rejection of H0 at the 5% significance level. In fact the correct significance is roughly one-tenth of that. Yes, 0.5%!
If you want to learn why, there are plenty of references to help you. For instance, check out McGill et al. (1978), Andrews et al. (1980), Schenker and Gentleman (2001), Masson and Loftus (2003), and Payton et al. (2003) - to name a few. The last of these papers also demonstrates that a rough rule-of-thumb would be to use 84% confidence intervals if you want to achieve an effective 5% significance level when you "try" to test H0 by looking at the overlap/non-overlap of the intervals."
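The "roughly one-tenth" figure in the quote can be checked with a few lines of Python. This is only a sketch under assumptions that are not in the quoted text: both estimates are treated as approximately normal with known standard errors (a large-sample z-test rather than a t-test), and the function name and parameters are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def p_value_when_intervals_touch(conf_level=0.95, se_ratio=1.0):
    """Two-sided p-value for H0: mu1 == mu2 when the two CIs just touch.

    Hypothetical helper. Assumes normal sampling distributions with known
    standard errors; se_ratio is SE2 / SE1 (SE1 is scaled to 1).
    """
    std_normal = NormalDist()
    # Critical value used to build each individual interval (1.96 for 95%).
    z_crit = std_normal.inv_cdf(0.5 + conf_level / 2)
    # The intervals stop overlapping once |mean1 - mean2| reaches
    # z_crit * (SE1 + SE2).
    diff_at_touch = z_crit * (1.0 + se_ratio)
    # Standard error of the difference between two independent estimates.
    se_diff = sqrt(1.0 + se_ratio ** 2)
    z_stat = diff_at_touch / se_diff
    return 2.0 * (1.0 - std_normal.cdf(z_stat))


print(round(p_value_when_intervals_touch(), 4))
```

With equal standard errors the result lands between 0.005 and 0.006, which matches the 0.5% in the quote. It is a normal-approximation check only and does not reproduce the small-sample results in the papers cited.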
Actually, according to the last paper mentioned (Payton et al. 2003), the 84% confidence level is only a rough guide: the level needed is adjusted depending on the ratio of standard errors from the two populations you are comparing.
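To see how the level moves with the standard error ratio, here is a rough sketch using a large-sample normal approximation. It is not the calculation from Payton et al. (2003), and the function name and parameters are hypothetical. The idea: two intervals built with critical value z fail to overlap when the difference exceeds z(SE1 + SE2), while a test at level alpha rejects when the difference exceeds z_alpha * sqrt(SE1^2 + SE2^2), so setting the two thresholds equal gives the z to use for each interval.

```python
from math import sqrt
from statistics import NormalDist


def overlap_confidence_level(se_ratio=1.0, alpha=0.05):
    """Confidence level at which CI non-overlap matches a test at level alpha.

    Hypothetical helper. Assumes independent, approximately normal estimates
    with known standard errors; se_ratio is SE2 / SE1.
    """
    std_normal = NormalDist()
    # Critical value of the two-sided test on the difference (1.96 for 5%).
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2)
    # Solve z * (SE1 + SE2) == z_alpha * sqrt(SE1**2 + SE2**2) for z.
    z_interval = z_alpha * sqrt(1.0 + se_ratio ** 2) / (1.0 + se_ratio)
    # Convert the per-interval critical value back to a confidence level.
    return 2.0 * std_normal.cdf(z_interval) - 1.0


for ratio in (1.0, 2.0, 4.0):
    print(ratio, round(overlap_confidence_level(ratio), 3))
```

Under these assumptions, equal standard errors give a level just under 84%, and the level climbs toward 95% as the standard errors become more unequal. The result is the same whether the ratio is written as r or 1/r.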
But all of the work above is predicated on a comparison of two populations. Considerations of multiple comparisons complicate things further (see Rick Wicklin's post on doing this in SAS). Perhaps, if a visual presentation is what we want, we can plot the CIs (as much as we may not like dynamite plots) but denote which groups are significantly different based on the properly specified tests (per the note from the handbook above). Something like below:
http://freakonomics.com/2008/07/30/how-big-is-your-halo-a-guest-post/
References:
Belia, S., Fidler, F., Williams, J., and Cumming, G. 2005. Researchers misunderstand confidence intervals and standard error bars. Psychological Methods, 10(4), 389–396.
Cumming, G., and Finch, S. 2005. Inference by eye: confidence intervals and how to read pictures of data. American Psychologist, 60(2), 170–180.
Payton, M. E., Greenstone, M. H., and Schenker, N. 2003. Overlapping confidence intervals or standard error intervals: What do they mean in terms of statistical significance? Journal of Insect Science, 3, 1–6.