## Saturday, March 7, 2015

### Pinning P-values to the Wall

Recent headlines report that one journal is 'banning' the use of p-values. The ban is justified in part by how often p-values are misused or misunderstood in empirical applications:

"The p value is the probability to obtain an effect equal to or more extreme than the one observed presuming the null hypothesis of no effect is true."

"In other words, it is the probability of the data given the null hypothesis. However, it is often misunderstood to be the probability of the hypothesis given the data...."

This leads many people to reason:

"if you reach the magical p-value then your hypothesis is true."
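To make the gap between those two probabilities concrete, here is a minimal simulation sketch. Everything numeric in it is my own assumption for illustration (80% of tested hypotheses are truly null, a true effect of 0.3 standard deviations, samples of 50, a simple z-test); none of it comes from the journal's editorial. Under these made-up inputs, the chance of a significant result given the null stays close to 0.05, while the chance that the null is true given a significant result comes out at roughly a quarter.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_studies=20000, prior_null=0.8, effect=0.3, n=50,
             alpha=0.05, seed=1):
    """Return (P(significant | H0 true), P(H0 true | significant)).

    All defaults are illustrative assumptions, not estimates from any
    real literature.
    """
    rng = random.Random(seed)
    n_null = sig_null = sig_alt = 0
    for _ in range(n_studies):
        is_null = rng.random() < prior_null
        mu = 0.0 if is_null else effect
        # With unit variance, the z-statistic for a sample mean of n draws
        # is distributed N(mu * sqrt(n), 1).
        z = rng.gauss(mu * math.sqrt(n), 1.0)
        significant = two_sided_p(z) < alpha
        if is_null:
            n_null += 1
            sig_null += significant
        else:
            sig_alt += significant
    p_sig_given_null = sig_null / n_null
    p_null_given_sig = sig_null / (sig_null + sig_alt)
    return p_sig_given_null, p_null_given_sig


p_sig_given_null, p_null_given_sig = simulate()
print("P(p < 0.05 | H0 true):", round(p_sig_given_null, 3))
print("P(H0 true | p < 0.05):", round(p_null_given_sig, 3))
```

The second number moves a lot if you change `prior_null` or `effect`, which is the point: the p-value alone does not pin it down.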

Similarly, the journal recognizes that confidence intervals are misunderstood (no surprise, since a 95% confidence interval can equivalently be used to test any point hypothesis about the parameter at the 5% level of significance).

"In the NHSTP, the problem is in traversing the distance from the probability of the finding, given the null hypothesis, to the probability of the null hypothesis, given the finding. Regarding confidence intervals, the problem is that, for example, a 95% confidence interval does not indicate that the parameter of interest has a 95% probability of being within the interval. Rather, it means merely that if an infinite number of samples were taken and confidence intervals computed, 95% of the confidence intervals would capture the population parameter. Analogous to how the NHSTP fails to provide the probability of the null hypothesis, which is needed to provide a strong case for rejecting it, confidence intervals do not provide a strong case for concluding that the population parameter of interest is likely to be within the stated interval."
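The repeated-sampling reading of a confidence interval in that passage is easy to check by simulation. The sketch below is only an illustration under assumptions I picked myself: normal data, a known standard deviation (so a plain z-interval applies), and arbitrary values for the true mean, sample size, and number of repetitions.

```python
import math
import random
from statistics import NormalDist


def coverage(true_mean=10.0, sigma=2.0, n=30, n_samples=10000,
             level=0.95, seed=7):
    """Fraction of z-intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    # Critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_samples):
        xbar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # Each sample yields a different interval; the parameter is fixed.
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / n_samples


print("Share of intervals capturing the true mean:", coverage())
```

The share should land near 0.95. Note what is random here: the interval changes from sample to sample while the parameter stays put, so the 95% describes the procedure rather than any single interval.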

But despite these interpretation issues, in many instances a Bayesian and a frequentist approach may well reach the same conclusions (barring those situations where Bayesian methods offer an advantage). The bigger issue, perhaps, is p-hacking.
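As a rough illustration of why p-hacking matters, consider one simple form of it: testing many outcomes and reporting only the best p-value. The sketch below assumes every outcome is pure noise and that the outcomes are independent (both are my simplifying assumptions), so any 'significant' finding is a false positive by construction.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_outcomes, n_studies=20000, alpha=0.05, seed=3):
    """Share of all-null studies whose smallest p-value falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under H0 each outcome's z-statistic is a standard normal draw.
        best_p = min(two_sided_p(rng.gauss(0.0, 1.0))
                     for _ in range(n_outcomes))
        if best_p < alpha:
            hits += 1
    return hits / n_studies


for k in (1, 5, 10, 20):
    # With independent tests, theory gives 1 - (1 - alpha)^k.
    print(k, false_positive_rate(k), round(1 - 0.95 ** k, 3))
```

With a single pre-specified outcome the false positive rate sits near the nominal 5%. Picking the best of ten independent outcomes pushes it to about 40%, with no true effect anywhere in the data.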

On that topic, I recently ran across an article (likely very familiar to a lot of readers) that discusses p-hacking and, more generally, research methodologies that are 'out of control', so to speak:

Deming, data and observational studies: A process out of control and needing fixing

In this article, the authors lay out several best practices to bring scholarly research back 'in control.'

0 Data are made publicly available
1 Data cleaning and analysis separate
2 Split sample: A, modelling; and B, holdout (testing)
3 Analysis plan is written, based on modelling data only
4 Written protocol, based on viewing predictor variables of A
5 Analysis of A only data set
6 Journal accepts paper based on A only