Saturday, March 7, 2015

Pinning P-values to the Wall

Recent headlines tell of one journal that is 'banning' the use of p-values. In part, this is justified on the grounds that p-values are widely misused or misunderstood in empirical applications:

"The p value is the probability to obtain an effect equal to or more extreme than the one observed presuming the null hypothesis of no effect is true."

"In other words, it is the probability of the data given the null hypothesis. However, it is often misunderstood to be the probability of the hypothesis given the data...."

This leads many people to reason along the lines of:

 "if you reach the magical p-value then your hypothesis is true."
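The distinction in the quotes above can be made concrete with a toy simulation (my own sketch, not from the editorial): when the null hypothesis is true, the p-value is uniformly distributed, so p < 0.05 occurs about 5% of the time. The p-value describes the data given the null, not the probability that the null is true.

```python
from math import erf, sqrt

import numpy as np

# Simulate repeated one-sample z-tests when the null (mean = 0) is TRUE.
# Under H0, p-values are uniform, so about 5% fall below 0.05.
rng = np.random.default_rng(42)

n, trials = 30, 10_000
false_positives = 0
for _ in range(trials):
    x = rng.normal(loc=0.0, scale=1.0, size=n)  # null is true: mean = 0
    z = x.mean() / (1.0 / sqrt(n))              # z statistic, known sigma = 1
    # two-sided p-value from the normal CDF (erf-based, no scipy needed)
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    false_positives += p < 0.05

print(false_positives / trials)  # close to 0.05
```

Reaching p < 0.05 here says nothing about the hypothesis being "true" — the null is true in every single trial.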

Similarly, the journal recognizes the misinterpretation of confidence intervals (no surprise, since a 95% confidence interval can be used to equivalently test any hypothesis at the 5% level of significance).

"In the NHSTP, the problem is in traversing the distance from the probability of the finding, given the null hypothesis, to the probability of the null hypothesis, given the finding. Regarding confidence intervals, the problem is that, for example, a 95% confidence interval does not indicate that the parameter of interest has a 95% probability of being within the interval. Rather, it means merely that if an infinite number of samples were taken and confidence intervals computed, 95% of the confidence intervals would capture the population parameter. Analogous to how the NHSTP fails to provide the probability of the null hypothesis, which is needed to provide a strong case for rejecting it, confidence intervals do not provide a strong case for concluding that the population parameter of interest is likely to be within the stated interval."
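The repeated-sampling interpretation described in the quote can also be checked by simulation (again my own illustration): construct a 95% interval over and over from fresh samples, and roughly 95% of the intervals capture the fixed population parameter — which is not the same as any single interval containing it with 95% probability.

```python
import numpy as np

# Repeated-sampling coverage of a 95% confidence interval for a mean,
# with sigma assumed known so the interval is mean +/- 1.96 * sigma/sqrt(n).
rng = np.random.default_rng(0)

true_mu, sigma, n, reps = 5.0, 2.0, 50, 10_000
covered = 0
for _ in range(reps):
    x = rng.normal(true_mu, sigma, n)
    half = 1.96 * sigma / np.sqrt(n)
    covered += (x.mean() - half) <= true_mu <= (x.mean() + half)

print(covered / reps)  # close to 0.95
```

The parameter `true_mu` never moves; it is the intervals that vary from sample to sample, and about 95% of them happen to capture it.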

But despite these interpretation issues, in many instances the same conclusions are reached whether one takes a Bayesian or a frequentist approach (barring those situations where Bayesian methods offer a genuine advantage). Perhaps the bigger issue is p-hacking.
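Why p-hacking matters can be shown with another toy simulation (my own sketch): if a researcher tests many independent outcomes under the null and reports only the smallest p-value, the chance of at least one "significant" result is far above 5% — with 20 outcomes, about 1 - 0.95^20 ≈ 0.64.

```python
from math import erf, sqrt

import numpy as np

rng = np.random.default_rng(1)

def p_value(x, n):
    """Two-sided p-value for a one-sample z-test with known sigma = 1."""
    z = x.mean() / (1.0 / sqrt(n))
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Test 20 independent null outcomes per "study" and keep the best p-value.
n, outcomes, trials = 30, 20, 2_000
at_least_one = 0
for _ in range(trials):
    ps = [p_value(rng.normal(0.0, 1.0, n), n) for _ in range(outcomes)]
    at_least_one += min(ps) < 0.05

print(at_least_one / trials)  # roughly 1 - 0.95**20, i.e. about 0.64
```

Nothing real is being detected in any of these trials; the "findings" are pure multiplicity.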

On the subject of p-hacking, I recently ran across an article (likely one familiar to many readers) that discusses p-hacking and research methodologies that are, so to speak, 'out of control':

Deming, data and observational studies: A process out of control and needing fixing
2011 Royal Statistical Society (link)

One phrase in this article resonates with me:

"without access to data the research is largely “trust me” science"

In this article, the authors lay out several best practices to bring scholarly research back 'in control.'

0 Data are made publicly available
1 Data cleaning and analysis separate
2 Split sample: A, modelling; and B, holdout (testing)
3 Analysis plan is written, based on modelling data only
4 Written protocol, based on viewing predictor variables of A
5 Analysis of A only data set
6 Journal accepts paper based on A only 
7 Analysis of B adds additional support or validation of A
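The split-sample steps (2, 5, and 7) can be sketched in a few lines. This is a hypothetical illustration under my own simulated data, not code from the article: fit a model on the modelling sample A only, then evaluate it once on the holdout sample B.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated data (assumption for illustration): y depends on x1 only,
# while x2 is pure noise.
N = 400
X = rng.normal(size=(N, 2))
y = 2.0 * X[:, 0] + rng.normal(size=N)

# Step 2: split the sample into A (modelling) and B (holdout).
idx = rng.permutation(N)
A, B = idx[: N // 2], idx[N // 2:]

# Step 5: analysis on the A data set only -- here, an OLS fit.
beta_A, *_ = np.linalg.lstsq(X[A], y[A], rcond=None)

# Step 7: analysis of B adds validation -- out-of-sample R^2 using
# coefficients estimated on A alone.
resid_B = y[B] - X[B] @ beta_A
r2_B = 1 - (resid_B ** 2).mean() / ((y[B] - y[B].mean()) ** 2).mean()
print(round(r2_B, 2))  # high only if the A-chosen model generalizes
```

The point of the protocol is that B is touched exactly once, after the analysis plan is fixed, so a good holdout fit cannot be the product of specification searching.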

In a previous post, I discussed how the Quarterly Journal of Political Science now requires each article submission to include a research package consisting of data and analysis code.

I really like some of these ideas and agree that without the data and code, replication is often a joke, and the result really is, in so many ways, "trust me" science. This is why it is hard to get excited about any headline touting a finding from a so-called study without careful scrutiny.

Given the credibility revolution in applied micro-econometrics, the guidelines outlined in the Royal Statistical Society article, and requirements for research packages with data and code (like the QJPS's), I would think there is room for good work to be done, short of totally banning p-values.
