Friday, January 30, 2015

In God We Trust, All Others Show Me Your Code

The Political Methodologist recently ran a really interesting article titled:

A Decade of Replications: Lessons from the Quarterly Journal of Political Science

The QJPS holds submissions to a high standard of research documentation:

"Since its inception in 2005, the Quarterly Journal of Political Science (QJPS) has sought to encourage this type of transparency by requiring all submissions to be accompanied by a replication package, consisting of data and code for generating paper results. These packages are then made available with the paper on the QJPS website. In addition, all replication packages are subject to internal review by the QJPS prior to publication. This internal review includes ensuring the code executes smoothly, results from the paper can be easily located, and results generated by the replication package match those in the paper."

"Although the QJPS does not necessarily require the submitted code to access the data if the data are publicly available (e.g., data from the National Election Studies, or some other data repository), it does require that the dataset containing all of the original variables used in the analysis be included in the replication package. For the sake of transparency, the variables should be in their original, untransformed and unrecoded form, with code included that performs the transformations and recodings in the reported analyses. This allows replicators to assess the impact of transformations and recodings on the results."
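
As a rough sketch of what that second requirement could look like in practice (my own illustration, not anything taken from a QJPS replication package; the column names, values, and recoding rules are all made up), the raw variables are read in untouched and every transformation lives in code that a replicator can read and rerun:

```python
import csv
import io
import statistics

# Hypothetical raw survey extract; columns and values are invented for illustration.
RAW_CSV = """respondent_id,age,party_id
1,23,1
2,47,7
3,65,4
4,31,2
5,58,6
"""


def load_raw(text):
    """Read the raw data exactly as delivered, with no recoding."""
    return list(csv.DictReader(io.StringIO(text)))


def recode(rows):
    """Return copies of the rows with derived variables added.

    The original columns are left as they were, so a replicator can
    change a rule here and see how the results move.
    """
    recoded = []
    for row in rows:
        new_row = dict(row)
        age = int(row["age"])
        party = int(row["party_id"])
        new_row["age_group"] = "under_40" if age < 40 else "40_plus"
        # Collapse an assumed 7-point scale: 1-3 left, 4 center, 5-7 right.
        if party <= 3:
            new_row["party_3"] = "left"
        elif party == 4:
            new_row["party_3"] = "center"
        else:
            new_row["party_3"] = "right"
        recoded.append(new_row)
    return recoded


raw = load_raw(RAW_CSV)
analysis = recode(raw)
mean_age = statistics.mean(int(r["age"]) for r in analysis)
print(f"Mean age: {mean_age}")
```

Because the cut points sit in `recode` rather than being baked into the data file, anyone can move the age threshold or collapse the party scale differently and check whether the reported result survives.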

From an efficiency standpoint, I don't know if this standard should be applied universally or not. We wouldn't want to bottleneck the body of peer reviewed literature contributing to society's pool of knowledge, but at the same time, some sort of filtration system might keep the murkiness out of the water so we can see more clearly the 'real' effects of policies and treatments.

I certainly know from personal (professional and academic) experience collaborating with others that a lot of time and resources have been lost trying to reinvent the wheel or reconstruct how some data set was created, because nobody documented how the data was pulled or cleaned. More documentation and code sharing always seems better. Maybe everyone needs a GitHub account.
