Tuesday, January 8, 2013

Decomposition: The Statistics Software Signal



"When you don't have to code your own estimators, you probably won't understand what you're doing. I'm not saying that you definitely won't, but push-button analyses make it easy to compute numbers that you are not equipped to interpret."

I agree that statistics is a language best communicated and understood through code rather than a point-and-click GUI.

What is particularly interesting, however, is Taylor's view of how the choice of a software package may relate to the quality of research:

"SPSS: You love using your mouse and discovering options using menus. You are nervous about writing code and probably manage your data in Microsoft Excel." (see the linked article for similar remarks)

To be fair, Stata, SPSS, SAS, and R all have coding environments, and as a user of both SAS and R products I don't see why using PROC REG in SAS is any less sophisticated than calling the 'lm' function in R. Nor do I see any difference between coding an estimator or algorithm in R and coding it in SAS IML.
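To make "coding an estimator" concrete, here is a minimal sketch of a hand-coded ordinary least squares fit for a single regressor. It is written in Python purely for illustration; the function name ols_simple and the data are made up for this example and are not taken from SAS, R, or Taylor's article. The same arithmetic could be written in R or SAS IML.

```python
from statistics import mean


def ols_simple(x, y):
    """Hand-coded OLS for the model y = a + b*x; returns (intercept, slope).

    Hypothetical helper for illustration only.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data lying exactly on the line y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = ols_simple(x, y)
print(intercept, slope)
```

A canned routine such as PROC REG or 'lm' performs this calculation (generalized to many regressors, with standard errors and diagnostics), so the language it is written in matters less than whether the analyst knows what is being computed.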

In fact, a discussion of SAS vs. R has been running on LinkedIn for over a year now, and in my opinion all it has established is that R certainly provides a powerful software solution for many researchers and businesses.

It would be interesting to quantify and test Taylor's theory.

UPDATE: see You say Stata I Say SAS: software signaling and social identity theory.


  1. I find Taylor's stereotypes to be insulting. I would wager that he has not met many sophisticated users of the software that he disparages, and I am surprised that you are propagating his views.

    I talk with many SAS users and R users. The main difference is that most SAS users have something that they want to accomplish as part of their job. They want software that is fast and reliable that will enable them to analyze their data with minimal pain so that they can go on to their next task. For them, the software is a means to an end. Most don't have the time to implement new statistical methods, even if they wanted to.

    In contrast, the R users that I know are more concerned about the methodology and computation. For them, the software analysis IS the end.

    There is bias in my sample: 80% of the SAS users I know are in companies, whereas most R users that I know are in universities.

    As for not having respect for researchers who use SPSS, you should look up the CV for Leland Wilkinson, Fellow of the ASA and of the AAAS, and originator of the "grammar of graphics" model upon which ggplot is based.

  2. Hi Rick!

    Maybe it's more from a teaching standpoint only, but I think there are definitely major advantages to at least learning and understanding the language of statistics via coding vs. pointing and clicking. (I just think that as a student I would have been much better off learning SAS vs. GRETL.) Plus, being comfortable with coding opens up more doors if you hit a wall with a given software solution. The difficult challenge is not letting the software get in the way of learning the theory.

    I'm warming to the use of GUI interfaces, particularly as a SAS Enterprise Miner customer, because it does exactly as you say: it is fast and reliable, and it allows me to complete a project that is both repeatable and conducive to later tweaks and modifications, without spending a whole day reviewing hundreds or thousands of lines of code to see what we did before. It's a prime example of how it's sometimes just more efficient to stand on the shoulders of giants (developers like you, for instance) than to do everything from scratch. I don't think the fact that I don't code my own backpropagation algorithm for neural nets implies that I don't understand them.

    However, as you know, I like to get behind the curtain of some of these algorithms when I get a chance via R or SAS IML.