Thursday, January 10, 2013

You say Stata I say SAS: Software Signaling and Social Identity Theory

In my previous post I quoted the following: 

"When you don't have to code your own estimators, you probably won't understand what you're doing. I'm not saying that you definitely won't, but push-button analyses make it easy to compute numbers that you are not equipped to interpret." 

Then I said: 

I Agree. Statistics is a language best communicated and understood via code vs. a point and click GUI.

But I revised the post and restated:

I agree that statistics is a language best communicated and understood via code vs. a point and click GUI. 

I've been thinking a lot about this and considering some insight from Rick Wicklin in the comments on that post. What I should actually say is that *personally* I don't feel I understand an estimator well until I've coded it myself (as an algorithm, not just by submitting a command to a software package), or at least attempted a simplified implementation, or gotten some idea of how it could be coded if my coding skills were up to the challenge. By that measure I understand some estimators better than others, and trying to understand estimators better in this way is really the purpose of this blog.

But I've thought a little more about the first quote. *IF* I understand correctly how SAS, R, Stata, SPSS, etc. work, then whether you are pointing and clicking in a GUI or submitting canned routines at the command line, *BOTH* are wrappers around the heavy lifting: the actual statistical programming that developers have abstracted away behind the scenes. Both command-line and GUI environments make it pretty easy to get results you may or may not interpret correctly, or simply to apply the wrong test in the wrong situation.
But you can also really get into trouble coding your own estimator (maybe using SAS IML, R, or Octave). If you don't know what you are doing and you click through a regression in SPSS or submit a PROC in SAS, at least the estimates themselves will be correct. If you make a syntax error, the code likely just won't run, and the log may even help you out. Applying the correct statistical test and interpreting the results are still up to you. But you can make a mistake coding your own estimator and not even realize it. Efficiency, repeatability, resource requirements, etc. all factor in too.
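
To make that last risk concrete, here is a minimal sketch in plain Python (my choice for illustration, not code from any of the packages above). Both function names and the toy data are hypothetical. The second function forgets the intercept, runs without any error or warning, and quietly returns the wrong slope:

```python
def ols_with_intercept(x, y):
    """Simple linear regression y = b0 + b1*x via the textbook formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def ols_forgot_intercept(x, y):
    """Hand-coding mistake: no centering, so this is regression through the origin."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)


# Toy data (assumed for illustration) lying exactly on the line y = 10 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0 + 2.0 * xi for xi in x]

intercept, slope = ols_with_intercept(x, y)
wrong_slope = ols_forgot_intercept(x, y)

print("correct intercept and slope:", intercept, slope)
print("slope from the buggy version:", wrong_slope)
```

Nothing in the second function is a syntax error, so no log would flag it. The only way to catch it is to check the result against a case where the answer is already known, or against a canned routine.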

As far as 'signaling' goes, now that I've thought more about it, a screen full of code may certainly give the appearance of a more sophisticated analysis, and may even give one a false sense of confidence in the results; i.e., code is a mixed signal of sophistication or quality at best.

In a recent blog post, Andrew Gelman states (in the context of this same software signal discussion on his blog):

"To me, a statistics package is not just its code, it’s also its community, it’s what people do with it."

I can relate to the community aspect. On campus, within academia, we have our different communities of users (Stata, SPSS, Excel, SAS, Mathematica, and some R), and I think Rick's comments on the previous post are informative about academic and business communities. I think social identity theory often begins to play out: we get really comfortable within our community and tend to pigeonhole and shortchange other tools and, even worse, project those perceptions onto the users of those tools. I've been guilty of this, and Rick helped me realize it.

Note: There can definitely be benefits to coding your own estimators. If you are a coder and use SAS, I highly recommend Rick Wicklin's The Do Loop, where he has produced a number of excellent posts that explain the nuts and bolts of coding your own estimators and go behind the scenes of a vast array of concepts (like the power method for computing only the largest eigenvalue, and this tip on NOT using macros to code a simulation). See also 12 Tips for SAS Statistical Programmers from 2012.
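
For readers who have not seen the power method, here is a minimal sketch of the general idea in plain Python. It is not Rick's SAS/IML code. The function names, the starting vector of ones, the fixed iteration count, and the 2x2 example matrix are all assumptions made for illustration, and the sketch presumes a symmetric matrix with a single dominant eigenvalue:

```python
import math


def mat_vec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def power_method(a, iterations=100):
    """Approximate the largest-magnitude eigenvalue of a and its eigenvector.

    Assumes the starting vector of ones is not orthogonal to the dominant
    eigenvector; a real implementation would also test for convergence.
    """
    v = [1.0] * len(a)
    for _ in range(iterations):
        w = mat_vec(a, v)
        norm = math.sqrt(sum(w_i * w_i for w_i in w))
        v = [w_i / norm for w_i in w]  # rescale to unit length each step
    av = mat_vec(a, v)
    # Rayleigh quotient; v has unit length, so no division is needed.
    eigenvalue = sum(v_i * av_i for v_i, av_i in zip(v, av))
    return eigenvalue, v


# Example matrix (assumed); its eigenvalues are (5 + sqrt(5))/2 and (5 - sqrt(5))/2.
a = [[2.0, 1.0], [1.0, 3.0]]
largest, vector = power_method(a)
print("largest eigenvalue:", largest)
```

Repeated multiplication stretches the vector most along the dominant eigenvector, which is why only the largest eigenvalue comes out and the rest of the decomposition is never computed.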
