"When you don't have to code your own estimators, you probably won't understand what you're doing. I'm not saying that you definitely won't, but push-button analyses make it easy to compute numbers that you are not equipped to interpret."
Then I said:
I agree. Statistics is a language best communicated and understood via code vs. a point-and-click GUI.
But I revised the post and restated:
I agree that statistics is a language best communicated and understood via code vs. a point-and-click GUI.
I've been thinking a lot about this, and considering some insight from Rick Wicklin in the comments to that post. What I should actually say is that *personally* I don't feel like I understand an estimator as well until I've actually coded it (as an algorithm, not just by submitting a command to a software package), or at least made some attempt to implement it in a simplified way, or gotten some idea of how it could be coded if my coding skills were up to the challenge.
So, by that measure, I understand some estimators better than others, and trying to better understand estimators in this way is really the purpose of this blog.
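To make "coding it as an algorithm" concrete, here is a minimal sketch of what I mean, using simple linear regression. It is written in plain Python rather than any of the packages mentioned in this post, and the function name `ols_simple` and the toy data are my own assumptions for illustration, not anything a package exposes.

```python
def ols_simple(x, y):
    """Least-squares intercept and slope for y = a + b*x (closed form)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data (hypothetical) generated exactly from y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = ols_simple(x, y)
print(intercept, slope)
```

Writing out the two sums by hand is what shows me where the slope actually comes from, which a single regression command never does.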
But I've thought a little more about the first quote. *IF* I understand correctly how SAS/R/Stata/SPSS etc. work, then regardless of whether you are pointing and clicking in a GUI or submitting canned routines at the command line, *BOTH* are wrappers for the heavy lifting and actual statistical programming that developers have abstracted behind the scenes. Both command-line and GUI environments make it pretty easy to get results you may or may not interpret correctly, or simply to apply the wrong test in the wrong situation.
But you can also really get into trouble actually coding your own estimator (maybe using SAS/IML or R or Octave). If you don't know what you are doing and you click through a regression in SPSS or submit a PROC in SAS, at least the estimates will be correct. If you make a syntax error, the code likely just won't run, and the log may even help you out. Applying the correct statistical test and interpreting the results are still up to you. But you can make a mistake coding your own estimator and not even realize it. Efficiency, repeatability, resource requirements, etc. all factor in too.
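As a small illustration of a mistake that runs cleanly and gives no warning, here is a sketch of a hand-coded sample variance with a deliberate bug, compared against Python's standard `statistics` module. The function name `my_variance` and the data are hypothetical, chosen only to show the point.

```python
import statistics


def my_variance(data):
    """Hand-coded 'sample variance' containing a deliberate, silent bug."""
    n = len(data)
    mean = sum(data) / n
    # Bug: divides by n instead of n - 1, so this is really the
    # population variance. Nothing fails and nothing is logged.
    return sum((v - mean) ** 2 for v in data) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
hand_coded = my_variance(data)
reference = statistics.variance(data)  # sample variance (n - 1 divisor)
print(hand_coded, reference)
```

The two numbers differ, but nothing in the hand-coded version tells you so unless you think to check it against a trusted routine.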
As far as 'signaling' goes, now that I've thought more about it, a screen full of code certainly may give the appearance of a more sophisticated analysis, and may even give one a false sense of confidence in the results; i.e., the signal for sophistication or quality may be mixed at best.
To me, a statistics package is not just its code, it’s also its community, it’s what people do with it.
I can relate to the community aspect. On campus, within academia, we have our different communities of users (Stata, SPSS, Excel, SAS, Mathematica, and some R), and I think Rick's comments on the previous post are informative about academic and business communities. I think lots of times social identity theory begins to play out: we get really comfortable within our community and tend to pigeonhole and shortchange other tools, and, even worse, project those perceptions onto the users of those tools. I've been guilty of this, and Rick helped me realize it.
**************************
Note: There can definitely be benefits to coding your own estimators. If you are a coder and use SAS, I highly recommend Rick Wicklin's The Do Loop, where he has produced a number of excellent posts that explain the nuts and bolts of coding your own estimators and go behind the scenes of a vast array of concepts (like the power method for computing only the largest eigenvalue, and this tip on NOT using macros to code a simulation). See also 12 Tips for SAS Statistical Programmers from 2012.
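For readers who don't use SAS, the power method mentioned above can be sketched in a few lines of plain Python. This is my own simplified version, not Rick's SAS/IML code. It assumes a symmetric matrix with a single dominant eigenvalue, and the function name `power_method`, the starting vector, and the tolerance are all illustrative choices.

```python
import math


def power_method(a, iterations=1000, tol=1e-12):
    """Estimate the largest-magnitude eigenvalue of a symmetric matrix."""
    n = len(a)
    # Arbitrary starting vector; it must not be orthogonal to the
    # dominant eigenvector or the iteration converges to the wrong value.
    v = [1.0] + [0.0] * (n - 1)
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply by the matrix, then rescale to unit length.
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        v = [wi / norm for wi in w]
        # Rayleigh quotient v'Av, valid because v has unit length.
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_value = sum(v[i] * av[i] for i in range(n))
        if abs(new_value - eigenvalue) < tol:
            return new_value, v
        eigenvalue = new_value
    return eigenvalue, v


# Small symmetric example whose eigenvalues are 3 and 1.
matrix = [[2.0, 1.0], [1.0, 2.0]]
largest, vector = power_method(matrix)
print(largest)
```

The appeal of the method is that it needs nothing more than repeated matrix-vector multiplication, so you never compute the full set of eigenvalues.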