In my previous post I quoted the following:
"When you don't have to code your own estimators, you probably won't understand what you're doing. I'm not saying that you definitely won't, but push-button analyses make it easy to compute numbers that you are not equipped to interpret."
Then I said:
I Agree. Statistics is a language best communicated and understood via code vs. a point and click GUI.
But I revised the post and restated:
I agree that statistics is a language best communicated and understood via code vs. a point and click GUI.
I've been thinking a lot about this, and considering some insight from Rick Wicklin in the comments to that post. What I should actually say is that *personally* I don't feel like I understand an estimator as well until I've actually coded it (as an algorithm, not just by submitting a command to a software package), or at least attempted to implement it in some simplified way, or gotten some idea of how it could be coded if my coding skills were up to the challenge.
So, by that measure, I understand some estimators better than others, and trying to better understand estimators in this way is really the purpose of this blog.
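To make the distinction concrete, here is a minimal Python sketch of what I mean by coding an estimator as an algorithm rather than calling a canned routine. It computes the ordinary least squares intercept and slope for a one-predictor regression directly from the textbook formulas. The function name `ols_simple` and the toy data are my own illustrative choices, not taken from any package.

```python
def ols_simple(x, y):
    """Ordinary least squares for y = intercept + slope * x, coded by hand."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of squared deviations of x, and sum of cross-products of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")

    slope = sxy / sxx                    # slope estimate = Sxy / Sxx
    intercept = mean_y - slope * mean_x  # fitted line passes through the means
    return intercept, slope


# Toy data generated from y = 1 + 2x with no noise (hypothetical example).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = ols_simple(x, y)
print(intercept, slope)
```

A regression command in any package would return the same two numbers for this data. Writing out the sums by hand just makes it visible where they come from.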
But I've thought a little more about the first quote. *IF* I understand correctly how SAS/R/Stata/SPSS etc. work, then whether you are pointing and clicking in a GUI or submitting canned routines at the command line, *BOTH* are wrappers for the heavy lifting and actual statistical programming that developers have abstracted away behind the scenes. Both command-line and GUI environments make it pretty easy to get results you may not interpret correctly, or simply to apply the wrong test in the wrong situation.
As far as 'signaling' goes, now that I've thought more about it, a screen full of code certainly may give the appearance of a more sophisticated analysis and may even give one a false sense of confidence in the results; that is, the signal for sophistication or quality may be mixed at best.
In a recent blog post, Andrew Gelman states (in the context of this same software signal discussion on his blog):
To me, a statistics package is not just its code, it’s also its community, it’s what people do with it.
I can relate to the community aspect. On campus, within academia, we have our different communities of users (Stata, SPSS, Excel, SAS, Mathematica, and some R), and I think Rick's comments on the previous post are informative about academic and business communities. I think social identity theory often begins to play out: we get really comfortable within our community and tend to pigeonhole and shortchange other tools, and, even worse, project those perceptions onto the users of those tools. I've been guilty of this, and Rick helped me realize it.
**************************
Note: There can definitely be benefits to coding your own estimators. If you are a coder and use SAS, I highly recommend Rick Wicklin's The Do Loop, where he has produced a number of excellent posts explaining the nuts and bolts of coding your own estimators and goes behind the scenes of a vast array of concepts (like the power method for computing only the largest eigenvalue and this tip on NOT using macros to code a simulation). See also 12 Tips for SAS Statistical Programmers from 2012.
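For anyone who has not met the power method mentioned above, here is a rough Python sketch of the general idea. This is not Rick's SAS/IML code. The function name `power_method`, the stopping tolerance, and the small symmetric test matrix are assumptions of mine for illustration. The method repeatedly multiplies a starting vector by the matrix and rescales it; when one eigenvalue is strictly largest in absolute value and the starting vector is not orthogonal to its eigenvector, the iterates line up with that eigenvector.

```python
import math


def power_method(matrix, iterations=500, tol=1e-12):
    """Approximate the dominant eigenvalue and eigenvector of a square matrix."""
    n = len(matrix)
    vector = [1.0] * n  # arbitrary nonzero starting vector (an assumption)
    eigenvalue = 0.0

    def multiply(v):
        # Matrix-vector product A v.
        return [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]

    for _ in range(iterations):
        product = multiply(vector)
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            raise ValueError("the current vector was mapped to zero")
        new_vector = [p / norm for p in product]  # rescale to unit length

        # Rayleigh quotient v'Av of the unit vector estimates the eigenvalue.
        av = multiply(new_vector)
        new_eigenvalue = sum(new_vector[i] * av[i] for i in range(n))

        converged = abs(new_eigenvalue - eigenvalue) < tol
        vector, eigenvalue = new_vector, new_eigenvalue
        if converged:
            break

    return eigenvalue, vector


# Small symmetric test matrix; its eigenvalues are (5 +/- sqrt(5)) / 2.
A = [[2.0, 1.0], [1.0, 3.0]]
largest, direction = power_method(A)
print(largest)
```

The appeal is that it never computes the full set of eigenvalues, which is the point of using it when only the largest one is needed.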