Tuesday, May 1, 2012

SAS Global Forum Followup

Last week I attended SAS Global Forum in Orlando, Florida. It was a great opportunity to meet some very interesting SAS users, contributors, and developers, like David Dickey and Rick Wicklin. I also had the opportunity to co-present a paper that my colleagues and I wrote. It demonstrated one of the great things about SAS: the integrated, interoperable, enterprise-wide platform that SAS products provide for business intelligence that goes beyond routine reporting and statistical analysis. Below I highlight several sessions that I attended, along with others that I missed.

SAS Global Forum is in many ways like a crash course in postgraduate education crammed into about 3 days of presentations. It is impossible to attend all of the sessions of interest, and even though the papers are all posted online, it is still possible to miss something if you are not careful. But that is what is so great about it! It has so much to offer. My coworkers and I plan to compare notes soon to cover more ground.

One thing about SAS Global Forum is that you have to pace yourself. You will come back from this conference with lots of ideas. You can't implement them all, at least not at once. Still left over from last year is using the %GetTweet macro for social network analysis and text mining. And I'm only just now scratching the surface of using copulas in SAS (another paper topic from last year).

This year, my favorite presentation was Handling Missing Data by Maximum Likelihood, given by Paul Allison. Not only is Allison a prolific writer in the area of statistics and SAS, but everything I have ever read from him is very clear, concise, and easy to understand. For the most part, I have dealt with missing data imputation in the SAS Enterprise Miner environment. Because SAS Enterprise Miner is designed for large data sets, multiple imputation is not a default option there. It does offer some attractive options beyond mean imputation, including M-estimators. However, I could imagine using the maximum likelihood methods Allison discusses in projects outside of SAS Enterprise Miner, or prior to importing the data for a project into SAS Enterprise Miner (or even via a code node within SAS Enterprise Miner).

I attended several sessions related to text mining. My favorites were Analyzing sentiments in Tweets about Wal-Mart’s gender discrimination lawsuit verdict using SAS® Text Miner, and Classification of Customers’ Textual Responses via Application of Topic Mining. Both of these papers (as well as last year's %GetTweet article and several more) were coauthored by Dr. Goutam Chakraborty of the Spears School of Business at Oklahoma State University. Dr. Chakraborty (who also founded the SAS and OSU Data Mining and Business Analytics certificate programs) always had interesting commentary during the question and answer segments, and the students always presented very interesting text mining applications. There was also a very good text mining poster, Investigating Host Plant Resistance to Aphid Feeding through SAS® Text Miner, also coauthored by Dr. Chakraborty.

Three other papers that I missed due to scheduling conflicts were PROPENSITY SCORE ANALYSIS AND ASSESSMENT OF PROPENSITY SCORE APPROACHES USING SAS® PROCEDURES, Your “Survival” Guide to Using Time‐Dependent Covariates, and Use of Cutoff and SAS Code Nodes in SAS® Enterprise Miner™ to Determine Appropriate Probability Cutoff Point for Decision Making with Binary Target Models. I have explored the concept of matching before, and even got some interesting comments from Andrew Gelman and Joshua Angrist regarding my interpretation, but I have yet to implement it in a project (although there are plenty of applications). I have already begun a survival analysis project, and have recently spent some time investigating how to reference time-dependent covariates in SAS. As for the paper on probability cutoffs, I have tried to figure this out in SAS Enterprise Miner before, but have not really spent much time with it. I really need to read all three papers above!

One other paper that I missed, and that my wife might be interested in, was Using SAS® and Zip Codes to Create a Nationwide First Responders Directory.

Two other great sessions that I attended incorporated LaTeX and Perl regular expressions. I'm not sure if I will actually use the regular expressions, but the LaTeX capability might enable our office not only to produce reproducible research, but also to produce really cool training documents that illustrate SAS code, or possibly allow me to create documents for my statistics students demonstrating SAS applications.

I didn't attend any sessions utilizing SAS IML, but I hope to utilize it to develop social network analysis metrics, such as eigenvector centrality. I think IML is a very powerful way to extend the capability of SAS. While I didn't attend a session for this, I truly enjoyed sitting down with Rick Wicklin at the demo hall and having him code through some simulations and extract eigenvectors! Still looking forward to a post from him on how to extract only the leading eigenvector from a large symmetric matrix in IML!
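To make the leading-eigenvector idea concrete, here is a minimal sketch of power iteration, the textbook way to approximate only the dominant eigenvector of a symmetric matrix without a full eigendecomposition. It is written in plain Python rather than IML, and it is not Rick Wicklin's code; the function name `leading_eigenvector`, the tolerance, the uniform starting vector, and the tiny four-node network are all assumptions made up for illustration. It also assumes that the dominant eigenvalue is strictly largest in magnitude and that the starting vector is not orthogonal to the leading eigenvector.

```python
import math


def leading_eigenvector(matrix, tol=1e-10, max_iter=1000):
    """Approximate the dominant eigenpair of a symmetric matrix by power iteration.

    Returns (eigenvalue, unit_eigenvector). Hypothetical helper for illustration.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # unit-length uniform starting vector (assumption)
    for _ in range(max_iter):
        # Multiply the matrix by the current estimate.
        w = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        # Rayleigh quotient v'Av estimates the eigenvalue (v has unit length).
        eigenvalue = sum(x * y for x, y in zip(v, w))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        w = [x / norm for x in w]
        # A negative dominant eigenvalue flips the sign each step; undo the flip.
        if eigenvalue < 0:
            w = [-x for x in w]
        change = max(abs(x - y) for x, y in zip(w, v))
        v = w
        if change < tol:
            return eigenvalue, v
    raise RuntimeError("power iteration did not converge")


# Made-up undirected network: nodes 0, 1, 2 form a triangle; node 3 links only to node 0.
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

eigenvalue, centrality = leading_eigenvector(adjacency)
most_central = max(range(len(centrality)), key=lambda i: centrality[i])
print("leading eigenvalue:", round(eigenvalue, 4))
print("most central node:", most_central)
```

For an adjacency matrix like this one, the entries of the leading eigenvector are the eigenvector centrality scores, so the best-connected node (node 0 here) gets the largest score. Each pass needs only one matrix-vector product, which is the appeal for a large matrix.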

Finally, as a statistics instructor, I'm becoming more interested in utilizing SAS On Demand for educators in my classes. Unfortunately, that's another paper presentation that I missed!

SAS Global Forum is a truly awesome and inspiring experience! It's a lot to take in, and well worth the cost and effort of attending. (P.S. Don't forget the upcoming Analytics 2012 conference, also hosted by SAS!)

2 comments:

  1. I enjoyed discussing SAS/IML with you as well. The blog post you mentioned is available at http://blogs.sas.com/content/iml/2012/05/09/the-power-method/

  2. Thanks. It will be extremely helpful. I was also just referencing your post on generating random numbers today! http://blogs.sas.com/content/iml/2011/08/24/how-to-generate-random-numbers-in-sas/