Thursday, October 18, 2012

Get a Data Science Attitude

Do the following terms mean anything to you?

load balance, toggle, join, index, normalize, key

If you are a statistician and an aspiring data scientist, they should. If not, this is one area where you should expand your knowledge base. In her article 'Being a data scientist is as much about IT as it is analysis,' Carla Gentry explains why.

"With knowledge of the client's IT setup from a data management/quality perspective, you'll be equipped to handle most situations you run into when dealing with data, even if the Architect and Programmer are out sick. Your professional knowledge is going to be a big help in getting the assignment or job complete."

If all you want is a cooked-to-order data set from IT so you can run a regression, that's not the attitude or the skill set employers have in mind when they look for data scientists. There are plenty of jobs in academia for traditional statisticians, but that's not what Hal Varian was talking about when he said that the sexy job in the next 10 years will be statisticians.

This reminds me of an article I read not long ago about building data science teams:

"Most of the data was available online, but due to its size, the data was in special formats and spread out over many different systems. To make that data useful for my research, I created a system that took over every computer in the department from 1 AM to 8 AM. During that time, it acquired, cleaned, and processed that data. Once done, my final dataset could easily fit in a single computer's RAM. And that's the whole point. The heavy lifting was required before I could start my research. Good data scientists understand, in a deep way, that the heavy lifting of cleanup and preparation isn't something that gets in the way of solving the problem: it is the problem."
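To make a few of the terms from the start of this post concrete, here is a minimal sketch in plain Python of a keyed join done by hand rather than ordered from IT. The customer and order records are made up, and the helper names build_index and inner_join are hypothetical, chosen for illustration only. The two tables are kept separate in the spirit of a normalized schema, so each customer's region is stored once and linked to orders through the customer_id key.

```python
from collections import defaultdict

# Hypothetical example data: two separate "tables" linked by customer_id,
# so each customer's region is stored exactly once.
customers = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 2, "region": "West"},
    {"customer_id": 3, "region": "East"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 2, "amount": 50.0},
    {"order_id": 12, "customer_id": 1, "amount": 80.0},
    {"order_id": 13, "customer_id": 4, "amount": 35.0},  # key with no matching customer
]


def build_index(rows, key):
    """Map each key value to its row (assumes the key is unique in rows)."""
    return {row[key]: row for row in rows}


def inner_join(left, right_index, key):
    """Merge left rows with their match in right_index.

    Returns (joined, orphans): orphans are left rows whose key had no match,
    which is exactly the kind of dirty data worth reporting during cleanup.
    """
    joined, orphans = [], []
    for row in left:
        match = right_index.get(row[key])
        if match is None:
            orphans.append(row)
        else:
            joined.append({**match, **row})
    return joined, orphans


customer_index = build_index(customers, "customer_id")
joined, orphans = inner_join(orders, customer_index, "customer_id")

# Only after the join and cleanup does the actual analysis start.
totals = defaultdict(float)
for row in joined:
    totals[row["region"]] += row["amount"]

print("orphan orders:", [row["order_id"] for row in orphans])  # orphan orders: [13]
print(dict(totals))  # {'East': 100.0, 'West': 50.0}
```

The index turns each lookup into a dictionary access instead of a scan over every customer, and surfacing the orphaned order rather than silently dropping it is a small example of the preparation work the quote describes.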

