“We’re rapidly entering a world where everything can be monitored and measured,” said Erik Brynjolfsson, an economist and director of the Massachusetts Institute of Technology’s Center for Digital Business. “But the big problem is going to be the ability of humans to use, analyze and make sense of the data.”
“I.B.M., seeing an opportunity in data-hunting services, created a Business Analytics and Optimization Services group in April. The unit will tap the expertise of the more than 200 mathematicians, statisticians and other data analysts in its research labs — but that number is not enough. I.B.M. plans to retrain or hire 4,000 more analysts across the company.” – From ‘For Today’s Graduate, Just One Word: Statistics’ – NYT, Steve Lohr, Aug 2009.
"Some companies have built their very business on their ability to collect, analyze, and act on data" – ‘Competing on Analytics’, Harvard Business Review, Jan 2006.
"The success of companies like Google and Amazon has encouraged a whole generation of business leaders to try and replicate their data-driven processes, and left them searching for data scientists." – ‘Jobs for Data Scientists Explode Across The Market’, NYT, July 20, 2011.
Many businesses make some use of their customer and market data. For some, it’s just a matter of storing and accessing customer data for record keeping and transactional purposes. Others, like Google, Netflix, I.B.M., or the Oakland Athletics, make data analysis and analytics a major part of their business model.
By analyzing past business records, data mining and analytics can help identify patterns that support more cost-effective and efficient decisions. This is the specialty of what has recently been dubbed the data scientist.
What is a data scientist?
“What sets data scientists apart from other data workers, including data analysts, is their ability to create logic behind the data that leads to business decisions. ‘Data scientists extract data, formulate models and apply quantitative analysis in a proactive manner.’” - Laura Kelley, Vice President, Modis.
“At least as important [as big data technologies] are the people with the skill set (and the mind-set) to put them to good use…what data scientists do is make discoveries while swimming in data… the dominant trait [of data scientists] is intense curiosity—a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested. This often entails the associative thinking that characterizes the most creative scientists in any field….perhaps it’s becoming clear that the word ‘scientist’ fits this emerging role”—Tom Davenport and D.J. Patil
The Data Science Venn Diagram created by Drew Conway helps in defining data science and the role of a data scientist.
Data scientists not only have expertise in applied research and statistics, but they are comfortable doing the programming (a.k.a. hacking) necessary to shape data into a form suitable for analysis. In addition, data scientists have practical knowledge and expertise in the field or industry they work in.
Do you need billions of dollars and a special division staffed with expensive PhDs to create value from your business data?
No matter the size of your business or organization, data mining and analytic expertise can help create value from your data. The size of your staff and the extent of resources devoted to data science and analytics may depend on the nature of your organization. However, if you are a small business, you don’t necessarily have to make the same kind of investments as Google or Netflix. Competing on analytics is as much a mindset as it is an investment, and you may be able to accomplish a lot with current staff or local talent.
Can you afford to have a ‘data scientist’ on staff? Is it difficult to find people with this skill set?
Tapping data science talent may not be as difficult as you think. You don’t necessarily need an expensive PhD statistician; many graduates with related degrees possess these skills. Take, for instance, agricultural economics:
"The combination of quantitative training and applied work makes agricultural economics graduates an extremely well-prepared source of employees for private industry. That's why American Express has hired over 80 agricultural economists since 1990." - David Edwards, Vice President-International Risk Management, American Express, from ‘Why Study Applied/Agricultural Economics’.
This is also stated quite well in the NYT article cited above, ‘For Today’s Graduate, Just One Word: Statistics’:
"Though at the fore, statisticians are only a small part of an army of experts using modern statistical techniques for data analysis. Computing and numerical skills, experts say, matter far more than degrees. So the new data sleuths come from backgrounds like economics, computer science and mathematics."
Perhaps you see a need for data-driven decision making in your business and you want to tap someone with this talent, but you just don’t think you can justify having someone on staff full time to do this. I would first challenge this notion. You may start with a few questions and answers, and some predictive modeling, but you will find there is always more data being generated, more questions, and a constant need to tweak and improve models and forecasts. That said, modern technology makes telecommuting a very attractive option, and you might find that you can contract with someone with this skill set on an ad hoc or as-needed basis. There’s only one way to find out how much potential value is buried in your data, and you have to start somewhere. With just a few data mining techniques you can begin to extract insight from your data that you might not otherwise achieve even after hours or years of poring over lists, row after row and column after column, in Excel.
Data Mining and Predictive Analytics
First off, there are differences between traditional statistics and data mining. See ‘Culture War: Classical Statistics vs. Machine Learning.’ What’s important is that you have someone who can identify the best tool for the task at hand, whether it’s a traditional experimental design involving analysis of variance, a forecast or time series analysis, a predictive model using logistic regression or decision trees, or one of the many other data mining tools available to a data scientist. In addition, there are more ways to gain insight from your data than just fancy models or algorithms. Data visualization allows you to transmit information to end users without the sometimes distracting statistical terminology, complicated equations, or never-ending Excel sheets.
Real World Applications
My most recent analytics accomplishment involves the development of a predictive model that we use to identify students who have a high risk of dropping out at WKU. Working with my team, we’ve incorporated my model metrics into our database/reporting/decision support system so that administrators have access to these high-level analytical tools for strategic decision making. We won an honorable mention from SAS at the recent SAS Global Forum for our presentation. (see here for the paper with screenshots).
In addition, with the rise of social media and the digitization of so much data, there are some analytical tools such as text mining and social network analysis that are of particular interest that might not have seemed feasible in an applied business setting just a decade ago.
With Twitter, Facebook, email, online forums, open response surveys, customer and reader comments on web pages and news articles, etc., there is a lot of information available to companies and organizations in the form of text. Without hiring experts to read through thousands of pages of text and make subjective claims about its meaning, text mining allows us to take otherwise unusable 'qualitative' data and convert it into quantitative measures that we can use for various types of reporting and modeling. In 'An Intuitive Approach to Text Mining with SAS IML' I demonstrate how the mathematical technique of singular value decomposition (SVD) can be used to do this.
For example, with text mining techniques based on SVD you can take sample text, such as comments about finely textured beef, and quantify it to classify or segment respondents (who may be customers or clients), or use the information in a predictive model.
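To make the SVD idea a little more concrete, here is a minimal sketch in Python with NumPy. It is not the SAS IML code from the paper mentioned above. The four comments are invented purely for illustration, and the function names (term_document_matrix, svd_scores, cosine) are hypothetical helpers. The sketch counts words per comment, applies SVD to that count matrix, and keeps the first two dimensions as numeric scores for each comment.

```python
import numpy as np

# Hypothetical toy comments, invented for illustration only.
docs = [
    "price is fair and low",
    "low price is fair",
    "labels should be required",
    "labels should be clear and required",
]

def term_document_matrix(documents):
    """Count how often each word appears in each document (rows = documents)."""
    vocab = sorted({word for doc in documents for word in doc.split()})
    column = {word: j for j, word in enumerate(vocab)}
    counts = np.zeros((len(documents), len(vocab)))
    for i, doc in enumerate(documents):
        for word in doc.split():
            counts[i, column[word]] += 1
    return counts, vocab

def svd_scores(counts, k=2):
    """Project each document onto the first k SVD dimensions."""
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    return U[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity between two score vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

counts, vocab = term_document_matrix(docs)
scores = svd_scores(counts, k=2)

# Each row is now a short numeric summary of one comment.
print(np.round(scores, 2))
print("price vs. price comment:", round(cosine(scores[0], scores[1]), 2))
print("price vs. label comment:", round(cosine(scores[0], scores[2]), 2))
```

Comments that use similar words end up with similar scores, so the score columns can be used to segment respondents or fed into a predictive model as ordinary numeric inputs. Real applications would normally add steps this sketch leaves out, such as stop word removal and term weighting.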
Tools like SAS Text Miner in conjunction with SAS Enterprise Miner are designed specifically to do this type of analysis on a much larger scale. I have used both of these tools on much larger document collections and obtained promising results using text topics from SVD in predictive modeling applications. R also offers open source tools. I’ve used R to text mine tweets about U.S. political issues such as the debt ceiling and the U.S. budget, as well as tweets containing the term ‘Factory Farm’.
Social Network Analysis
With the rise in the use of social media, data related to social networks is ripe for analysis using techniques from social network analysis and graph theory. According to the International Network for Social Network Analysis, ‘Social network analysis is focused on uncovering the patterning of people's interaction’.
Social network analysis (SNA) allows us to answer questions such as: Who are the key actors in a network? Who are the most influential members? Who seems to be acting on the periphery? Which connections in the network are most important? Are there key players bridging connections or information between otherwise disconnected groups? Have policies or other forces changed the overall dynamics or interaction between people in the network (i.e., has the network structure changed in any meaningful way), and does that relate to some other performance outcome or goal?
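Two of these questions, who the key actors are and who bridges otherwise disconnected groups, can be illustrated with a small sketch in plain Python. The network below is hypothetical (seven made-up people forming two tight groups joined through one person), and the function names are illustrative only. Degree centrality counts direct connections; betweenness centrality, computed here with a standard breadth-first approach (Brandes' algorithm), counts how often a person sits on the shortest path between two others.

```python
from collections import deque

# Hypothetical network: two tight groups connected only through Dan.
edges = [
    ("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"),
    ("Cat", "Dan"), ("Dan", "Eve"),
    ("Eve", "Fay"), ("Eve", "Gus"), ("Fay", "Gus"),
]

def build_graph(edge_list):
    """Build an undirected adjacency map from a list of pairs."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree_centrality(graph):
    """Share of the other people each person is directly connected to."""
    n = len(graph)
    return {v: len(neighbors) / (n - 1) for v, neighbors in graph.items()}

def betweenness_centrality(graph):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    between = {v: 0.0 for v in graph}
    for source in graph:
        order = []                              # nodes in order of distance
        preds = {v: [] for v in graph}          # predecessors on shortest paths
        paths = {v: 0 for v in graph}           # number of shortest paths
        dist = {v: -1 for v in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:                 # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v lies on a shortest path to w
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):               # farthest nodes first
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                between[w] += dependency[w]
    # Each pair is counted from both ends in an undirected graph.
    return {v: b / 2 for v, b in between.items()}

graph = build_graph(edges)
degree = degree_centrality(graph)
between = betweenness_centrality(graph)
for person in sorted(graph):
    print(person, round(degree[person], 2), between[person])
```

In this toy network Dan has as few direct connections as anyone, yet the highest betweenness, because every path between the two groups runs through him. That is the kind of bridging role that a simple count of connections would miss.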
More specific applications of SNA may include Student Integration and Persistence, Business to Business Supply Chains, Seeding Strategies for Viral Marketing, and Predicting Customer Churn. The open source software R and NetDraw provide many tools for conducting social network analysis. See also ‘Using Twitter to Demonstrate Basic Concepts from Social Network Analysis’ as well as ‘An Introduction to Social Network Analysis Using R and NetDraw.’ As some of these examples demonstrate, measures derived from social network analysis can be very useful in predictive modeling. For a more basic example see ‘Using SNA in Predictive Modeling’.
For more information:
If you feel you can benefit from the services of a data scientist or have further questions about applied econometrics and analytics, please contact me for more information, or feel free to visit my blog or selected works, where you can find a copy of my CV.
Selected Works Profile: http://works.bepress.com/matt_bogard/