Saturday, May 4, 2013

Data Mining and Predictive Analytics

"Some companies have built their very business on their ability to collect, analyze, and act on data" 
‘Competing on Analytics’, Harvard Business Review, January 2006.

Many businesses make some sort of use of their customer and market data. For some, it’s just a matter of storing and accessing customer data for record keeping and transactional purposes. Others, like Google, Netflix, IBM, or the Oakland Athletics, make data analysis and analytics a major part of their business model.

By analyzing past business records, data mining and analytics can help identify patterns that support more cost-effective and efficient decisions. This is the specialty of what has recently been dubbed the data scientist.

Do you really have a need for Data Mining and Predictive Analytics?

There’s only one way to find out how much potential value is buried in your data, and you have to start somewhere. With just a few data mining techniques you can begin to extract insight from your data that you might not otherwise achieve, even after hours or years of poring over lists, row after row and column after column, in Excel.

What’s important is that you have someone who can identify the best tool for the task at hand, whether it’s a traditional experimental design involving analysis of variance, a forecast or time series analysis, a predictive model using logistic regression or decision trees, or one of the many other data mining tools available to a data scientist.

There are several aspects of data mining and predictive analytics that may be useful to you or your organization, including Data Visualization, Predictive Modeling, Text Mining, Social Network Analysis, and Causal Inference. I discuss each of these below.

Data Visualization 

There are more ways to gain insight from your data than just fancy models or algorithms. Data visualization allows you to transmit information to end users without the sometimes distracting statistical terminology, complicated equations, or never-ending Excel sheets.

Chart: Revenues and Outlays 2003-2009 (created using the R googleVis package; a Flash-enabled browser is required to view it)

Predictive Modeling 

My most successful analytics accomplishment to this point involves the development of predictive models that we use to identify students who have a high risk of dropping out at WKU. Working with my team, we’ve incorporated my model metrics into our database/reporting/decision support system so that administrators have access to these high-level analytical tools for strategic decision making. We won an honorable mention from SAS at the recent SAS Global Forum for our presentation (see here for the paper with screenshots). We've since extended this model to predict the probability of enrollment and retention at the application stage, as presented at the 2013 SAS Global Forum.
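To give a rough sense of how a risk score of this kind can be produced, here is a minimal Python sketch of logistic regression fit by plain gradient descent. It is not the model described above (that work was done in SAS). The student records, the two features (first-term GPA and a part-time flag), and the function names are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical records: (first-term GPA, part-time flag). Not real student data.
rows = [(3.6, 0), (3.2, 0), (2.9, 1), (2.1, 1), (1.8, 0), (3.9, 0),
        (2.4, 1), (1.5, 1), (3.0, 0), (2.0, 0), (2.2, 0), (2.7, 1)]
# 1 = dropped out, 0 = retained (also hypothetical).
labels = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1]


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Predicted dropout probability; weights[0] is the intercept."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return sigmoid(z)


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict_proba(weights, x) - y
            gradient[0] += error
            for j, value in enumerate(x):
                gradient[j + 1] += error * value
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights


weights = fit_logistic(rows, labels)
risk_low_gpa = predict_proba(weights, (1.6, 0))
risk_high_gpa = predict_proba(weights, (3.7, 0))
print(f"Estimated risk at GPA 1.6: {risk_low_gpa:.2f}")
print(f"Estimated risk at GPA 3.7: {risk_high_gpa:.2f}")
```

In practice the scores would come from a validated model built on many more records and variables; the point here is only that each student ends up with a probability that can be ranked and reported.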

Text Mining

With Twitter, Facebook, email, online forums, open-response surveys, and customer and reader comments on web pages and news articles, there is a lot of information available to companies and organizations in the form of text. Rather than hiring experts to read through thousands of pages of text and make subjective claims about its meaning, text mining allows us to take otherwise unusable 'qualitative' data and convert it into quantitative measures that we can use for various types of reporting and modeling.

Tools like SAS Text Miner in conjunction with SAS Enterprise Miner are designed specifically to do this type of analysis on a much larger scale. I have used both of these tools in predictive modeling applications. R also has open source tools for text mining.
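As a small illustration of turning text into quantitative measures, the Python sketch below builds a simple document-term matrix (word counts per comment). The comments, the tiny stop-word list, and the function names are made up for this example; tools like SAS Text Miner do considerably more (stemming, weighting, dimension reduction).

```python
import re
from collections import Counter

# Hypothetical open-response comments; not data from any real survey.
comments = [
    "The advising office was helpful and friendly",
    "Parking is terrible and advising was slow",
    "Helpful staff, terrible parking",
]

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "was", "and", "is", "a", "of"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocabulary, rows); each row holds term counts for one document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


vocabulary, matrix = document_term_matrix(comments)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row of the matrix is now a numeric record that could feed a report or a predictive model alongside other variables.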

Social Network Analysis

With the rise in the use of social media, data related to social networks is ripe for analysis using techniques from social network analysis and graph theory. According to the International Network for Social Network Analysis, ‘Social network analysis is focused on uncovering the patterning of people's interaction’.

Social network analysis (SNA) allows us to answer questions such as: Who are the key actors in a network? Who are the most influential members of a network? Who seems to be acting on the periphery? Which connections in the network are most important? Are there key players bridging connections or information between otherwise disconnected groups? Have policies or other forces changed the overall dynamics or interaction between people in the network (i.e. has the network structure changed in any meaningful way), and does that relate to some other performance outcome or goal?
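A couple of these questions can be sketched in a few lines of Python. The example below uses a made-up friendship network and two simple measures: degree centrality (share of other members each person is directly connected to) and a brute-force check for actors whose removal would disconnect the network. The names, edges, and function names are hypothetical, and real SNA software offers many more measures (betweenness, closeness, and so on).

```python
from collections import deque

# Hypothetical undirected network; names and ties are illustrative only.
edges = [
    ("Ann", "Bob"), ("Ann", "Cara"), ("Bob", "Cara"),
    ("Cara", "Dev"),
    ("Dev", "Eli"), ("Dev", "Fay"), ("Eli", "Fay"),
    ("Fay", "Gus"),
]


def build_graph(edge_list):
    """Build an adjacency map {node: set of neighbors}."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly tied to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def is_connected(graph, removed=None):
    """Breadth-first search, optionally ignoring one removed node."""
    nodes = [n for n in graph if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for nbr in graph[current]:
            if nbr != removed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(nodes)


def bridging_actors(graph):
    """Actors whose removal would split the network into separate pieces."""
    return sorted(n for n in graph if not is_connected(graph, removed=n))


graph = build_graph(edges)
centrality = degree_centrality(graph)
bridges = bridging_actors(graph)
most_peripheral = min(centrality, key=centrality.get)
print(centrality)
print("Bridging actors:", bridges)
print("Most peripheral:", most_peripheral)
```

In this toy network Cara, Dev, and Fay hold the groups together, while Gus sits on the periphery with a single tie.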

More specific applications of SNA may include Student Integration and Persistence, Business to Business Supply Chains, Seeding Strategies for Viral Marketing, and Predicting Customer Churn. The open source software R and NetDraw provide many tools for conducting social network analysis.


‘Using SNA in Predictive Modeling’
‘Using Twitter to Demonstrate Basic Concepts from Social Network Analysis’
‘An Introduction to Social Network Analysis Using R and Netdraw’

Causal Inference

Sometimes we want to do more than just predict outcomes or identify key customer segments. Sometimes we want to know if a current practice or promotion is really having an impact on our business. In an applied research setting, we want to know if a given 'treatment' has a statistically significant impact on an outcome of interest. We know that correlation does not always imply causation. In all of these cases we need statistical methodologies, such as quasi-experimental designs, that allow us to infer causation when appropriate.
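One common quasi-experimental approach is difference-in-differences, which compares the change in a group exposed to a treatment with the change in a comparison group over the same period. The Python sketch below is only an illustration with made-up weekly sales figures and a hypothetical function name. It assumes the two groups would have followed parallel trends without the promotion, and it does not test statistical significance.

```python
from statistics import mean

# Hypothetical weekly sales for stores that ran a promotion (treated)
# and stores that did not (control), before and after the launch.
treated_before = [100, 104, 98, 102]
treated_after = [115, 119, 113, 117]
control_before = [90, 94, 88, 92]
control_after = [95, 99, 93, 97]


def difference_in_differences(t_before, t_after, c_before, c_after):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(t_after) - mean(t_before)
    control_change = mean(c_after) - mean(c_before)
    return treated_change - control_change


effect = difference_in_differences(
    treated_before, treated_after, control_before, control_after
)
print("Estimated promotion effect on weekly sales:", effect)
```

Here the treated stores rose by 15 and the control stores by 5, so the estimate attributes the remaining 10 to the promotion. A real analysis would typically use a regression framework with standard errors.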

For a very technical look at these methodologies see: Causal Inference Roundup and Quasi-Experimental Design Roundup

For more information:

If you feel you could benefit from the services of a data scientist, or have further questions about applied econometrics and analytics, please contact me for more information. Feel free to visit my blog or my Selected Works page, where you can find a copy of my CV.

LinkedIn Profile:  (link)    
Selected Works Profile:
