“We’re rapidly
entering a world where everything can be monitored and measured,” said Erik
Brynjolfsson, an economist and director of the Massachusetts Institute of Technology’s
Center for Digital Business. “But the big problem is going to be the ability of
humans to use, analyze and make sense of the data.”
“I.B.M., seeing an
opportunity in data-hunting services, created a Business Analytics and
Optimization Services group in April. The unit will tap the expertise of the
more than 200 mathematicians, statisticians and other data analysts in its
research labs — but that number is not enough. I.B.M. plans to retrain or hire
4,000 more analysts across the company.” – From ‘For Today’s Graduate, Just One Word: Statistics’, Steve Lohr, NYT, Aug. 2009.
"Some companies
have built their very business on their ability to collect, analyze, and act on
data" – ‘Competing on Analytics’, Harvard Business Review, Jan. 2006.
"The success of
companies like Google and Amazon has encouraged a whole generation of business
leaders to try and replicate their data-driven processes, and left them
searching for data scientists." – ‘Jobs for Data Scientists Explode Across the Market’, NYT, July 20, 2011.
Many businesses make some use of their customer and
market data. For some, it’s just a matter of storing and accessing customer
data for record keeping and transactional purposes. Others, like Google, Netflix,
I.B.M., or the Oakland Athletics, make data analysis and analytics a major part
of their business model.
By analyzing past business records, data mining and
analytics can help identify patterns that support more cost-effective and
efficient decisions. This is the specialty of what has recently been dubbed the
data scientist.
What is a data scientist?
“What sets data scientists apart from other data workers, including data
analysts, is their ability to create logic behind the data that leads to business
decisions. Data scientists extract data, formulate models and apply quantitative
analysis in a proactive manner.” – Laura Kelley, Vice President, Modis.
“At least as important [as big data technologies] are the people with the skill set (and the mind-set) to put them to good use…what data scientists do is make discoveries while swimming in data… the dominant trait [of data scientists] is intense curiosity—a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested. This often entails the associative thinking that characterizes the most creative scientists in any field….perhaps it’s becoming clear that the word ‘scientist’ fits this emerging role” – Tom Davenport and D.J. Patil
The Data Science Venn Diagram created by Drew Conway helps
define data science and the role of a data scientist.
Data scientists not only have expertise in applied
research and statistics, but are also comfortable doing the programming (a.k.a.
hacking) necessary to shape data into a form suitable for analysis. In addition,
data scientists have practical knowledge and expertise in the field or
industry they work in.
Do you need billions of dollars and a special division staffed with expensive PhDs to create value from your business data?
No matter the size of your business or organization, data
mining and analytic expertise can help create value from your data. The size of
your staff and the extent of resources devoted to data science and analytics
may depend on the nature of your organization. However, if you are a small
business, you don’t necessarily have to make the same kind of investments as
Google or Netflix. Competing on
analytics is as much a mindset as it is an investment, and you may be able to
accomplish a lot with current staff or local talent.
Can you afford to have a ‘data scientist’ on staff? Is it difficult to find people with this skill set?
Tapping data science talent may not be as difficult as you
think. You don’t necessarily need an expensive PhD statistician; many graduates
with related degrees possess these skills. Take, for instance,
agricultural economics:
"The combination
of quantitative training and applied work makes agricultural economics
graduates an extremely well-prepared source of employees for private industry.
That's why American Express has hired over 80 agricultural economists since
1990." – David Edwards, Vice President, International Risk Management,
American Express, from ‘Why Study Applied/Agricultural Economics’.
This is also stated quite well in the NYT article cited
above, ‘For Today’s Graduate, Just One Word: Statistics’:
"Though at the fore, statisticians are only a small
part of an army of experts using modern statistical techniques for data
analysis. Computing and numerical skills, experts say, matter far more than
degrees. So the new data sleuths come from backgrounds like economics, computer
science and mathematics."
Perhaps you see a need for data-driven decision making in
your business and want to tap someone with this talent, but you don’t think
you can justify a full-time staff position for it. I would first challenge this
notion. You may start with a few questions and answers and some predictive
modeling, but you will find there is always more data being generated, more
questions to answer, and a constant need to tweak and improve models and
forecasts. That said, modern technology makes telecommuting a very attractive
option, and you might find that you can contract with someone with this skill
set on an ad hoc or as-needed basis. There’s only one way to find out how much
potential value is buried in your data, and you have to start somewhere. With
just a few data mining techniques you can begin to extract insights from your
data that you might not otherwise reach even after hours or years of poring over
lists, row after row and column after column, in Excel.
Data Mining and Predictive Analytics
First off, there are differences between traditional
statistics and data mining (see ‘Culture War: Classical Statistics vs. Machine
Learning’). What’s important is that you have someone who can identify the best
tool for the task at hand, whether it’s a traditional experimental design
involving analysis of variance, a forecast or time series analysis, a predictive
model using logistic regression or decision trees, or one of the many other data
mining tools available to a data scientist. In addition, there are more ways to
gain insight from your data than fancy models or algorithms. Data visualization
lets you convey information to end users without the sometimes distracting
statistical terminology, complicated equations, or never-ending Excel spreadsheets.
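To make one item on that list concrete, here is a minimal sketch of a logistic regression predictive model, fit by plain batch gradient descent in pure Python. It is an illustration under stated assumptions only: the two predictors and the six labeled rows are invented, the learning rate and epoch count are arbitrary choices, and in practice you would use an established package (SAS, R, or a statistics library) with proper validation.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    """Predicted probability that the outcome is 1, given weights w (intercept first)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic regression weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical, pre-scaled predictors, invented for illustration:
# each row is [predictor_1, predictor_2]; label 1 = event occurred, 0 = did not.
X = [[0.2, 0.3], [0.4, 0.2], [0.3, 0.1], [0.9, 0.8], [0.8, 0.9], [0.7, 0.7]]
y = [1, 1, 1, 0, 0, 0]

weights = fit_logistic(X, y)
for new_row in ([0.25, 0.2], [0.85, 0.85]):
    print(new_row, round(predict_proba(weights, new_row), 3))
```

The output is a probability per row rather than a hard yes/no, which is what makes this kind of model useful for ranking cases by risk.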
Real World Applications
My most recent analytics accomplishment involves the
development of a predictive model that we use at WKU to identify students at
high risk of dropping out. Working with my team, we’ve incorporated my model
metrics into our database, reporting, and decision support system so that
administrators have access to these high-level analytical tools for strategic
decision making. Our presentation received an honorable mention from SAS at the
recent SAS Global Forum (see here for the paper with screenshots).
In addition, with the rise of social media and the
digitization of so much data, analytical tools such as text mining and social
network analysis have become of particular interest, even though they might not
have seemed feasible in an applied business setting just a decade ago.
Text Mining
With Twitter, Facebook, email, online forums, open-response
surveys, customer and reader comments on web pages and news articles, etc., there
is a lot of information available to companies and organizations in the form of
text. Rather than hiring experts to read through thousands of pages of text and
make subjective claims about its meaning, we can use text mining to convert
otherwise unusable 'qualitative' data into quantitative measures for various
types of reporting and modeling. In 'An Intuitive Approach to Text Mining with
SAS IML' I demonstrate how the mathematical technique of singular value
decomposition (SVD) can be used to do this.
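The SAS IML paper itself is not reproduced here, but the general idea can be sketched in a few lines. The snippet below is a hypothetical, latent-semantic-analysis style illustration using NumPy rather than SAS: it builds a term-document matrix from a handful of invented comments (already lower-cased, stop words removed), takes its SVD, and turns each comment into a pair of numeric scores. The comments, the choice of k = 2 dimensions, and the "largest loading" segmentation rule are all assumptions made for the example.

```python
import numpy as np

# Hypothetical open-ended comments, invented for illustration.
docs = [
    "lean cheap safe beef",
    "lean cheap beef",
    "safe cheap beef",
    "ammonia label concern",
    "ammonia label worry",
]

# Term-document matrix A: one row per term, one column per comment, raw counts.
vocab = sorted({term for doc in docs for term in doc.split()})
row = {term: i for i, term in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for term in doc.split():
        A[row[term], j] += 1

# Singular value decomposition: A = U @ diag(s) @ Vt, with s sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the k largest dimensions ("topics"); each comment becomes k numeric scores.
k = 2
doc_scores = (np.diag(s[:k]) @ Vt[:k, :]).T  # shape (n_comments, k)

# A crude segmentation: assign each comment to the dimension it loads on most strongly.
segments = np.argmax(np.abs(doc_scores), axis=1)
print(np.round(doc_scores, 3))
print(segments)
```

The signs of singular vectors are arbitrary, so the scores are compared by magnitude. A real document collection would also call for term weighting, stemming, and far more text; columns like `doc_scores` are the kind of quantitative measures that can feed a segmentation or a predictive model.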
For example, with text mining techniques based on SVD you
can take sample text, such as comments about finely textured beef, and quantify
it to classify or segment respondents (who may be customers or clients), or use
the resulting information in a predictive model.
Tools like SAS Text Miner, used in conjunction with SAS
Enterprise Miner, are designed specifically for this type of analysis on a much
larger scale. I have used both on much larger document collections and obtained
promising results using text topics from SVD in predictive modeling
applications. R offers open source tools as well: I’ve used R to text mine
tweets about U.S. political issues such as the debt ceiling and the U.S. budget,
as well as tweets about the term ‘Factory Farm’.
Social Network Analysis
With the rise in the use of social media, data related to
social networks is ripe for analysis using techniques from social network
analysis and graph theory. According to the International Network for Social
Network Analysis, ‘Social network analysis is focused on uncovering the
patterning of people's interaction’.
Social network analysis (SNA) allows us to answer questions
such as: Who are the key actors in a network? Who are its most influential
members? Who seems to be acting on the periphery? Which connections in the
network are most important? Are there key players bridging connections or
information between otherwise disconnected groups? Have policies or other forces
changed the overall dynamics of interaction between people in the network (i.e.,
has the network structure changed in any meaningful way), and does that relate
to some other performance outcome or goal?
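Two of these questions have standard quantitative answers: degree centrality (how many direct ties a member has) and betweenness centrality (how often a member lies on the shortest paths between other members, a common way to spot bridging players). Below is a minimal pure-Python sketch using Brandes' algorithm on a small hypothetical network; the node names and ties are invented, and in practice R packages such as igraph or sna, or NetDraw, would compute these measures for you.

```python
from collections import deque

# Hypothetical undirected network, invented for illustration: two tight groups
# (A, B, C) and (E, F, G) that are connected only through D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def degree_centrality(adj):
    """Number of direct ties per node, normalized by the maximum possible (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes' algorithm)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair is counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}

degree = degree_centrality(adj)
between = betweenness_centrality(adj)
for node in sorted(adj):
    print(node, round(degree[node], 3), between[node])
```

In this toy network D has only two ties, fewer than C or E, yet the highest betweenness, because every path between the two groups runs through it. That is the "bridging" pattern the questions above describe, and it is invisible if you only count connections.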
More specific applications of SNA may include Student
Integration and Persistence, Business to Business Supply Chains, Seeding
Strategies for Viral Marketing, and Predicting Customer Churn. The open source
software R, along with NetDraw, provides many tools for conducting social
network analysis. See also ‘Using Twitter to Demonstrate Basic Concepts from
Social Network Analysis’ as well as ‘An Introduction to Social Network Analysis
Using R and Netdraw.’ As some of these examples demonstrate, measures derived
from social network analysis can be very useful in predictive modeling. For a
more basic example, see ‘Using SNA in Predictive Modeling’.
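One common way SNA measures enter a predictive model is as extra columns in the modeling table. The sketch below is a generic, hypothetical illustration rather than the approach used in the examples above: for each customer in an invented network it computes degree and the share of neighbors who have already churned, features that could then sit alongside conventional predictors in, say, a logistic regression. The function name `network_features` and the customer IDs are made up for the example.

```python
def network_features(edges, churned):
    """Return {node: {"degree": ..., "share_neighbors_churned": ...}} for an
    undirected edge list. `churned` is the set of nodes already known to have churned."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    features = {}
    for node, nbrs in adj.items():
        churned_nbrs = sum(1 for n in nbrs if n in churned)
        features[node] = {
            "degree": len(nbrs),
            "share_neighbors_churned": churned_nbrs / len(nbrs),  # every node has >= 1 tie
        }
    return features

# Hypothetical customer ties (e.g. frequent contacts) and earlier-period churners.
edges = [("c1", "c2"), ("c1", "c3"), ("c1", "c4"), ("c4", "c5")]
churned_last_period = {"c2", "c3"}

features = network_features(edges, churned_last_period)
for customer, row in sorted(features.items()):
    print(customer, row)
```

Measuring neighbor churn in an earlier period than the outcome being predicted is deliberate here; using same-period churn would leak the answer into the predictors.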
For more information:
If you feel you could benefit from the services of a data
scientist, or have further questions about applied econometrics and analytics,
please contact me, or visit my blog or my Selected Works page, where you can
find a copy of my CV.
Selected Works Profile: http://works.bepress.com/matt_bogard/