Economists are Like Scientists
In his principles of economics textbooks, Greg Mankiw proposes
that economists are like scientists in that they develop theories and
subsequently gather data to test those theories empirically. Econometrics is
the empirical side of economics.
In general, econometrics is focused on hypothesis testing of causes and
effects. The goal is typically to derive estimators with desirable properties
that are appropriate for making inferences. As described by Tim Harford:
"econometricians set themselves the task of figuring out past relationships. Have charter schools improved educational standards? Did abortion liberalisation reduce crime? What has been the impact of immigration on wages?"
Some of the tools of econometrics include linear regression,
logit/probit models, instrumental variables, and time series.
Data Scientists
As presented by Drew Conway, data science is a
combination of hacking skills, math and statistics knowledge, and substantive
expertise.
A recent post on the Harvard Business
Review blog gives some practical examples of the capabilities of a data
scientist:
"They can suck
data out of a server log, a telecom billing file, or the alternator on a
locomotive, and figure out what the heck is going on with it. They create new
products and services for customers. They can also interface with carbon-based
lifeforms — senior executives, product managers, CTOs, and CIOs. You need
them."
- Can You Live Without a Data Scientist? (Harvard Business Review)
Hal Varian, Google's Chief Economist, describes the skills of a data scientist as follows:
"Database
and data manipulation or how to shuffle data around and move things
from place to place; statistics and statistical analysis; machine
learning; visualization, or how to present data in a meaningful way; and
communication or being able to describe what’s going on."
While data scientists certainly rely on a strong foundation in statistics, and may in fact use some of the same tools of inferential statistics used by econometricians, they most often follow a different path. As described by Leo Breiman:
"There are two cultures in the use of statistical modeling to reach conclusions from data."
The traditional statistical/econometric culture:
"assumes that the
data are generated by a given stochastic data model."
vs. the machine learning/data mining culture:
"uses algorithmic
models and treats the data mechanism as unknown."
Because of the nature of the data and the problems they solve,
data scientists very often use algorithmic methods to obtain the desired
solutions. Typically the situation does not call for the estimators with
desirable properties, leading to empirically sound inferences, that
econometricians seek. Often the concern is simply making accurate predictions or
discovering informative patterns in the data.
Economist Scott Nicholson (Chief Data Scientist at Accretive Health and formerly at LinkedIn) comments on the differences between economists and data scientists:
"In terms of applied work, economists are primarily concerned with establishing causation. This is key to understanding what influences individual decision-making, how certain economic and public policies impact the world, and tells a much clearer story of the effects of incentives. With this in mind, economists care much less about the accuracy of the predictions from their econometric models than they do about properly estimating the coefficients, which gets them closer to understanding causal effects. At Strata NYC 2011, I summed this up by saying: If you care about prediction, think like a computer scientist, if you care about causality, think like an economist."
The algorithms used by data scientists come from the machine learning and data mining paradigm, and often include neural networks, decision trees, support vector machines, association rules, and others.
These approaches may not be very familiar to economists, but
their training in statistics and mathematics makes these techniques very
accessible. Take for instance logistic
regression. This technique is very familiar to most economists, and is in
fact often used by data scientists to solve classification problems. Moreover, as Peter Kennedy describes in A Guide to Econometrics, neural
networks (with logistic activation functions) can be thought of as a weighted
average of logit functions. And
if the econometrician understands how logistic regression parameters are
estimated (by maximum likelihood, with estimation implemented via Newton’s
Method), it’s not that difficult to grasp gradient
descent or even the backpropagation
algorithm used in neural networks.
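To make the comparison concrete, here is a minimal sketch in plain Python. It fits the same logit model twice: once by Newton's Method and once by following the gradient of the log-likelihood (gradient ascent, which is the same as gradient descent on the negative log-likelihood). The toy data, the function names, the step size, and the iteration counts are all assumptions made up for illustration; they do not come from Kennedy or any of the sources cited here. The last function shows a one-hidden-layer network output as a weighted sum of logit curves, with hypothetical weights.

```python
import math

# Toy data (invented for illustration): one regressor x, binary outcome y.
XS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
YS = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]


def logistic(z):
    """Logistic (inverse logit) function."""
    return 1.0 / (1.0 + math.exp(-z))


def score(b0, b1, xs=XS, ys=YS):
    """Gradient of the logit log-likelihood with respect to (b0, b1)."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        resid = y - logistic(b0 + b1 * x)
        g0 += resid
        g1 += resid * x
    return g0, g1


def fit_newton(xs=XS, ys=YS, iters=25):
    """Maximum likelihood via Newton's Method (uses the 2x2 Hessian)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1 = score(b0, b1, xs, ys)
        # Negative Hessian: sum of p(1-p) * [[1, x], [x, x^2]].
        h00 = h01 = h11 = 0.0
        for x in xs:
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (negative Hessian)^-1 * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def fit_gradient_ascent(xs=XS, ys=YS, step=0.02, iters=50000):
    """Same likelihood, but each update uses only the gradient."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1 = score(b0, b1, xs, ys)
        b0 += step * g0
        b1 += step * g1
    return b0, b1


def tiny_network(x, hidden, weights):
    """One hidden layer: a weighted sum of logit curves, one per hidden unit."""
    return sum(w * logistic(a + b * x) for w, (a, b) in zip(weights, hidden))


newton_fit = fit_newton()
ascent_fit = fit_gradient_ascent()
print("Newton's Method: ", newton_fit)
print("Gradient ascent: ", ascent_fit)
```

Both routines should settle on essentially the same coefficients. Newton's Method gets there in a handful of iterations because it uses curvature information, while the gradient-only version needs many small steps. Backpropagation is, roughly, the same gradient-following idea applied through the layers of a network.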
Similarly, just as econometrics is written in the language of
calculus and linear algebra, so is machine learning. (For more details, see the
popular machine learning text The Elements of
Statistical Learning: Data Mining, Inference, and Prediction.) Some of the mathematical concepts used in
advanced microeconomic theory (inner products, separating and supporting
hyperplanes, and quadratic
programming, for example) are also very useful when it comes to understanding support vector machines.
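As an illustration of how these concepts fit together, the standard textbook form of the hard-margin linear support vector machine is a quadratic program. The notation below is an assumption chosen for this sketch, not taken from the sources cited in this post: x_i are the feature vectors, y_i in {-1, +1} are the class labels, and (w, b) define the hyperplane.

```latex
\min_{w,\,b} \; \tfrac{1}{2}\, w^\top w
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

The objective is quadratic in w, the constraints are linear and written in terms of inner products, and the set of points satisfying w'x + b = 0 is the separating hyperplane.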
In conclusion, most economists trained in econometrics have
two of the three elements that make up data science: substantive expertise
(economic theory) and knowledge of mathematics and statistics. Supplementing
their quantitative skills with hacking skills (data management, manipulation,
cleaning, loop and array processing, etc., via a language like SAS/SQL,
MATLAB, or R) and familiarity with machine learning algorithms would open the
door for many trained in economics and statistics to employ their skills as
data scientists.
References:
'Statistical Modeling: The Two Cultures' by L. Breiman (Statistical Science 2001, Vol. 16, No. 3, 199–231), in Culture War: Classical Statistics vs. Machine Learning. Matt Bogard. Econometric Sense. http://econometricsense.blogspot.com/2011/01/classical-statistics-vs-machine.html
Defining Data Science: http://blog.revolutionanalytics.com/2011/09/data-science-a-literature-review.html
Data Scientist: The Hottest Job You Haven't Heard of
http://jobs.aol.com/articles/2011/08/10/data-scientist-the-hottest-job-you-havent-heard-of/
Data Scientist: The Sexiest Job of the 21st Century. HBR. http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1
Exclusive: Scott Nicholson Interview: Data Science, Economics, Weather, LinkedIn, and Healthcare
http://www.kdnuggets.com/2012/08/exclusive-scott-nicholson-interview-economics-weather-linkedin-healthcare.html
Google’s chief economist examines the data scientist factor
http://searchbusinessanalytics.techtarget.com/news/1280099135/Googles-chief-economist-examines-the-data-scientist-factor