Saturday, October 6, 2012

Economists as Data Scientists


Economists are Like Scientists

In his principles of economics textbooks, Greg Mankiw proposes that economists are like scientists in that they develop theories and then gather data to test those theories empirically. Econometrics is the empirical side of economics. In general, econometrics focuses on testing hypotheses about cause and effect. The goal is typically to derive estimators with desirable properties (such as unbiasedness and consistency) that are appropriate for making inferences. As described by Tim Harford:

"econometricians set themselves the task of figuring out past relationships. Have charter schools improved educational standards? Did abortion liberalisation reduce crime? What has been the impact of immigration on wages?"

Some of the tools of econometrics include linear regression, logit/probit models, instrumental variables, and time series.

Data Scientists

As presented by Drew Conway, data science is a combination of hacking skills, math and statistics knowledge, and substantive expertise.

A recent post on the Harvard Business Review blog gives some practical examples of what a data scientist can do:

"They can suck data out of a server log, a telecom billing file, or the alternator on a locomotive, and figure out what the heck is going on with it. They create new products and services for customers. They can also interface with carbon-based lifeforms — senior executives, product managers, CTOs, and CIOs. You need them." - Can You Live Without a Data Scientist? - Harvard Business Review

Hal Varian, Google's Chief Economist, describes the skills of a data scientist as follows:

"Database and data manipulation or how to shuffle data around and move things from place to place; statistics and statistical analysis; machine learning; visualization, or how to present data in a meaningful way; and communication or being able to describe what’s going on."

While data scientists certainly rely on a strong foundation in statistics, and may in fact use some of the same inferential tools as econometricians, they most often follow a different path. As described by Leo Breiman:

"There are two cultures in the use of statistical modeling to reach conclusions from data."

The traditional statistical/econometric culture:

"assumes that the data are generated by a given stochastic data model."

vs. the machine learning/data mining culture:

"uses algorithmic models and treats the data mechanism as unknown."

Because of the nature of the data and the problems they work on, data scientists very often use algorithmic methods to obtain the solutions they need. These problems typically do not call for the kind of estimators, with desirable properties leading to empirically sound inferences, that econometricians seek. Often the concern is simply making accurate predictions or discovering informative patterns in the data.

Economist Scott Nicholson (Chief Data Scientist at Accretive Health and formerly at LinkedIn) comments on the differences between economists and data scientists: 

"In terms of applied work, economists are primarily concerned with establishing causation. This is key to understanding what influences individual decision-making, how certain economic and public policies impact the world, and tells a much clearer story of the effects of incentives. With this in mind, economists care much less about the accuracy of the predictions from their econometric models than they do about properly estimating the coefficients, which gets them closer to understanding causal effects. At Strata NYC 2011, I summed this up by saying: If you care about prediction, think like a computer scientist, if you care about causality, think like an economist."

The algorithms used by data scientists come from the machine learning and data mining paradigm, and often include neural networks, decision trees, support vector machines, association rules, and others.

These approaches may not be very familiar to economists, but their training in statistics and mathematics makes these techniques very accessible. Take logistic regression, for instance. This technique is familiar to most economists and is in fact often used by data scientists to solve classification problems. Moreover, as Peter Kennedy describes in A Guide to Econometrics, neural networks (with logistic activation functions) can be thought of as a weighted average of logit functions. And if the econometrician understands how logistic regression parameters are estimated (by maximum likelihood, with estimation implemented via Newton’s method), it is not that difficult to grasp gradient descent or even the backpropagation algorithm used in neural networks.
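To illustrate how close the two estimation approaches are, here is a minimal sketch in Python with NumPy that fits the same logistic regression twice: once with Newton's method, as an econometrics text would present it, and once with plain gradient descent, as a machine learning text would. The simulated data, the function names (fit_newton, fit_gradient_descent), the learning rate, and the iteration counts are all assumptions made for illustration, not anything taken from the sources cited here.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a linear index to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_newton(X, y, iters=25):
    """Maximum likelihood logit via Newton's method (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        score = X.T @ (y - p)                 # gradient of the log-likelihood
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])         # negative Hessian (information matrix)
        beta = beta + np.linalg.solve(info, score)
    return beta

def fit_gradient_descent(X, y, lr=0.5, iters=5000):
    """Same objective, but each step uses only the gradient (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        p = sigmoid(X @ beta)
        beta = beta + lr * (X.T @ (y - p)) / n  # ascent on the average log-likelihood
    return beta

# Simulated data: an intercept plus two regressors (assumed values).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-0.5, 1.0, -2.0])
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

beta_newton = fit_newton(X, y)
beta_gd = fit_gradient_descent(X, y)
print("Newton:          ", beta_newton)
print("Gradient descent:", beta_gd)
```

Both routines climb the same log-likelihood. Newton's method rescales the gradient by the inverse information matrix and so converges in a handful of steps, while gradient descent takes many small steps with a fixed learning rate. Backpropagation is, loosely speaking, the chain rule used to compute that same kind of gradient through the layers of a neural network.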

Similarly, just as econometrics is written in the language of calculus and linear algebra, so is machine learning (for more details, see the popular machine learning text The Elements of Statistical Learning: Data Mining, Inference, and Prediction). Some of the mathematical concepts used in advanced microeconomic theory (inner products, separating and supporting hyperplanes, and quadratic programming, for example) are also very useful when it comes to understanding support vector machines.
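As a reminder of how those concepts show up, the soft-margin support vector machine is usually stated as the quadratic program below. The notation is the common textbook one and is an assumption here rather than something defined in this post: feature vectors x_i, labels y_i in {-1, +1}, weight vector w, intercept b, slack variables \xi_i, and a penalty parameter C > 0.

```latex
\begin{aligned}
\min_{w,\, b,\, \xi} \quad & \frac{1}{2}\, w^{\top} w \;+\; C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i \left( w^{\top} x_i + b \right) \;\ge\; 1 - \xi_i, \qquad i = 1, \dots, n, \\
& \xi_i \;\ge\; 0, \qquad i = 1, \dots, n.
\end{aligned}
```

The objective is quadratic in w and the constraints are linear, which is what makes this a quadratic program. The set of points where w^T x + b = 0 is the separating hyperplane, and the data enter only through the inner products w^T x_i.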

In conclusion, most economists trained in econometrics have two of the three elements that make up data science: substantive expertise (economic theory) and knowledge of mathematics and statistics. Supplementing their quantitative skills with hacking skills (data management, manipulation, cleaning, and loop and array processing via a language like SAS/SQL, MATLAB, or R) and familiarity with machine learning algorithms would open the door for many trained in economics and statistics to employ their skills as data scientists.


'Statistical Modeling: The Two Cultures' by L. Breiman (Statistical Science 2001, Vol. 16, No. 3, 199–231) in Culture War: Classical Statistics vs. Machine Learning. Matt Bogard. Econometric Sense.

Data Scientist: The Sexiest Job of the 21st Century. HBR. 

Exclusive: Scott Nicholson Interview: Data Science, Economics, Weather, LinkedIn, and Healthcare  

 Google’s chief economist examines the data scientist factor 
