Saturday, September 5, 2015

Econometrics, Math, and Machine Learning...what?

Noah Smith recently wrote a Bloomberg View piece titled "Economics has a Math Problem" that has caught a lot of attention. I found three of its arguments especially interesting.

#1 In economics, theory often takes a unique role in the determination of causality

"In most applied math disciplines -- computational biology, fluid dynamics, quantitative finance -- mathematical theories are always tied to the evidence. If a theory hasn’t been tested, it’s treated as pure conjecture....Not so in econ. Traditionally, economists have put the facts in a subordinate role and theory in the driver’s seat. "

This alone might seem controversial to some, but to many economists, causality is a theory-driven phenomenon that can never truly be determined by data. I won't expand on this any further. The point is that economists, outside of a purely predictive or forecasting scenario, are often interested in answering causal questions, and despite all the work on quasi-experimental designs since the credibility revolution, theory still plays an important role in determining causality and the direction of effects.

#2 In economics and econometrics, there is a huge emphasis on explaining causal relationships, both theoretically and empirically, but in machine learning the emphasis is prediction, classification, and pattern recognition devoid of theory or data-generating processes

"Machine learning is a broad term for a collection of statistical data analysis techniques that identify key features of the data without committing to a theory. To use an old adage, machine learning “lets the data speak.”…machine learning techniques emphasized causality less than traditional economic statistical techniques, or what's usually known as econometrics. In other words, machine learning is more about forecasting than about understanding the effects of policy."

That really gets at what I have written before about machine learning vs. classical inference. (If Noah's article is interesting to you, then I highly recommend the Leo Breiman paper I reference in that post.) It's true that, at first, most economists interested in causal inference might sideline machine learning methods for their lack of emphasis on identifying causal effects or a data-generating process. One of the biggest differences between the econometric theory most economists have been trained in and the new field of data science is, in effect, familiarity with and use of methods from machine learning. But if they are interested strictly in predictive modeling and forecasting, these methods might be quite appealing. (I've argued before that economists are ripe for being data scientists.) As we know, the methods and approaches we take to analyzing our data differ substantially depending on whether we are trying to explain or to predict.

But then things start to get interesting:

#3 Recent work in econometrics has narrowed the gap between machine learning and econometrics

"But Athey and Imbens have also studied how machine learning techniques can be used to isolate causal effects, which would allow economists to draw policy implications."

I have not actually drilled into the references and details around this, but it is interesting. Thinking about it a little, I recalled that not long ago I worked on a project where I used gradient boosting (a machine learning algorithm) to estimate propensity scores, which I then used to estimate treatment effects associated with a web app.
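To make the propensity score idea a bit more concrete, here is a minimal sketch of the second step only: turning already-estimated propensity scores into a treatment effect estimate by inverse probability weighting. This is not the code from that project. The function name `ipw_ate`, the clipping threshold `eps`, and the toy numbers are all assumptions for illustration. In practice the scores would come from a fitted classifier (gradient boosting being one option), and here they are simply hard-coded.

```python
def ipw_ate(treated, outcome, propensity, eps=0.01):
    """Normalized inverse-probability-weighted estimate of the average
    treatment effect. Hypothetical helper, for illustration only."""
    t_num = t_den = c_num = c_den = 0.0
    for t, y, e in zip(treated, outcome, propensity):
        # Clip extreme scores so that no single weight explodes.
        e = min(max(e, eps), 1.0 - eps)
        if t == 1:
            w = 1.0 / e            # treated units weighted by 1 / e(x)
            t_num += w * y
            t_den += w
        else:
            w = 1.0 / (1.0 - e)    # controls weighted by 1 / (1 - e(x))
            c_num += w * y
            c_den += w
    if t_den == 0 or c_den == 0:
        raise ValueError("need at least one treated and one control unit")
    # Difference of the two weighted outcome means.
    return t_num / t_den - c_num / c_den


# Made-up toy data: treatment flag, outcome, estimated propensity score.
treated = [1, 1, 0, 0]
outcome = [10.0, 8.0, 5.0, 3.0]
propensity = [0.5, 0.8, 0.5, 0.2]

ate = ipw_ate(treated, outcome, propensity)
print(round(ate, 6))
```

The machine learning method only supplies the scores. Whether the resulting number can be read as a causal effect still rests on the usual identifying assumptions, which no algorithm can check for you.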

Even one of the masters of metrics and causal inference, Josh Angrist, is offering a course titled "Applied Econometrics: Mostly Harmless Big Data" via the MIT open course platform. And for a long time, economist Kenneth Sanford has been following this trend of emphasis on data science and machine learning in econometrics.

Overall, I think it will be interesting to see more applications of machine learning in causal inference. But when these applications involve big data and the internet of things, economists will really have to test their knowledge of a range of other big data tools that have little to do with building models or doing calculations.

See also:
Analytics vs Causal Inference
Big Data: Don't throw the baby out with the bath water
Propensity Score Weighting: Logistic vs CART vs Boosting vs Random Forests 
Data Cleaning
Got Data? Probably not like your econometrics textbook!
In God we trust, all others show me your code.
Data Science, 10% inspiration, 90% perspiration
Big Ag Meets Big Data (Part 1 & Part 2)


  1. While I'm certainly glad that we have more/better data and methods for analyzing it, I don't think we should ever forget the lessons from masters of the past. Much of their work grows more relevant as we are increasingly incentivized to think we can engineer policy (whether private or public) using data analysis. Ultimately the things we analyze as economists are too complex for data to deliver what some seem to expect of it (see Hayek).

    I recently read Ronald Coase's paper "The Marginal Cost Controversy." This paper isn't emphasized in grad programs but it serves as a warning: what Coase calls "blackboard economics" can lead us to say things with confidence that are not so certain.

    His interview in 2012 with Russ Roberts is definitely illuminating.

    Arnold Kling, a graduate of MIT, has some interesting thoughts about the mathematization of the profession that's really at the heart of this "data revolution."

    I have some posts on this subject on the blog. Type "blackboard economics" into the search bar on the Farmer Hayek blog and you'll get a few of them!

  2. Thanks for the shout-out! :-)

    to many economists, causality is a theory driven phenomenon, and can never truly be determined by data. I won't expand on this any further.

    I would love to read a follow-up post about this...

  3. Thanks. I really think Levi Russell above does a great job covering this in his comments. I'm still thinking about it, but I really like his stuff over at the Farmer Hayek blog and was kind of hoping he would do a post on this as well.

  4. Thanks Matt.

    Here's the post by Kling on math in the profession:

  5. Fantastic post, Matt!

    I'm an accounting academic who does a lot of applied econometric work, and I've started reading up on the machine learning literature. As you and Noah Smith note, the focus is on prediction rather than estimating the effect of a particular predictor.

    Aside from some basic terminology differences--e.g. I'd never heard of training vs. testing samples until I started reading ML books--I've found most of the techniques quite accessible. And it was heartening to see that deep connections have been established between bread-and-butter econometric techniques like logistic regression and SVM methods.

    The only parts of ML that I'm still trying to get comfortable with are methods that might yield better prediction, but whose outputs aren't as readily interpretable, such as tree-based boosting. It seems like you'd have to feel quite confident that the underlying relationships are stable over time if you're going to use these less interpretable methods, no?

    But I should already be thankful for ML methods, as they brought me to your blog...I found out about it through a suggested LinkedIn connection, a match developed via an ML-based algorithm. :)

    George Batta