Noah Smith's recent Bloomberg View piece, "Economics has a Math Problem," has caught a lot of attention lately. Three arguments in the piece stood out to me.
#1 In economics, theory often takes a unique role in the determination of causality
"In most applied math disciplines -- computational biology, fluid dynamics, quantitative finance -- mathematical theories are always tied to the evidence. If a theory hasn’t been tested, it’s treated as pure conjecture....Not so in econ. Traditionally, economists have put the facts in a subordinate role and theory in the driver’s seat. "
This alone might seem controversial to some, but to many economists, causality is a theory-driven phenomenon that can never truly be determined by data. I won't expand on this any further. The point is that economists, outside of a purely predictive or forecasting scenario, are often interested in answering causal questions, and despite all the work on quasi-experimental designs since the credibility revolution, theory still plays an important role in determining causality and the direction of effects.
#2 In economics and econometrics, there is a huge emphasis on explaining causal relationships, both theoretically and empirically, but in machine learning the emphasis is prediction, classification, and pattern recognition devoid of theory or data generating processes
"Machine learning is a broad term for a collection of statistical data analysis techniques that identify key features of the data without committing to a theory. To use an old adage, machine learning “lets the data speak.”…machine learning techniques emphasized causality less than traditional economic statistical techniques, or what's usually known as econometrics. In other words, machine learning is more about forecasting than about understanding the effects of policy."
That really gets at what I have written before about machine learning vs classical inference. (If Noah's article is interesting to you, then I highly recommend the Leo Breiman paper I reference in that post.) It's true, at first it might seem that most economists interested in causal inference would sideline machine learning methods for their lack of emphasis on identifying causal effects or a data generating process. One of the biggest differences between the econometric theory most economists have been trained in and the new field of data science is, in effect, familiarity with and use of methods from machine learning. But if economists are interested strictly in predictive modeling and forecasting, these methods might be quite appealing. (I've argued before that economists are ripe for being data scientists.) As we know, the methods and approaches we take to analyzing our data differ substantially depending on whether we are trying to explain or to predict.
But then things start to get interesting:
#3 Recent work in econometrics has narrowed the gap between machine learning and econometrics
"But Athey and Imbens have also studied how machine learning techniques can be used to isolate causal effects, which would allow economists to draw policy implications."
I have not actually drilled into the references and details around this, but it is interesting. Thinking about it a little, I recalled that not long ago I worked on a project where I used gradient boosting (a machine learning algorithm) to estimate propensity scores, which I then used to estimate treatment effects associated with a web app.
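To make the idea concrete, here is a minimal sketch of that kind of workflow. It is not the code from that project: the data are simulated, the single confounder `x`, the true effect of 2.0, and every function name are assumptions made up for illustration, and the booster is a tiny hand-rolled version (boosted stumps with logistic loss) so the example runs with only the standard library. In practice one would more likely reach for an established gradient boosting library. The fitted propensity scores are then used as inverse probability weights.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate(n=2000, seed=0):
    """Hypothetical data: x drives both treatment and outcome (a confounder)."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    t = [1 if rng.random() < 0.2 + 0.6 * xi else 0 for xi in x]
    # The true treatment effect is 2.0 by construction.
    y = [2.0 * ti + 5.0 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]
    return x, t, y

def fit_stump(x, resid, hess, thresholds):
    """Find the one-split tree on x that best fits the residuals."""
    best = None
    for c in thresholds:
        g_left = h_left = g_right = h_right = 0.0
        n_left = n_right = 0
        for xi, ri, hi in zip(x, resid, hess):
            if xi <= c:
                g_left += ri; h_left += hi; n_left += 1
            else:
                g_right += ri; h_right += hi; n_right += 1
        if n_left == 0 or n_right == 0:
            continue
        gain = g_left ** 2 / n_left + g_right ** 2 / n_right
        if best is None or gain > best[0]:
            # Leaf values are Newton steps for the logistic loss.
            best = (gain, c, g_left / h_left, g_right / h_right)
    return best[1:]

def boosted_propensity(x, t, rounds=50, lr=0.3):
    """Estimate P(t = 1 | x) by gradient boosting on the log-odds scale."""
    thresholds = [k / 10 for k in range(1, 10)]
    mean_t = sum(t) / len(t)
    f = [math.log(mean_t / (1 - mean_t))] * len(x)
    for _ in range(rounds):
        p = [sigmoid(fi) for fi in f]
        resid = [ti - pi for ti, pi in zip(t, p)]  # negative gradient
        hess = [pi * (1 - pi) for pi in p]
        c, left, right = fit_stump(x, resid, hess, thresholds)
        f = [fi + lr * (left if xi <= c else right) for fi, xi in zip(f, x)]
    return [sigmoid(fi) for fi in f]

def naive_difference(t, y):
    """Raw treated-minus-control difference in mean outcomes."""
    treated = [yi for ti, yi in zip(t, y) if ti == 1]
    control = [yi for ti, yi in zip(t, y) if ti == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def ipw_effect(t, y, p):
    """Normalized inverse-probability-weighted difference in means."""
    w1 = [ti / pi for ti, pi in zip(t, p)]
    w0 = [(1 - ti) / (1 - pi) for ti, pi in zip(t, p)]
    mean1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mean0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mean1 - mean0

x, t, y = simulate()
p_hat = boosted_propensity(x, t)
naive = naive_difference(t, y)
ipw = ipw_effect(t, y, p_hat)
print(f"naive difference: {naive:.2f}")
print(f"IPW estimate:     {ipw:.2f}  (simulated true effect is 2.0)")
```

Because treated units in this simulation tend to have higher `x`, the naive comparison overstates the effect, while reweighting by the boosted propensity scores should pull the estimate back toward the simulated truth. Note that the causal interpretation still rests on the assumption that `x` captures all confounding, which is exactly where theory re-enters the picture.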
Even one of the masters of metrics and causal inference, Josh Angrist, is offering a course titled "Applied Econometrics: Mostly Harmless Big Data" via the MIT open course platform. And for a long time, economist Kenneth Sanford has been following this trend of emphasis on data science and machine learning in econometrics.
Overall, I think it will be interesting to see more examples of applications of machine learning in causal inference. But when these applications involve big data and the internet of things, economists will really have to test their knowledge of a range of other big data tools that have little to do with building models or doing calculations.
Analytics vs Causal Inference
Big Data: Don't throw the baby out with the bath water
Propensity Score Weighting: Logistic vs CART vs Boosting vs Random Forests
Got Data? Probably not like your econometrics textbook!
In God we trust, all others show me your code.
Data Science, 10% inspiration, 90% perspiration