Sunday, July 29, 2018

Performance of Machine Learning Models on Time Series Data

In the past few years, economists have shown increasing interest in machine learning. For more discussion see here, here, here, here, here, here, here, and here. See also Mindy Mallory's recent post here.

While some folks like Susan Athey are beginning to develop the theory for how machine learning can contribute to causal inference, machine learning has so far carved out its niche in prediction. But what about time series analysis and forecasting?

That question was taken up this past March by Makridakis, Spiliotis, and Assimakopoulos in an interesting paper (Statistical and Machine Learning forecasting methods: Concerns and ways forward). They took a close look at the performance of popular machine learning algorithms relative to traditional statistical time series approaches. The authors found that traditional approaches, including exponential smoothing and econometric time series models, outperformed algorithmic approaches from machine learning across a number of model specifications, algorithms, and time series data sources.
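Claims that one family of methods "outperformed" another rest on out-of-sample accuracy, and with time series the evaluation must respect time order so that no future data leaks into fitting. As a minimal sketch (a common scheme, not necessarily the paper's own evaluation protocol), rolling-origin evaluation refits at each forecast origin using only data up to that point and scores the next `horizon` values. The function names and the two toy forecasters `naive` and `drift` are hypothetical illustrations.

```python
def rolling_origin_splits(n, initial, horizon=1, step=1):
    """Yield (train_indices, test_indices) pairs in which every training
    window ends before its test window starts, so no future data leaks in."""
    origin = initial
    while origin + horizon <= n:
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step


def rolling_origin_mae(series, forecaster, initial, horizon=1):
    """Mean absolute error of forecaster(train, horizon) across all origins."""
    errors = []
    for train_idx, test_idx in rolling_origin_splits(len(series), initial, horizon):
        train = [series[i] for i in train_idx]
        preds = forecaster(train, horizon)
        errors.extend(abs(series[i] - p) for i, p in zip(test_idx, preds))
    return sum(errors) / len(errors)


def naive(train, horizon):
    """Benchmark forecaster: repeat the last observed value."""
    return [train[-1]] * horizon


def drift(train, horizon):
    """Alternative forecaster: extend the average historical change.
    Requires at least two training observations."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * h for h in range(1, horizon + 1)]


series = [1, 2, 3, 4, 5, 6]
print(rolling_origin_mae(series, naive, initial=3))  # 1.0
print(rolling_origin_mae(series, drift, initial=3))  # 0.0
```

On this perfectly linear toy series the drift forecaster is exact and the naive benchmark is off by one at every origin; on real data, the point of the exercise is that any more complex model has to beat simple benchmarks like these out of sample.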

Below are some interesting excerpts and takeaways from the paper:

When I think of time series methods, I think of things like cointegration, stationarity, autocorrelation, seasonality, autoregressive conditional heteroskedasticity, etc. (I recommend Mindy Mallory's posts on time series here.)

Having heard so much about the ability of some machine learning approaches (like deep learning) to mimic feature engineering, I wondered how well algorithmic approaches would handle these issues in time series applications. The authors review some of the previous literature on this question:

"In contrast to sophisticated time series forecasting methods, where achieving stationarity in both the mean and variance is considered essential, the literature of ML is divided with some studies claiming that ML methods are capable of effectively modelling any type of data pattern and can therefore be applied to the original data [62]. Other studies however, have concluded the opposite, claiming that without appropriate preprocessing, ML methods may become unstable and yield suboptimal results [28]."
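To make "preprocessing" concrete, here is a minimal sketch of one common way to push a series toward stationarity: a log transform to stabilize a variance that grows with the level, followed by first differencing to remove a trend in the mean, plus the inverse step that maps forecasts back to the original scale. This is an illustrative assumption, not the specific preprocessing used in the paper or the studies it cites; the function names are hypothetical.

```python
import math


def log_difference(series):
    """Log-transform (stabilizes variance that scales with the level),
    then first-difference (removes a trend in the mean)."""
    if any(y <= 0 for y in series):
        raise ValueError("log transform requires strictly positive values")
    logs = [math.log(y) for y in series]
    return [b - a for a, b in zip(logs, logs[1:])]


def invert_log_difference(last_observed, diffs):
    """Map forecasts made on the transformed scale back to the original
    scale, starting from the last observed (untransformed) value."""
    level = math.log(last_observed)
    out = []
    for d in diffs:
        level += d           # undo the differencing by cumulative summation
        out.append(math.exp(level))  # undo the log transform
    return out


# A series growing 10% per period: a multiplicative trend, so the mean is not stationary.
series = [100 * 1.1 ** t for t in range(6)]
transformed = log_difference(series)
print([round(d, 4) for d in transformed])  # [0.0953, 0.0953, 0.0953, 0.0953, 0.0953]

# A model fitted on the transformed scale might forecast log(1.1) per period;
# inverting turns that into forecasts on the original scale.
print(invert_log_difference(series[-1], [math.log(1.1)] * 2))
```

After the transform, the trending series becomes a constant growth rate, which is the kind of stable pattern both statistical and ML models find easier to learn; forgetting the inverse step is a common source of badly scaled forecasts.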

One thing about this paper, as I read it, is that it does not take an adversarial or Luddite tone toward machine learning methods in favor of more traditional approaches. While the authors found challenges related to predictive accuracy, they proactively looked deeper to understand why ML algorithms performed the way they did and how ML approaches could be made better at time series forecasting.

One of the challenges with ML, even with cross-validation, was overfitting and confusing signals and patterns with noise in the data:

"An additional concern could be the extent of randomness in the series and the ability of ML models to distinguish the patterns from the noise of the data, avoiding over-fitting....A possible reason for the improved accuracy of the ARIMA models is that their parameterization is done through the minimization of the AIC criterion, which avoids over-fitting by considering both goodness of fit and model complexity."
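The quoted passage credits AIC with avoiding over-fitting by weighing goodness of fit against model complexity. A minimal sketch of that idea, using a pure autoregressive AR(p) model fitted by least squares as a simplified stand-in for a full ARIMA order search (not the paper's procedure): every candidate order is fitted on the same observations, scored with the Gaussian form AIC = n*log(RSS/n) + 2k (additive constants dropped, k = number of coefficients including the intercept), and the lowest score wins. The helper names and the simulated AR(1) data are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ar_aic(series, p, max_p):
    """Fit AR(p) with an intercept by least squares, using only observations
    after the first max_p so all candidate orders share one sample.
    Returns n*log(RSS/n) + 2k, i.e. Gaussian AIC up to an additive constant."""
    ys = series[max_p:]
    rows = [[1.0] + [series[t - j] for j in range(1, p + 1)]
            for t in range(max_p, len(series))]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations (X'X) beta = X'y
    rss = sum((y - sum(b * x for b, x in zip(beta, r))) ** 2 for r, y in zip(rows, ys))
    n = len(ys)
    return n * math.log(rss / n) + 2 * k  # fit term plus complexity penalty


def select_ar_order(series, max_p=4):
    """Return the AR order in 0..max_p with the smallest AIC."""
    return min(range(max_p + 1), key=lambda p: ar_aic(series, p, max_p))


# Simulated AR(1) process: y_t = 0.7 * y_{t-1} + noise.
rng = random.Random(42)
y = [0.0]
for _ in range(499):
    y.append(0.7 * y[-1] + rng.gauss(0, 1))
print(select_ar_order(y))  # usually 1 here, though AIC can occasionally pick a larger order
```

The 2k term is what stops the search from always preferring the largest model: an extra lag has to reduce n*log(RSS/n) by more than 2 to be kept.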

They also recommend instances where ML methods may offer advantages:

"even though M3 might be representative of the reality when it comes to business applications, the findings may be different if nonlinear components are present, or if the data is being dominated by other factors. In such cases, the highly flexible ML methods could offer significant advantage over statistical ones"

It was interesting that basic exponential smoothing approaches outperformed much more complicated ML methods:

"the only thing exponential smoothing methods do is smoothen the most recent errors exponentially and then extrapolate the latest pattern in order to forecast. Given their ability to learn, ML methods should do better than simple benchmarks, like exponential smoothing."
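The quote's description maps directly onto simple exponential smoothing, the most basic member of the exponential smoothing family, written in error-correction form: each new level equals the previous level plus a fraction alpha of the latest one-step-ahead error, so older errors are discounted geometrically, and the forecast is a flat extrapolation of the latest level. This is a minimal sketch with a fixed, hypothetical alpha; in practice alpha is estimated from the data, and the paper's benchmarks are not limited to this variant.

```python
def simple_exponential_smoothing(series, alpha, initial_level=None):
    """Return the final smoothed level and the one-step-ahead forecast errors.
    Error-correction form: level_t = level_{t-1} + alpha * (y_t - level_{t-1})."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0] if initial_level is None else initial_level
    errors = []
    for y in series:
        error = y - level        # one-step-ahead forecast error
        errors.append(error)
        level += alpha * error   # recent errors weigh alpha; older ones decay by (1 - alpha)
    return level, errors


def ses_forecast(series, alpha, horizon):
    """Flat forecast: repeat the latest smoothed level for every future step."""
    level, _ = simple_exponential_smoothing(series, alpha)
    return [level] * horizon


print(ses_forecast([10, 12, 11, 13], alpha=0.5, horizon=3))  # [12.0, 12.0, 12.0]
```

Setting alpha to 1 reduces the method to the naive "last value" forecast, while smaller values average over more history; that single parameter is essentially all there is to fit, which helps explain why such a simple method is hard to over-fit.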

However, the authors note that smoothing methods often offer advantages over more complex econometric time series models as well (e.g., ARIMA, VAR, GARCH).

Toward the end of the paper, the authors discuss in detail the differences between the domains where machine learning has seen a lot of success (speech and image recognition, games, self-driving cars, etc.) and time series forecasting applications.

In Table 10 of the paper, they drill into some of these specific differences, discussing structural instabilities in time series data, how the 'rules' can change over time, how forecasts themselves can influence future values, and how this kind of noise might be hard for ML algorithms to capture.

This paper is definitely worth going through again and one to keep in mind if you are about to embark on an applied forecasting project.


Makridakis S, Spiliotis E, Assimakopoulos V (2018) Statistical and Machine Learning forecasting methods: Concerns and ways forward. PLoS ONE 13(3): e0194889.

See also Paul Cuckoo's LinkedIn post on this paper: 

Sunday, July 15, 2018

The Credibility Revolution(s) in Econometrics and Epidemiology

I've written before about the credibility revolution in economics. It also seems that in parallel with econometrics, epidemiology has its own revolution to speak of. In The Deconstruction of Paradoxes in Epidemiology, Miquel Porta writes:

"If a “revolution” in our field or area of knowledge was ongoing, would we feel it and recognize it? And if so, how?...The “revolution” is partly founded on complex mathematics, and concepts as “counterfactuals,” as well as on attractive “causal diagrams” like Directed Acyclic Graphs (DAGs). Causal diagrams are a simple way to encode our subject-matter knowledge, and our assumptions, about the qualitative causal structure of a problem. Causal diagrams also encode information about potential associations between the variables in the causal network. DAGs must be drawn following rules much more strict than the informal, heuristic graphs that we all use intuitively. Amazingly, but not surprisingly, the new approaches provide insights that are beyond most methods in current use......The possible existence of a “revolution” might also be assessed in recent and new terms as collider, M-bias, causal diagram, backdoor (biasing path), instrumental variable, negative controls, inverse probability weighting, identifiability, transportability, positivity, ignorability, collapsibility, exchangeable, g-estimation, marginal structural models, risk set, immortal time bias, Mendelian randomization, nonmonotonic, counterfactual outcome, potential outcome, sample space, or false discovery rate."

There is a lot in that passage. Most economists will feel at home with much of it, including anything related to potential outcomes and counterfactuals and methods like those listed at the end of the quote. However, what may make the revolution in epidemiology seem different from the one in econometrics (at least to some applied economists) is its emphasis on directed acyclic graphs (DAGs).
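One of the terms in Porta's list, "collider," shows why DAG rules matter in practice. In the hypothetical graph X -> C <- Y, X and Y are independent, and the path between them is blocked at C; conditioning or selecting on C opens that path and manufactures an association that is not causal. The simulation below is a minimal sketch with made-up variables and an arbitrary selection rule (C > 1), not an example from either paper.

```python
import random


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def simulate_collider(n=20000, seed=1):
    """Hypothetical DAG X -> C <- Y: X and Y are independent by construction."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]
    return x, y, c


x, y, c = simulate_collider()
r_all = correlation(x, y)  # near 0: the only path between X and Y is blocked at C

# Selecting on the collider (keeping only units with C > 1) opens the path X -> C <- Y.
kept = [(xi, yi) for xi, yi, ci in zip(x, y, c) if ci > 1]
r_selected = correlation([k[0] for k in kept], [k[1] for k in kept])  # clearly negative
print(round(r_all, 2), round(r_selected, 2))
```

Within the selected group, a unit with a low X needs a high Y to clear the threshold, which is where the spurious negative correlation comes from; the same logic is behind selection-driven biases such as M-bias.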

Over at the Causal Analysis in Theory and Practice blog, a post titled "Are economists smarter than epidemiologists (comments on Imbens' recent paper)" discusses comments by Guido Imbens from a Statistical Science paper (worth a read):

"In observational studies in social science, both these assumptions tend to be controversial. In this relatively simple setting, I do not see the causal graphs as adding much to either the understanding of the problem, or to the analyses."

The blog post is quite critical of this stance:

"Can economists do in their heads what epidemiologists observe in their graphs? Can they, for instance, identify the testable implications of their own assumptions? Can they decide whether the IV assumptions (i.e., exogeneity and exclusion) are satisfied in their own models of reality? Of course they can’t; such decisions are intractable to the graph-less mind....Or, are problems in economics different from those in epidemiology? I have examined the structure of typical problems in the two fields, the number of variables involved, the types of data available, and the nature of the research questions. The problems are strikingly similar."
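As an illustration of what "testable implications" means, consider a hypothetical chain X -> M -> Y with no direct edge from X to Y. The graph implies that X and Y are independent given M, and under a linear-Gaussian assumption that becomes a checkable claim: the partial correlation of X and Y given M should be near zero. The sketch below (made-up variables, not the blog's example) compares data generated from the chain with data that adds a direct X -> Y edge, where the implication fails.

```python
import random


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def residuals(ys, xs):
    """Residuals from a least-squares regression of ys on xs with an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]


def partial_correlation(a, b, given):
    """Correlation of a and b after linearly removing `given` from both."""
    return correlation(residuals(a, given), residuals(b, given))


def simulate(n, direct_effect, seed):
    """X -> M -> Y, plus an optional direct edge X -> Y of size direct_effect."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [xi + rng.gauss(0, 1) for xi in x]
    y = [mi + direct_effect * xi + rng.gauss(0, 1) for mi, xi in zip(m, x)]
    return x, m, y


x, m, y = simulate(20000, direct_effect=0.0, seed=7)
r_chain = partial_correlation(x, y, m)      # near 0: consistent with the chain DAG
x2, m2, y2 = simulate(20000, direct_effect=0.5, seed=7)
r_direct = partial_correlation(x2, y2, m2)  # clearly positive: the implication fails
print(round(r_chain, 2), round(r_direct, 2))
```

The graph is what tells you which conditional independence to test; the data then either agree with it or reject the assumed structure.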

Having been trained in both biostatistics and econometrics, I encountered the credibility revolution and causal analysis mostly through seminars and talks on applied econometrics. As economist Jayson Lusk puts it:

"if you attend a research seminar in virtually any economics department these days, you're almost certain to hear questions like, "what is your identification strategy?" or "how did you deal with endogeneity or selection?"  In short, the question is: how do we know the effects you're reporting are causal effects and not just correlations."

The first applications of DAGs I encountered were either from economist Marc Bellemare, in one of his papers on lagged explanatory variables, or from a Statistics in Medicine paper authored by Davey Smith et al. featuring Mendelian randomization.

See also:

How is it that SEMs subsume potential outcomes? 
Mediators and moderators