Tuesday, May 30, 2017

Multicollinearity.....just a bad joke?

Link/Credit: https://www.pinterest.com/pin/96686723226973447/

"The worth of an econometrics textbook tends to be inversely related to the technical material devoted to multicollinearity" - Williams, R. Economic Record 68, 80-1. (1992). via Kennedy, A Guide to Econometrics (6th edition).

If you have never read Arthur S. Goldberger's treatment of multicollinearity in his well-known text A Course in Econometrics, you are missing some of the best reading in econometrics you will ever find. A few years ago Dave Giles gave a nice preview here: http://davegiles.blogspot.com/2011/09/micronumerosity.html

Basically, Goldberger devotes a good-length discussion in his textbook to 'micronumerosity' (his name for having a small sample size), a term he made up to parody multicollinearity, the excessive attention it is given in textbooks, and the resources practitioners spend attempting to 'detect' it (see Dave Giles' post). It's more entertaining than the meme I found above.

For a quick review, multicollinearity in multivariable regression is a situation where the explanatory variables are correlated with one another. For instance, if we are estimating:

 y = b0 + b1x1 + b2x2 + b3x3 + e

and x2 and x3 are highly correlated, the amount of independent variation in each variable is reduced. With less information available to estimate the effects b2 and b3, these estimates become less precise and their standard errors are larger than they would be if x2 and x3 were uncorrelated.
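To make this concrete, here is a minimal simulation sketch. It is only an illustration: the sample size, the true coefficients, the normal errors, and the function name ols_standard_errors are all assumptions chosen for the example. It fits the regression above by OLS for several values of the correlation between x2 and x3 and reports the usual standard errors.

```python
import numpy as np

def ols_standard_errors(rho, n=500, seed=0):
    """Fit y = b0 + b1*x1 + b2*x2 + b3*x3 + e by OLS on simulated data
    and return the standard errors of (b0, b1, b2, b3).
    rho is the population correlation between x2 and x3 (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    # Build x3 so that corr(x2, x3) = rho in the population
    x3 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    e = rng.normal(size=n)
    y = 1.0 + 0.5 * x1 + 0.5 * x2 + 0.5 * x3 + e  # assumed true coefficients

    X = np.column_stack([np.ones(n), x1, x2, x3])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])   # unbiased error variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # classical OLS covariance matrix
    return np.sqrt(np.diag(cov))

for rho in (0.0, 0.9, 0.99):
    se = ols_standard_errors(rho)
    print(f"rho={rho:4.2f}  se(b1)={se[1]:.3f}  se(b2)={se[2]:.3f}  se(b3)={se[3]:.3f}")
```

The standard errors of b2 and b3 should grow as rho approaches 1, while the standard error of b1 stays about the same, because x1 is generated independently of the other two. This matches the textbook expression Var(b2) = s^2 / [SST2 (1 - R2^2)], where SST2 is the total variation in x2 and R2^2 is the R-squared from regressing x2 on the other explanatory variables.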

As Goldberger advises, we should not spend a lot of resources trying to apply various 'tests' for multicollinearity, but focus more on whether its consequences really matter:

"Researchers should not be concerned with whether or not there really is collinearity. They may well be concerned with whether the variances of the coefficient estimates are too large-for whatever reason-to provide useful estimates of the regression coefficients" (Goldberger, 1991).

Below are some other posts I have previously written on the topic, addressing multicollinearity in the context of predictive vs inferential modeling etc.

From my discussion of multicollinearity in Linear Literalism and Fundamentalist Econometrics:

"Multicollinearity has a very different impact if your goal is prediction from when your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model." - Statist. Sci. Volume 25, Number 3 (2010), 289-310.

See also:

Paul Allison on Multicollinearity - when not to worry

Ridge Regression