I happened to stumble upon this great article by Paul Allison, whom I had the privilege of seeing at last year's SAS Global Forum. Everything I've read by him, from logistic regression to survival analysis and data imputation, has been thorough and cogent. Here are a few of his points that hit home with me:
When is it OK to ignore multicollinearity? Well, when...
# 1. The variables with high VIFs are control variables, and the variables of interest do not have high VIFs.
"Here's an example from some of my own work: the sample consists of U.S. colleges, the dependent variable is graduation rate, and the variable of interest is an indicator (dummy) for public vs. private. Two control variables are average SAT scores and average ACT scores for entering freshmen. These two variables have a correlation above .9, which corresponds to VIFs of at least 5.26 for each of them. But the VIF for the public/private indicator is only 1.04. So there's no problem to be concerned about, and no need to delete one or the other of the two controls."
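To see where the 5.26 in that quote comes from, here is a minimal sketch of the arithmetic, assuming the usual definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. With a pairwise correlation of .9, that R² can't be lower than .9² = .81, so the VIF can't be lower than about 5.26. The function names are my own, just for illustration; they aren't from Allison's article or from any library.

```python
def vif_from_r_squared(r_squared: float) -> float:
    """VIF = 1 / (1 - R^2), with R^2 from regressing one predictor on the others."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def min_vif_from_pairwise_r(r: float) -> float:
    """Lower bound on a predictor's VIF, given its correlation r with one other predictor.

    Adding more predictors can only raise R^2, so the true VIF is at least this large.
    """
    return vif_from_r_squared(r * r)


def r_squared_from_vif(vif: float) -> float:
    """Invert the VIF formula: R^2 = 1 - 1 / VIF."""
    if vif < 1.0:
        raise ValueError("vif must be >= 1")
    return 1.0 - 1.0 / vif


# SAT vs. ACT averages, correlated at .9 (hypothetical helper names, real arithmetic)
print(round(min_vif_from_pairwise_r(0.9), 2))   # 5.26

# The public/private indicator with VIF = 1.04
print(round(r_squared_from_vif(1.04), 3))       # 0.038
```

Read the other way, a VIF of 1.04 means the other predictors explain only about 4% of the variance in the public/private indicator, which is why its coefficient is barely affected by the SAT/ACT overlap.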
# 3. The variables with high VIFs are indicator (dummy) variables that represent a categorical variable with three or more categories.