If your goal is simply to estimate parameter values in order to make causal inferences, i.e. evaluate treatment effects, then most likely you will only be concerned with imputation during the training or estimation stage. Again, in this post I am concerned with predictive modeling applications vs. causal inference. I will start with a short discussion of maximum likelihood estimation.
Standard Maximum Likelihood:
Maximize L = Π f(y, x1, …, xk; β)
With standard ML, the likelihood function is optimized, providing the values of β that define our regression model (e.g. Y = β0 + β1x1 + … + βkxk + e).
As is the case in many modeling scenarios, with standard MLE only complete cases are used to estimate the model. That is, for each 'row' or individual case, all values of 'x' and 'y' must be defined. If a single explanatory variable 'x' or the dependent variable 'y' has a missing value, then that individual/case/row is excluded from the data. This is referred to as listwise deletion. In many scenarios this can be undesirable because, for one thing, you are reducing the amount of information used to estimate your model. Paul Allison has a very informative discussion of this in a recent post at Statistical Horizons.
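To make the complete-case idea concrete, here is a minimal sketch in Python (an illustration of mine, not code from any of the papers or posts cited below): it simulates a regression with some missing x1 values, drops the incomplete rows (listwise deletion), and estimates the βs by maximizing the normal log-likelihood with scipy. All names and the simulated data are assumptions for illustration.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(scale=0.5, size=n)
x1[rng.random(n) < 0.25] = np.nan            # inject missing values in x1

# Listwise deletion: keep only rows where y, x1, and x2 are all observed
keep = ~np.isnan(x1)
yc, x1c, x2c = y[keep], x1[keep], x2[keep]
print(f"Complete cases used: {keep.sum()} of {n}")

def neg_loglik(theta):
    b0, b1, b2, log_sigma = theta
    mu = b0 + b1 * x1c + b2 * x2c            # model-implied mean for each case
    return -norm.logpdf(yc, loc=mu, scale=np.exp(log_sigma)).sum()

fit = minimize(neg_loglik, x0=np.zeros(4))   # maximize L by minimizing -log L
print("ML estimates of beta0, beta1, beta2:", fit.x[:3])

Note how every row with a missing x1 is simply thrown away before the likelihood is ever evaluated.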
Full Information Maximum Likelihood (FIML):
Maximize L = Π f(y, x1, …, xk; β) × Π f(y, x3, …, xk; β)
Full information maximum likelihood is an estimation strategy that allows us to get parameter estimates even in the presence of missing data. The overall likelihood is the product of the likelihoods specified for all observations. If there are m observations with no missing values but n observations missing x1 and x2, we account for that by specifying the overall likelihood function as the product of two terms, i.e. the likelihood is specified as a product of likelihoods for both complete and incomplete cases. In the example above, the first term represents the likelihood for the complete cases, while the second term depicts the cases where the first two explanatory variables, x1 and x2, are missing, so only y and x3, …, xk enter the density. The overall likelihood is then optimized, providing the values of β that define our regression model.
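Here is a minimal sketch of the FIML idea under a joint-normality assumption (an illustration of mine, not any particular package's implementation): cases are grouped by missingness pattern, each group contributes the multivariate normal density of only its observed variables (obtained by dropping the missing entries of the mean vector and covariance matrix), and the regression coefficients are recovered from the fitted joint distribution. Names and simulated data are assumptions for illustration.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

rng = np.random.default_rng(42)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(scale=0.5, size=n)
data = np.column_stack([y, x1, x2])
data[rng.random(n) < 0.2, 1] = np.nan         # x1 missing for ~20% of cases

# Group cases by missingness pattern (here: complete, and missing x1)
patterns = {}
for i, row in enumerate(data):
    patterns.setdefault(tuple(~np.isnan(row)), []).append(i)

def unpack(theta):
    """Mean vector and a positive-definite covariance from the free parameters."""
    mu = theta[:3]
    L = np.zeros((3, 3))
    L[np.tril_indices(3)] = theta[3:]
    np.fill_diagonal(L, np.exp(np.diag(L)))   # keep the diagonal positive
    return mu, L @ L.T

def neg_loglik(theta):
    mu, sigma = unpack(theta)
    ll = 0.0
    for obs, rows in patterns.items():        # one likelihood term per pattern
        obs = np.array(obs)
        ll += multivariate_normal.logpdf(
            data[np.ix_(rows, np.flatnonzero(obs))],
            mean=mu[obs], cov=sigma[np.ix_(obs, obs)]).sum()
    return -ll

theta0 = np.concatenate([np.nanmean(data, axis=0), np.zeros(6)])
fit = minimize(neg_loglik, theta0)            # maximize L by minimizing -log L

mu_hat, sigma_hat = unpack(fit.x)
beta = np.linalg.solve(sigma_hat[1:, 1:], sigma_hat[1:, 0])   # slopes of y on x
print("FIML-implied intercept:", mu_hat[0] - beta @ mu_hat[1:])
print("FIML-implied slopes:   ", beta)

The point of the sketch is that no missing value is ever filled in; each case simply contributes whatever information it has to the likelihood.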
Both ML and FIML are methods for estimating parameters; they are not imputation procedures per se. As Karen Grace-Martin (The Analysis Factor) aptly puts it, "This method does not impute any data, but rather uses each case's available data to compute maximum likelihood estimates."
Predictive Modeling Applications
So if we have missing data, we could use FIML to obtain parameter estimates for a model. But what if we actually want to predict outcomes 'y' for some new data set (i.e. we want to 'score' a new data set using the model we just estimated)? By assumption, if we are trying to predict 'y', we don't have values for y in that data set. We will attempt to take the model or parameter estimates we got from FIML and predict y based on the estimated values of our β's and the observed x's. But what if in the new data we have missing x's? Can't we just use FIML to get our model and predictions? No. First, we have already derived our model via FIML using our original or training data. Again, FIML is a model or parameter estimation procedure. To apply FIML in our new data set would imply two things:
1) We are estimating a new model, with new parameter estimates, rather than applying the model we already trained.
2) We have observed values for what we are trying to predict, 'y', which by assumption we don't have in a prediction or scoring scenario!
So there is no way to properly specify the likelihood to even implement FIML to estimate a new model in a new data set!
But we don't want to estimate a new model in the first place. If we want to make new predictions based on our original model developed using FIML, we have to utilize some type of actual imputation procedure to derive values for the missing x's in the new 'scoring' data set.
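As a minimal sketch of that scoring workflow (illustrative only; the mean-imputation choice, the coefficient values, and all names are assumptions), one option is to fit an imputer on the training predictors and apply it to the new data before plugging the filled-in x's into the previously estimated coefficients:

import numpy as np
from sklearn.impute import SimpleImputer

# Coefficients already estimated on the training data (beta0, beta1, beta2);
# the values here are made up for illustration
beta = np.array([1.0, 0.5, -0.3])

# Training predictors (complete cases) and a new scoring set with missing x's
X_train = np.array([[0.2, 1.1], [1.5, -0.4], [-0.7, 0.9], [0.3, 0.0]])
X_new = np.array([[np.nan, 0.8], [1.2, np.nan], [0.5, 0.5]])

# Learn the imputation (here simple column means) from the training data only,
# then fill in the missing x's in the scoring data
imputer = SimpleImputer(strategy="mean").fit(X_train)
X_new_filled = imputer.transform(X_new)

# Score: y_hat = beta0 + beta1*x1 + beta2*x2
y_hat = beta[0] + X_new_filled @ beta[1:]
print(y_hat)

More sophisticated choices (regression or multiple imputation) follow the same pattern: the imputation model is built from the training data and then applied to the scoring data.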
References:
Handling Missing Data by Maximum Likelihood. Paul D. Allison, Statistical Horizons, Haverford, PA, USA. SAS Global Forum Paper 312-2012.
Two Recommended Solutions for Missing Data: Multiple Imputation and Maximum Likelihood. Karen Grace-Martin, The Analysis Factor. http://www.theanalysisfactor.com/missing-data-two-recommended-solutions/ Accessed 8/14/14.
Listwise Deletion: It's Not Evil. Paul Allison, Statistical Horizons. June 13, 2014. http://www.statisticalhorizons.com/listwise-deletion-its-not-evil