If your goal is simply to estimate parameter values in order to make causal inferences (i.e., to evaluate treatment effects), then most likely you will only be concerned with imputation during the training or estimation stage. Again, in this post I am concerned with predictive modeling applications rather than causal inference. I will start with a short discussion of maximum likelihood estimation.

**Standard Maximum Likelihood:**

Maximize

$$L = \prod f(y, x_1, \dots, x_k; \beta)$$

With standard ML, the likelihood function is optimized, providing the values for β which define our regression model (like $Y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + e$).

As is the case in many modeling scenarios, with standard MLE only complete cases are used to estimate the model. That is, for each 'row' or individual case, all values of 'x' and 'y' must be defined. If a single explanatory variable 'x' or the dependent variable 'y' has a missing value, then that individual/case/row is excluded from the data. This is referred to as listwise deletion. In many scenarios this can be undesirable because, for one thing, you are reducing the amount of information used to estimate your model. Paul Allison has a very informative discussion of this in a recent post at Statistical Horizons.
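To make the cost of listwise deletion concrete, here is a minimal sketch in plain Python. The toy data, the `rows` layout of (y, x1, x2), and the helper name `listwise_delete` are all made up for illustration; `None` stands in for a missing value.

```python
# Toy data (hypothetical): each row is (y, x1, x2); None marks a missing value.
rows = [
    (3.1, 1.0, 2.0),
    (4.0, None, 2.5),   # x1 missing
    (5.2, 2.0, None),   # x2 missing
    (6.1, 2.5, 3.5),
    (None, 3.0, 4.0),   # y missing
]


def listwise_delete(data):
    """Keep only the rows in which every value is observed."""
    return [row for row in data if all(value is not None for value in row)]


complete = listwise_delete(rows)
print(f"{len(complete)} of {len(rows)} rows survive listwise deletion")
```

In this toy example a single missing value is enough to drop a whole row, so the observed values in that row never contribute to the estimates.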

**Full Information Maximum Likelihood (FIML):**

Maximize

$$L = \prod_{\text{complete cases}} f(y, x_1, \dots, x_k; \beta) \times \prod_{\text{incomplete cases}} f(y, x_3, \dots, x_k; \beta)$$

Full information maximum likelihood is an estimation strategy that allows us to get parameter estimates even in the presence of missing data. The overall likelihood is the product of the likelihoods specified for all observations. If there are m observations with no missing values but n observations missing $x_1$ and $x_2$, we account for that by specifying the overall likelihood function as the product of two terms, i.e., as a product of likelihoods for both complete and incomplete cases. In the example above, the second term in the product depicts the cases where individual ‘i’ has missing values for the first two variables ($x_1$ and $x_2$). The first term represents the likelihood for the complete cases. The overall likelihood is then optimized, providing the values for β which define our regression model.
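The idea of multiplying a complete-case term by an incomplete-case term can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the post: there is a single explanatory variable x, (y, x) are jointly normal, and the parameters `mu_y`, `mu_x`, `sd_y`, `sd_x`, `rho` play the role of β. A case with x missing contributes the marginal density of y instead of being dropped. All function names are hypothetical.

```python
import math


def loglik_complete(y, x, mu_y, mu_x, sd_y, sd_x, rho):
    """Log density of a bivariate normal at (y, x): a fully observed case."""
    zy = (y - mu_y) / sd_y
    zx = (x - mu_x) / sd_x
    one_minus_r2 = 1.0 - rho ** 2
    quad = (zy ** 2 - 2.0 * rho * zy * zx + zx ** 2) / one_minus_r2
    log_norm = math.log(2.0 * math.pi * sd_y * sd_x * math.sqrt(one_minus_r2))
    return -log_norm - 0.5 * quad


def loglik_y_only(y, mu_y, sd_y):
    """Log density of the marginal normal for y: a case with x missing."""
    zy = (y - mu_y) / sd_y
    return -math.log(sd_y * math.sqrt(2.0 * math.pi)) - 0.5 * zy ** 2


def fiml_loglik(rows, mu_y, mu_x, sd_y, sd_x, rho):
    """Sum the log-likelihood over all rows, using whatever each row observes."""
    total = 0.0
    for y, x in rows:
        if x is None:
            total += loglik_y_only(y, mu_y, sd_y)   # incomplete case
        else:
            total += loglik_complete(y, x, mu_y, mu_x, sd_y, sd_x, rho)
    return total


# Hypothetical data: two complete cases and two cases with x missing.
rows = [(0.0, 0.0), (1.2, 0.8), (-0.5, None), (0.3, None)]
params = dict(mu_y=0.0, mu_x=0.0, sd_y=1.0, sd_x=1.0, rho=0.5)
print(fiml_loglik(rows, **params))
```

Working on the log scale turns the product of likelihoods into a sum. To actually estimate the parameters, one would hand the negative of `fiml_loglik` to a numerical optimizer; that step is left out here to keep the sketch short.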

Both ML and FIML are methods for estimating parameters; they are not imputation procedures per se. As Karen Grace-Martin (The Analysis Factor) aptly puts it:

*“This method does not impute any data, but rather uses each case's available data to compute maximum likelihood estimates.”*

**Predictive Modeling Applications**

So if we have missing data, we could use FIML to obtain parameter estimates for a model, but what if we actually want to predict outcomes ‘y’ for some new data set (i.e., we want to 'score' a new data set using the model we just estimated)? By assumption, if we are trying to predict ‘y’, we don’t have values for y in our data set. We will attempt to take the model or parameter estimates we got from FIML and predict Y based on the estimated values of our β’s and the observed x’s. But what if in the new data we have missing x’s? Can’t we just use FIML to get our model and predictions? No. First, we have already derived our model via FIML using our original or *training data*. Again, FIML is a model or parameter estimation procedure. To apply FIML in our new data set would imply 2 things:

1) We want to estimate a new model.
2) We have observed values for what we are trying to predict, ‘y’, which by assumption we don’t have in a prediction or scoring scenario!

So there is no way to properly specify the likelihood to even implement FIML to estimate a new model in a new data set!

But we don't want to estimate a new model in the first place. If we want to make new predictions based on our original model developed using FIML, we have to use some type of actual imputation procedure to derive values for the missing x’s in the new 'scoring' data set.
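As a rough illustration of that scoring step, the sketch below fills in missing x’s in a new data set and then applies coefficients that were estimated earlier. Everything specific here is an assumption: the coefficient values in `beta`, the training means in `train_means`, and the helper names are hypothetical. Simple mean imputation is used only as a stand-in for "some type of actual imputation procedure"; the post does not prescribe a particular method.

```python
# Hypothetical coefficients from a model already estimated on training data
# (for example via FIML); the numbers are made up for illustration.
beta = {"intercept": 1.0, "x1": 2.0, "x2": -0.5}

# Hypothetical training-set means, used here as the imputation values.
train_means = {"x1": 3.0, "x2": 4.0}


def impute(row, fill_values):
    """Replace each missing (None) x with its fill value; keep observed x's."""
    return {
        name: fill_values[name] if row.get(name) is None else row[name]
        for name in fill_values
    }


def predict(row, coefs):
    """Linear prediction: intercept plus the sum of coefficient * x."""
    return coefs["intercept"] + sum(coefs[name] * row[name] for name in row)


# New 'scoring' data: no y available, and some x's are missing.
scoring = [
    {"x1": 2.0, "x2": 4.0},
    {"x1": None, "x2": 2.0},
    {"x1": 1.0, "x2": None},
]

predictions = [predict(impute(row, train_means), beta) for row in scoring]
print(predictions)
```

Note that the model itself is never re-estimated: the scoring data only passes through the imputation step and then the fixed coefficients.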

**References:**

Handling Missing Data by Maximum Likelihood. Paul D. Allison, Statistical Horizons, Haverford, PA, USA. SAS Global Forum Paper 312-2012.

Two Recommended Solutions for Missing Data: Multiple Imputation and Maximum Likelihood. Karen Grace-Martin, The Analysis Factor. http://www.theanalysisfactor.com/missing-data-two-recommended-solutions/ Accessed 8/14/14.

Listwise Deletion: It's Not Evil. Paul Allison, Statistical Horizons. June 13, 2014. http://www.statisticalhorizons.com/listwise-deletion-its-not-evil
