Some Simplifications for the Expectation-Maximization (EM) Algorithm: The Linear Regression Model Case
By: Daniel A. Griffith
Potential Business Impact:
Fills in missing data to make predictions.
The EM algorithm is a generic tool that offers maximum likelihood solutions when datasets are incomplete, with data values missing at random or missing completely at random. At least in its simplest form, the algorithm can be rewritten in terms of an ANCOVA regression specification. This formulation allows several analytical results to be derived that permit the EM algorithm solution to be expressed in terms of new observation predictions and their variances. The algorithm can be implemented with a linear or nonlinear regression routine, which allows missing values to be imputed even when the imputations must satisfy constraints. Fourteen example datasets gleaned from the EM algorithm literature are reanalyzed. Imputation results have been verified with SAS PROC MI. Six theorems are proved that broadly contextualize imputation findings in terms of the theory, methodology, and practice of statistical science.
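The equivalence the abstract describes can be illustrated for the simplest case, missing values in the response of a linear regression. The following is a minimal sketch, not the paper's implementation: the function names, the simulated data, and the choice to iterate only the mean parameters (the variance update is omitted because it does not affect the imputed values) are all assumptions made for illustration. It compares three routes to the same imputations: a basic EM-style iteration, an ANCOVA regression with one indicator variable per missing observation, and a new-observation prediction from the complete cases.

```python
import numpy as np


def em_impute_response(X, y, missing, n_iter=200, tol=1e-12):
    """Mean-only EM sketch: refit OLS, then refill the missing responses."""
    y_work = y.copy()
    y_work[missing] = y[~missing].mean()  # starting values (assumption)
    for _ in range(n_iter):
        # M-step: OLS on the completed data
        beta, *_ = np.linalg.lstsq(X, y_work, rcond=None)
        # E-step: replace missing responses by their conditional means
        new_vals = X[missing] @ beta
        converged = np.max(np.abs(new_vals - y_work[missing])) < tol
        y_work[missing] = new_vals
        if converged:
            break
    return y_work[missing]


def ancova_impute_response(X, y, missing):
    """ANCOVA sketch: set missing y to 0 and add one indicator per missing row."""
    idx = np.flatnonzero(missing)
    D = np.zeros((len(y), len(idx)))
    D[idx, np.arange(len(idx))] = 1.0
    y0 = np.where(missing, 0.0, y)
    coef, *_ = np.linalg.lstsq(np.hstack([X, D]), y0, rcond=None)
    # Each imputation is the negative of its indicator coefficient
    return -coef[X.shape[1]:]


def complete_case_prediction(X, y, missing):
    """New-observation prediction from a fit on the observed rows only."""
    beta, *_ = np.linalg.lstsq(X[~missing], y[~missing], rcond=None)
    return X[missing] @ beta


# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
missing = np.zeros(n, dtype=bool)
missing[[3, 11, 25]] = True
y[missing] = np.nan  # treat these responses as unobserved

em_vals = em_impute_response(X, y, missing)
ancova_vals = ancova_impute_response(X, y, missing)
cc_vals = complete_case_prediction(X, y, missing)

print(np.allclose(em_vals, cc_vals, atol=1e-8))
print(np.allclose(ancova_vals, cc_vals, atol=1e-8))
```

In this sketch the indicator variables force a zero residual at each missing observation, so the single ANCOVA fit reproduces the limit of the iteration without iterating. Missing covariate values and constrained imputations, which the abstract also mentions, are not covered here.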
Similar Papers
EM Approaches to Nonparametric Estimation for Mixture of Linear Regressions
Methodology
Finds hidden groups in data.
Characterizing Evolution in Expectation-Maximization Estimates for Overspecified Mixed Linear Regression
Machine Learning (CS)
Helps computer models learn from messy data faster.
Maximum Likelihood for Logistic Regression Model with Incomplete and Hybrid-Type Covariates
Methodology
Fixes computer math when some numbers are missing.