In many cases, analysis methods for missing data are based on the ignorable likelihood

$$
L_{ign}(\theta \mid Y_0, X) \propto f(Y_0 \mid X, \theta),
$$

regarded as a function of the parameters $\theta$ for fixed observed data $Y_0$ and some fully observed covariates $X$. The density $f(Y_0 \mid X, \theta)$ is obtained by integrating out the missing data $Y_1$ from the joint density $f(Y \mid X, \theta) = f(Y_0, Y_1 \mid X, \theta)$. Sufficient conditions for basing inference about $\theta$ on the ignorable likelihood are that the missingness mechanism is Missing At Random (MAR) and that the parameters of the analysis model, $\theta$, and those of the missingness mechanism, $\psi$, are distinct. Here we focus on situations where the missingness mechanism is Missing Not At Random (MNAR), so that valid Maximum Likelihood (ML), Bayesian and Multiple Imputation (MI) inferences generally need to be based on the full likelihood

$$
L_{full}(\theta, \psi \mid Y_0, X, M) \propto f(Y_0, M \mid X, \theta, \psi),
$$

regarded as a function of $(\theta, \psi)$ for fixed $(Y_0, M)$, where $M$ denotes the missingness indicators. Here, $f(Y_0, M \mid X, \theta, \psi)$ is obtained by integrating out $Y_1$ from the joint density $f(Y, M \mid X, \theta, \psi)$. Two main approaches for formulating MNAR models can be distinguished, namely selection models (SM) and pattern mixture models (PMM).
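To see why MAR and distinctness are enough to ignore the missingness mechanism, it may help to write out the integration step. The sketch below uses $Y_0$ and $Y_1$ for the observed and missing parts of the data, $M$ for the missingness indicators, $X$ for the covariates, and $\theta$ and $\psi$ for the parameters of the data model and of the missingness mechanism. It assumes the joint density is written as the data model times the missingness mechanism:

$$
f(Y_0, M \mid X, \theta, \psi) = \int f(Y_0, Y_1 \mid X, \theta)\, f(M \mid Y_0, Y_1, X, \psi)\, dY_1 .
$$

Under MAR the mechanism does not depend on the missing values, $f(M \mid Y_0, Y_1, X, \psi) = f(M \mid Y_0, X, \psi)$, so it can be taken outside the integral:

$$
f(Y_0, M \mid X, \theta, \psi) = f(M \mid Y_0, X, \psi) \int f(Y_0, Y_1 \mid X, \theta)\, dY_1 = f(M \mid Y_0, X, \psi)\, f(Y_0 \mid X, \theta) .
$$

If $\theta$ and $\psi$ are distinct, the first factor carries no information about $\theta$, and the full likelihood is proportional to the ignorable likelihood as a function of $\theta$. Under MNAR the mechanism stays inside the integral and this simplification is not available.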
Selection and Pattern Mixture Models
SMs factor the joint distribution of $y_i$ and $m_i$ as

$$
f(y_i, m_i \mid x_i, \theta, \psi) = f(y_i \mid x_i, \theta)\, f(m_i \mid x_i, y_i, \psi),
$$

where the first factor is the distribution of $y_i$ in the population while the second factor is the missingness mechanism, with $\theta$ and $\psi$ which are assumed to be distinct. Alternatively, PMMs factor the joint distribution as

$$
f(y_i, m_i \mid x_i, \xi, \omega) = f(y_i \mid x_i, m_i, \xi)\, f(m_i \mid x_i, \omega),
$$

where the first factor is the distribution of $y_i$ in the strata defined by different patterns of missingness while the second factor models the probabilities of the different patterns, with $\xi$ and $\omega$ which are assumed to be distinct (Little (1993), Little and Rubin (2019)). The distinction between the two factorisations becomes clearer when considering a specific example.
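Suppose that missing values are confined to a single variable and let $y_i = (y_{i1}, y_{i2})$ be a bivariate response outcome where $y_{i1}$ is fully observed and $y_{i2}$ is observed for $i = 1, \ldots, n_{cc}$ but missing for $i = n_{cc}+1, \ldots, n$. Let $m_{i2}$ be the missingness indicator for $y_{i2}$, then a PMM factors the density of $y_{i2}$ and $m_{i2}$ given $y_{i1}$ (and any fully observed covariates $x_i$) as

$$
f(y_{i2}, m_{i2} \mid x_i, y_{i1}, \xi, \omega) = f(y_{i2} \mid x_i, y_{i1}, m_{i2}, \xi)\, f(m_{i2} \mid x_i, y_{i1}, \omega).
$$

This expression shows that there are no data with which to estimate directly the distribution $f(y_{i2} \mid x_i, y_{i1}, m_{i2} = 1, \xi)$, because all units with $m_{i2} = 1$ have $y_{i2}$ missing. Under MAR, this is identified using the distribution of the observed data $f(y_{i2} \mid x_i, y_{i1}, m_{i2} = 0, \xi)$, while under MNAR it must be identified using other assumptions. The SM formulation is

$$
f(y_{i2}, m_{i2} \mid x_i, y_{i1}, \theta, \psi) = f(y_{i2} \mid x_i, y_{i1}, \theta)\, f(m_{i2} \mid x_i, y_{i1}, y_{i2}, \psi).
$$

Typically, the missingness mechanism is modelled using some additive probit or logit regression of $m_{i2}$ on $x_i$, $y_{i1}$ and $y_{i2}$. However, the coefficient of $y_{i2}$ in this regression is not directly estimable from the data, because $y_{i2}$ is never observed when $m_{i2} = 1$, and hence the model cannot be fully estimated without extra assumptions.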
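The practical consequence of the bivariate example can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `complete_case_fit`, the regression coefficients and the logit missingness model are all hypothetical choices, not part of the original example. It generates a fully observed first response and a second response, deletes the second response with a logit mechanism whose coefficient on the second response is `psi2`, and fits the regression of the second response on the first using complete cases only.

```python
import numpy as np

def complete_case_fit(psi2, n=100_000, seed=0):
    """Simulate (y1, y2), delete y2 with a logit mechanism, and return the
    least-squares (slope, intercept) of y2 on y1 among units with y2 observed.

    All numerical values are illustrative assumptions:
    y2 = 1 + 0.5 * y1 + noise, and
    logit P(m2 = 1 | y1, y2) = -0.5 + 0.5 * y1 + psi2 * y2.
    """
    rng = np.random.default_rng(seed)
    y1 = rng.normal(size=n)
    y2 = 1.0 + 0.5 * y1 + rng.normal(size=n)
    # Probability that y2 is missing; psi2 = 0 gives MAR, psi2 != 0 gives MNAR
    p_miss = 1.0 / (1.0 + np.exp(-(-0.5 + 0.5 * y1 + psi2 * y2)))
    m2 = rng.random(n) < p_miss          # True means y2 is missing
    obs = ~m2
    slope, intercept = np.polyfit(y1[obs], y2[obs], 1)
    return slope, intercept

if __name__ == "__main__":
    print("MAR  (psi2 = 0):  ", complete_case_fit(psi2=0.0))
    print("MNAR (psi2 = 1.5):", complete_case_fit(psi2=1.5))
```

With `psi2 = 0` the complete-case fit should recover the generating slope and intercept up to sampling error, while with a non-zero `psi2` both are distorted. Nothing in the observed data alone distinguishes the two settings, which is why the coefficient of the partially observed response has to be fixed by assumption.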
Normal Models for MNAR data
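Assume we have a complete sample $(y_i, x_i)$ on a continuous variable $y_i$ and a set of fully observed covariates $x_i$, for $i = 1, \ldots, n$. Suppose that $n_{cc}$ units are observed while the remaining $n - n_{cc}$ units are missing, with $m_i$ being the corresponding missingness indicator. Heckman (1976) proposed the following selection model to handle missingness:

$$
\begin{aligned}
y_i \mid x_i, \theta, \psi &\sim N(\beta_0 + \beta_1 x_i, \sigma^2), \\
m_i \mid x_i, y_i, \theta, \psi &\sim \text{Bernoulli}\left(\Phi(\psi_0 + \psi_1 x_i + \psi_2 y_i)\right),
\end{aligned}
$$

where $\theta = (\beta_0, \beta_1, \sigma^2)$ and $\Phi$ denotes the probit (cumulative normal) distribution function. Note that if $\psi_2 = 0$, the missing data are MAR, while if $\psi_2 \neq 0$ the missing data are MNAR since missingness in $y_i$ depends on the unobserved value of $y_i$. This model can be estimated using either a two-step least squares method, ML in combination with an EM algorithm, or a Bayesian approach. The main issue is the lack of information about $\psi_2$, which can be partly identified through the specific assumptions about the distribution of the observed data of $y_i$. This, however, implicitly assumes that the chosen distribution describes well the distribution of the complete (observed and missing) data, an assumption that can never be tested or checked. An alternative approach is to use a PMM factorisation and model:

$$
\begin{aligned}
y_i \mid m_i = m, x_i, \xi, \omega &\sim N(\beta_0^{(m)} + \beta_1^{(m)} x_i, \sigma^{2(m)}), \\
m_i \mid x_i, \omega &\sim \text{Bernoulli}\left(\Phi(\omega_0 + \omega_1 x_i)\right),
\end{aligned}
$$

where $\xi = (\beta_0^{(m)}, \beta_1^{(m)}, \sigma^{2(m)}, m = 0, 1)$. This model implies that the distribution of $y_i$ given $x_i$ in the population is a mixture of two normal distributions with mean

$$
\left[1 - \Phi(\omega_0 + \omega_1 x_i)\right]\left[\beta_0^{(0)} + \beta_1^{(0)} x_i\right] + \Phi(\omega_0 + \omega_1 x_i)\left[\beta_0^{(1)} + \beta_1^{(1)} x_i\right].
$$

The parameters $(\beta_0^{(0)}, \beta_1^{(0)}, \sigma^{2(0)}, \omega)$ can be estimated from the data but the parameters $(\beta_0^{(1)}, \beta_1^{(1)}, \sigma^{2(1)})$ are not estimable because $y_i$ is missing when $m_i = 1$. Under MAR, the distribution of $y_i$ given $x_i$ is the same for units with $y_i$ observed and missing, such that $\beta_0^{(0)} = \beta_0^{(1)} = \beta_0$ (as well as for $\beta_1$ and $\sigma^2$). Under MNAR, other assumptions are needed to estimate the parameters indexed by $m = 1$.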
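The mixture mean above can be evaluated once the missing-pattern parameters are fixed by assumption. The sketch below is one possible way to do this, not the method of the cited references: it assumes that the missing pattern shares the slope of the observed pattern and differs only by an intercept shift `delta` (a simple identifying restriction), and the function names `Phi` and `pmm_marginal_mean` are hypothetical. Setting `delta = 0` corresponds to MAR.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pmm_marginal_mean(x, beta_obs, omega, delta=0.0):
    """Mean of y given x implied by the two-pattern normal PMM.

    beta_obs = (b0, b1): regression of y on x in the observed pattern (m = 0).
    omega = (w0, w1): probit coefficients for P(m = 1 | x).
    delta: ASSUMED intercept shift for the missing pattern (m = 1), i.e.
           the missing-pattern intercept is b0 + delta and its slope is b1.
    """
    b0, b1 = beta_obs
    w0, w1 = omega
    p_miss = Phi(w0 + w1 * x)              # probability of the missing pattern
    mean_obs = b0 + b1 * x                 # identified from the data
    mean_mis = (b0 + delta) + b1 * x       # identified only through delta
    return (1.0 - p_miss) * mean_obs + p_miss * mean_mis

if __name__ == "__main__":
    # Sensitivity analysis: vary delta and watch the implied population mean
    for delta in (-1.0, 0.0, 1.0):
        print(delta, pmm_marginal_mean(1.0, (1.0, 0.5), (0.0, 0.0), delta))
```

Because the data carry no information about `delta`, reporting results over a range of plausible values is a natural way to use this restriction in a sensitivity analysis.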
Some final considerations:
Both SM and PMM model the joint distribution of $Y$ and $M$.
The SM formulation is more natural when the substantive interest concerns the relationship between $Y$ and $X$ in the population. However, these parameters can also be derived in PMM by averaging the pattern-specific parameters over the missingness patterns.
The PMM factorisation is more transparent in terms of the underlying assumptions about the unidentified parameters of the model, while SM tends to impose some obscure constraints in order to identify these parameters, which are also difficult to interpret.
Given specific assumptions to identify all the parameters in the model, PMMs are often easier to fit than SMs. In addition, imputations of the missing values are based on the predictive distribution of $Y$ given $X$ and $M$.
These considerations seem to favour PMMs over SMs as MNAR approaches, especially when considering sensitivity analysis. Bayesian approaches can also be used to identify these models, by assigning informative prior distributions to the parameters which cannot be estimated from the data. The choice of these priors must therefore be justified, to ensure that the assumptions being assessed are plausible and that their impact on the posterior inference is understood.
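The role of a prior on an unidentified parameter can be sketched with a simple delta-adjusted imputation. The code below is a rough illustration under several assumptions that are not in the original text: the function name `delta_adjusted_imputation` is hypothetical, the unidentified parameter is taken to be an intercept shift between the missing and the observed pattern, its prior is normal, and the observed-pattern regression is plugged in at its least-squares estimate, so uncertainty in those estimates is ignored.

```python
import numpy as np

def delta_adjusted_imputation(y, x, missing, delta_mean, delta_sd,
                              n_imputations=20, seed=0):
    """Return the completed-data mean of y for each imputed data set.

    y: float array of outcomes (entries where `missing` is True are ignored).
    x: fully observed covariate.
    missing: boolean array, True where y is missing (m = 1).
    delta_mean, delta_sd: ASSUMED normal prior for the intercept shift
        between the missing pattern and the observed pattern.
    """
    rng = np.random.default_rng(seed)
    obs = ~missing
    # Observed-pattern regression: this part is identified by the data
    b1, b0 = np.polyfit(x[obs], y[obs], 1)
    resid = y[obs] - (b0 + b1 * x[obs])
    sigma = resid.std(ddof=2)
    means = np.empty(n_imputations)
    for k in range(n_imputations):
        delta = rng.normal(delta_mean, delta_sd)   # draw the unidentified shift
        y_imp = y.copy()
        # Impute from the missing-pattern predictive distribution
        y_imp[missing] = rng.normal(b0 + delta + b1 * x[missing], sigma)
        means[k] = y_imp.mean()
    return means
```

A prior centred at zero with zero spread reproduces MAR imputation, while a wider or shifted prior spreads or moves the completed-data means accordingly. The spread that results is driven by the prior rather than by the data, which is why the prior needs a substantive justification.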
References
Heckman, James J. 1976. “The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models.” In Annals of Economic and Social Measurement, Volume 5, Number 4, 475–92. NBER.
Little, Roderick JA. 1993. “Pattern-Mixture Models for Multivariate Incomplete Data.” Journal of the American Statistical Association 88 (421): 125–34.
Little, Roderick JA, and Donald B Rubin. 2019. Statistical Analysis with Missing Data. Vol. 793. John Wiley & Sons.