Likelihood Based Inference with Incomplete Data (Nonignorable)

In many cases, analysis methods for missing data are based on the ignorable likelihood

$$L_{ign}(\theta \mid Y_0, X) \propto f(Y_0 \mid X, \theta),$$

regarded as a function of the parameters $\theta$ for fixed observed data $Y_0$ and fully observed covariates $X$. The density $f(Y_0 \mid X, \theta)$ is obtained by integrating the missing data $Y_1$ out of the joint density $f(Y \mid X, \theta) = f(Y_0, Y_1 \mid X, \theta)$. Sufficient conditions for basing inference about $\theta$ on the ignorable likelihood are that the missingness mechanism is Missing At Random (MAR) and that the parameters of the analysis model $\theta$ and those of the missingness mechanism $\psi$ are distinct. Here we focus on situations where the missingness mechanism is Missing Not At Random (MNAR), in which case valid Maximum Likelihood (ML), Bayesian and Multiple Imputation (MI) inferences generally need to be based on the full likelihood

$$L_{full}(\theta, \psi \mid Y_0, X, M) \propto f(Y_0, M \mid X, \theta, \psi),$$

regarded as a function of $(\theta, \psi)$ for fixed $(Y_0, M)$. Here, $f(Y_0, M \mid X, \theta, \psi)$ is obtained by integrating $Y_1$ out of the joint density $f(Y, M \mid X, \theta, \psi)$. Two main approaches for formulating MNAR models can be distinguished: selection models (SM) and pattern mixture models (PMM).
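To see why MAR together with distinctness is enough to work with the ignorable likelihood, it can help to write the integral out. The short derivation below is a sketch that factors the joint density into a model for the data and a model for the missingness indicators, which is an assumption of this illustration at this point in the text.

```latex
\begin{aligned}
L_{full}(\theta, \psi \mid Y_0, X, M)
  &\propto \int f(Y_0, Y_1 \mid X, \theta)\, f(M \mid X, Y_0, Y_1, \psi)\, dY_1 \\
  % MAR: f(M | X, Y_0, Y_1, psi) = f(M | X, Y_0, psi), so the factor leaves the integral
  &= f(M \mid X, Y_0, \psi) \int f(Y_0, Y_1 \mid X, \theta)\, dY_1 \\
  &= f(M \mid X, Y_0, \psi)\, f(Y_0 \mid X, \theta).
\end{aligned}
```

When $\theta$ and $\psi$ are distinct, the first factor carries no information about $\theta$, so the full likelihood is proportional to the ignorable likelihood as a function of $\theta$. Under MNAR the missingness factor depends on $Y_1$ and cannot be taken outside the integral, which is why the full likelihood is needed.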

Selection and Pattern Mixture Models

SMs factor the joint distribution of $m_i$ and $y_i$ as

$$f(m_i, y_i \mid x_i, \theta, \psi) = f(y_i \mid x_i, \theta)\, f(m_i \mid x_i, y_i, \psi),$$

where the first factor is the distribution of $y_i$ in the population and the second factor is the missingness mechanism, with $\theta$ and $\psi$ assumed to be distinct. Alternatively, PMMs factor the joint distribution as

$$f(m_i, y_i \mid x_i, \xi, \omega) = f(y_i \mid x_i, m_i, \xi)\, f(m_i \mid x_i, \omega),$$

where the first factor is the distribution of $y_i$ in the strata defined by the different patterns of missingness $m_i$ and the second factor models the probabilities of the different patterns, with $\xi$ and $\omega$ assumed to be distinct (Little (1993), Little and Rubin (2019)). The distinction between the two factorisations becomes clearer when considering a specific example.

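Suppose that missing values are confined to a single variable and let $y_i = (y_{i1}, y_{i2})$ be a bivariate outcome where $y_{i1}$ is fully observed and $y_{i2}$ is observed for $i = 1, \ldots, n_{cc}$ but missing for $i = n_{cc}+1, \ldots, n$. Let $m_{i2}$ be the missingness indicator for $y_{i2}$; then a PMM factors the density of $Y_0$ and $M$ given $X$ as

$$f(Y_0, M \mid X, \xi, \omega) = \prod_{i=1}^{n_{cc}} f(y_{i1}, y_{i2} \mid x_i, m_{i2}=0, \xi)\, \Pr(m_{i2}=0 \mid x_i, \omega) \times \prod_{i=n_{cc}+1}^{n} f(y_{i1} \mid x_i, m_{i2}=1, \xi)\, \Pr(m_{i2}=1 \mid x_i, \omega).$$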

This expression shows that there are no data with which to estimate directly the distribution $f(y_{i2} \mid x_i, y_{i1}, m_{i2}=1, \xi)$, because all units with $m_{i2}=1$ have $y_{i2}$ missing. Under MAR, this distribution is identified from the observed data through $f(y_{i2} \mid x_i, y_{i1}, m_{i2}=1, \xi) = f(y_{i2} \mid x_i, y_{i1}, m_{i2}=0, \xi)$, while under MNAR it must be identified using other assumptions. The SM formulation is

$$f(y_i, m_{i2} \mid x_i, \theta, \psi) = f(y_{i1} \mid x_i, \theta)\, f(y_{i2} \mid x_i, y_{i1}, \theta)\, f(m_{i2} \mid x_i, y_{i1}, y_{i2}, \psi).$$

Typically, the missingness mechanism $f(m_{i2} \mid x_i, y_{i1}, y_{i2}, \psi)$ is modelled using an additive probit or logit regression of $m_{i2}$ on $x_i$, $y_{i1}$ and $y_{i2}$. However, the coefficient of $y_{i2}$ in this regression is not directly estimable from the data, so the model cannot be fully estimated without extra assumptions.
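A small simulation can make the bivariate example concrete. The sketch below is only illustrative: it drops the covariates, uses made-up parameter values, and generates missingness in the second outcome from a probit selection model. Because the data are simulated, the values flagged as missing are still available, so the regression of the second outcome on the first can be fitted separately within each missingness pattern, something that is impossible with real data. The function names `simulate`, `ols` and `fit_by_pattern` are hypothetical helpers, not part of any library.

```python
import random
from statistics import NormalDist, fmean

PHI = NormalDist().cdf  # standard normal CDF, used as the probit link


def simulate(n, psi, seed=1):
    """Draw (y1, y2, m2) from a probit selection model; m2 = 1 flags y2 as missing.

    psi = (psi0, psi1, psi2); psi2 is the coefficient of y2 itself.
    All numeric values are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = rng.gauss(1.0 + 0.5 * y1, 1.0)
        p_missing = PHI(psi[0] + psi[1] * y1 + psi[2] * y2)
        m2 = 1 if rng.random() < p_missing else 0
        rows.append((y1, y2, m2))
    return rows


def ols(rows):
    """Intercept and slope of the least-squares regression of y2 on y1."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def fit_by_pattern(rows):
    """Fit the regression within each pattern (only possible in a simulation)."""
    observed = [r for r in rows if r[2] == 0]
    missing = [r for r in rows if r[2] == 1]
    return ols(observed), ols(missing)


for label, psi in [("MAR", (0.0, 0.5, 0.0)), ("MNAR", (0.0, 0.5, 1.0))]:
    fit_obs, fit_mis = fit_by_pattern(simulate(20000, psi))
    print(label, "observed pattern:", fit_obs, "missing pattern:", fit_mis)
```

When the coefficient of the second outcome is zero, the two pattern-specific fits should agree up to sampling noise, which is the MAR identification used in the pattern mixture factorisation. When it is nonzero, the fits diverge, and nothing in the observed pattern alone reveals by how much.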

Normal Models for MNAR data

Assume we have a complete sample $(y_i, x_i)$ on a continuous variable $Y$ and a set of fully observed covariates $X$, for $i = 1, \ldots, n$. Suppose that $y_i$ is observed for units $i = 1, \ldots, n_{cc}$ and missing for the remaining units $i = n_{cc}+1, \ldots, n$, with $m_i$ the corresponding missingness indicator. Heckman (1976) proposed the following selection model to handle missingness:

$$y_i \mid x_i, \theta, \psi \sim N(\beta_0 + \beta_1 x_i, \sigma^2) \quad \text{and} \quad m_i \mid x_i, y_i, \theta, \psi \sim \text{Bern}\left(\Phi(\psi_0 + \psi_1 x_i + \psi_2 y_i)\right),$$

where $\theta = (\beta_0, \beta_1, \sigma^2)$ and $\Phi$ denotes the standard normal cumulative distribution function (the probit link). Note that if $\psi_2 = 0$ the missing data are MAR, while if $\psi_2 \neq 0$ the missing data are MNAR, since missingness in $Y$ depends on the unobserved value of $Y$. This model can be estimated using a two-step least squares method, ML in combination with an EM algorithm, or a Bayesian approach. The main issue is the lack of information about $\psi_2$, which can be partly identified through the specific distributional assumptions made about $Y$. This, however, rests on the implicit assumption that the assumed distribution describes well the distribution of the complete (observed and missing) data, which can never be tested or checked. An alternative approach is to use a PMM factorisation and model:

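$$y_i \mid m_i = m, x_i, \xi, \omega \sim N\left(\beta_0^{(m)} + \beta_1^{(m)} x_i, \sigma^{2(m)}\right) \quad \text{and} \quad m_i \mid x_i, \xi, \omega \sim \text{Bern}\left(\Phi(\omega_0 + \omega_1 x_i)\right),$$

where $\xi = (\beta_0^{(m)}, \beta_1^{(m)}, \sigma^{2(m)}, m = 0, 1)$. This model implies that the distribution of $y_i$ given $x_i$ in the population is a mixture of two normal distributions with mean

$$[1 - \Phi(\omega_0 + \omega_1 x_i)][\beta_0^{(0)} + \beta_1^{(0)} x_i] + [\Phi(\omega_0 + \omega_1 x_i)][\beta_0^{(1)} + \beta_1^{(1)} x_i].$$

The parameters $(\beta_0^{(0)}, \beta_1^{(0)}, \sigma^{2(0)}, \omega)$ can be estimated from the data, but the parameters $(\beta_0^{(1)}, \beta_1^{(1)}, \sigma^{2(1)})$ are not estimable because $y_i$ is missing when $m_i = 1$. Under MAR, the distribution of $Y$ given $X$ is the same for units with $Y$ observed and missing, so that $\beta_0^{(0)} = \beta_0^{(1)} = \beta_0$ (and similarly for $\beta_1$ and $\sigma^2$). Under MNAR, other assumptions are needed to estimate the parameters indexed by $m = 1$.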
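The mixture mean above can be evaluated directly once the parameters of the missing pattern are fixed by assumption. The sketch below uses one simple and purely hypothetical identifying restriction: the missing-pattern intercept equals the observed-pattern intercept plus an offset `delta`, with a common slope. That restriction, the helper names and all numeric values are assumptions of this illustration, not something prescribed by the model itself; `delta = 0` corresponds to MAR.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def pmm_population_mean(x, beta_obs, beta_mis, omega):
    """Mean of y given x under the pattern mixture model.

    beta_obs and beta_mis are (intercept, slope) pairs for the observed (m = 0)
    and missing (m = 1) patterns; omega = (omega0, omega1) are the probit
    coefficients of the pattern probabilities.
    """
    p_missing = PHI(omega[0] + omega[1] * x)
    mean_obs = beta_obs[0] + beta_obs[1] * x
    mean_mis = beta_mis[0] + beta_mis[1] * x
    return (1.0 - p_missing) * mean_obs + p_missing * mean_mis


def shift_intercept(beta_obs, delta):
    """Hypothetical restriction: missing-pattern intercept = observed intercept + delta."""
    return (beta_obs[0] + delta, beta_obs[1])


# Illustrative values standing in for estimates obtained from the observed data.
beta_obs = (1.0, 0.5)
omega = (-0.5, 0.3)

for delta in (-1.0, -0.5, 0.0, 0.5, 1.0):
    beta_mis = shift_intercept(beta_obs, delta)
    mean_at_zero = pmm_population_mean(0.0, beta_obs, beta_mis, omega)
    print(f"delta={delta:+.1f}  E[y | x=0]={mean_at_zero:.3f}")
```

Varying `delta` over a plausible range and reporting how the population mean moves is one simple way to set up a sensitivity analysis, since the data themselves carry no information about the offset.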

Some final considerations:

  • Both SM and PMM model the joint distribution of Y and M.

  • The SM formulation is more natural when the substantive interest concerns the relationship between Y and X in the population. However, these parameters can also be derived in a PMM by averaging the pattern-specific parameters over the missingness patterns.

  • The PMM factorisation is more transparent about the assumptions made on the unidentified parameters of the model, while SMs tend to identify these parameters through obscure constraints that are difficult to interpret.

  • Given specific assumptions to identify all the parameters in the model, PMMs are often easier to fit than SMs. In addition, imputations of the missing values are based on the predictive distribution of Y given X and M=1, which the PMM factorisation models directly.

These considerations seem to favour PMMs over SMs as MNAR approaches, especially when considering sensitivity analysis. Bayesian approaches can also be used to identify these models, by assigning informative prior distributions to the parameters that cannot be estimated from the data. The choice of these priors therefore needs to be justified, both to ensure that the assumptions being assessed are plausible and to make clear their impact on the posterior inference.

References

Heckman, James J. 1976. “The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models.” In Annals of Economic and Social Measurement, Volume 5, Number 4, 475–92. NBER.
Little, Roderick JA. 1993. “Pattern-Mixture Models for Multivariate Incomplete Data.” Journal of the American Statistical Association 88 (421): 125–34.
Little, Roderick JA, and Donald B Rubin. 2019. Statistical Analysis with Missing Data. Vol. 793. John Wiley & Sons.
