Pattern Mixture Models

The steps involved in drawing inference from incomplete data can be summarised as follows (Daniels and Hogan 2008):

  • Specification of a full data model for the response and missingness indicators, $f(y,r)$

  • Specification of the prior distribution (within a Bayesian approach)

  • Sampling from the posterior distribution of the full data parameters, given the observed data $Y_{obs}$ and the missingness indicators $R$

Identifying a full data model, particularly the part involving the missing data $Y_{mis}$, requires unverifiable assumptions about the full data model $f(y,r)$. Under the assumption that the missingness mechanism is ignorable, the model can be identified using only the information from the observed data. When ignorability is not believed to be a suitable assumption, one can use a more general class of models that allows the missingness indicators to depend on the missing responses themselves. These models allow one to parameterise the conditional dependence between $R$ and $Y_{mis}$, given $Y_{obs}$. Without the benefit of untestable assumptions, this association structure cannot be identified from the observed data, and therefore inference depends on some combination of two elements:

  1. Unverifiable parametric assumptions

  2. Informative prior distributions (under a Bayesian approach)

We show some simple examples of how these nonignorable models can be constructed, identified and applied. In this section, we specifically focus on the class of nonignorable models known as Pattern Mixture Models (PMM).

Pattern Mixture Models

The pattern mixture model approach factors the full data distribution as

$$ f(y, r \mid \omega) = f(y \mid r, \phi) f(r \mid \chi), $$

where it is typically assumed that the set of full data parameters $\omega$ can be decomposed into separate parameters $(\phi, \chi)$ for each factor. Thus, under the PMM approach, the response model $f(y \mid \theta)$ can be retrieved as a mixture of the pattern-specific distributions

$$ f(y \mid \theta) = \sum_r f(y \mid r, \phi) f(r \mid \chi), $$

with weights given by the corresponding probabilities of the different patterns. The missingness mechanism $f(r \mid y, \psi)$ can also be obtained using Bayes' rule

$$ f(r \mid y, \psi) = \frac{f(y \mid r, \phi) f(r \mid \chi)}{f(y \mid \theta)}. $$
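
As a concrete illustration of these two identities, the following minimal sketch (with made-up univariate normal components and a made-up pattern probability; none of these values come from the text) evaluates the mixture density $f(y \mid \theta)$ and recovers the missingness mechanism via Bayes' rule.

```python
# A minimal numeric sketch: a two-pattern mixture with univariate normal
# components, showing how the marginal response model f(y | theta) and the
# missingness mechanism f(r | y) are recovered. All values are illustrative.
from scipy.stats import norm

mu = {1: 0.0, 0: 2.0}   # means of f(y | r, phi), one per pattern
sd = {1: 1.0, 0: 1.5}   # standard deviations of f(y | r, phi)
chi = 0.7               # f(r = 1 | chi), i.e. probability of being observed

def f_y(y):
    """Marginal response model: f(y | theta) = sum_r f(y | r, phi) f(r | chi)."""
    return chi * norm.pdf(y, mu[1], sd[1]) + (1 - chi) * norm.pdf(y, mu[0], sd[0])

def f_r_given_y(r, y):
    """Missingness mechanism via Bayes' rule: f(r | y) = f(y | r) f(r) / f(y)."""
    prior = chi if r == 1 else 1 - chi
    return norm.pdf(y, mu[r], sd[r]) * prior / f_y(y)

y = 1.0
print(f_y(y))                                  # mixture density at y
print(f_r_given_y(1, y) + f_r_given_y(0, y))   # sums to 1 by construction
```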

The construction of PMMs requires the specification of the full data distribution conditional on the different missingness patterns. This may be cumbersome when the number of patterns is large, but it has the advantage of making explicit which parameters cannot be identified from the observed data. In particular, PMMs are well suited to showing that the distribution of the response within each pattern can be decomposed as

$$ f(y_{obs}, y_{mis} \mid r, \phi) = f(y_{mis} \mid y_{obs}, r, \phi_E) f(y_{obs} \mid r, \phi_O), $$

where $\phi_E = \lambda_1(\phi)$ and $\phi_O = \lambda_2(\phi)$ are functions of the mixture component parameter $\phi$. The former subset of parameters indexes the so-called extrapolation distribution, i.e. the distribution of the missing values given the observed values, and cannot be identified from the data, while the latter indexes the observed data distribution and is typically identifiable from the data. Assuming there exists a partition $\phi_E = (\phi_{E,I}, \phi_{E,NI})$ such that the observed data distribution is a function of $\phi_{E,I}$ but not of $\phi_{E,NI}$, then $\phi_{E,NI}$ is a sensitivity parameter: it can only be identified using information from sources other than the observed data, and it therefore provides a suitable basis for formulating sensitivity analyses using informative priors.

Example of PMM for bivariate normal data

Consider a sample of $i = 1, \dots, n$ units from a bivariate normal distribution $Y = (Y_1, Y_2)$. Assume also that $Y_1$ is always observed while $Y_2$ may be missing, and let $R = R_2$ be the missingness indicator for the partially-observed response $Y_2$. A PMM factors the full data distribution as

$$ f(y_1, y_2, r \mid \omega) = f(y_1, y_2 \mid r, \phi) f(r \mid \chi), $$

where, for example, we may have $Y \mid R = 1 \sim N(\mu_1, \Sigma_1)$, $Y \mid R = 0 \sim N(\mu_0, \Sigma_0)$ and $R \sim \text{Bern}(\chi)$. We define $\mu_r = (\mu_{1r}, \mu_{2r})$, while $\Sigma_r$ has elements $\sigma_r = (\sigma_{11r}, \sigma_{12r}, \sigma_{22r})$. Similarly, we can define the parameters $\beta_{0r}$, $\beta_{1r}$ and $\sigma_{2 \cdot 1, r}$ as the intercept, slope and residual variance of the regression of $Y_2$ on $Y_1$ within each pattern $r$. Under this reparameterisation, the full data model parameters are

$$ \phi = \{\mu_{1r}, \sigma_{11r}, \beta_{0r}, \beta_{1r}, \sigma_{2 \cdot 1, r}\}. $$
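
To make the construction tangible, here is a small simulation sketch, assuming illustrative values for $(\mu_r, \Sigma_r, \chi)$ that are not taken from the text: it draws full data from the two-pattern bivariate normal PMM, deletes $Y_2$ whenever $R = 0$, and computes the pattern-1 regression coefficients implied by $(\mu_1, \Sigma_1)$.

```python
# Simulation sketch of the bivariate normal PMM (assumed parameter values):
# draw R ~ Bern(chi), then (Y1, Y2) | R = r ~ N(mu_r, Sigma_r), and delete Y2
# whenever R = 0, so only pattern r = 1 retains both responses.
import numpy as np

rng = np.random.default_rng(2024)
n, chi = 1_000, 0.7                       # sample size and P(R = 1)

mu = {1: np.array([0.0, 0.0]), 0: np.array([1.0, 2.0])}
Sigma = {1: np.array([[1.0, 0.5], [0.5, 1.0]]),
         0: np.array([[1.5, 0.3], [0.3, 2.0]])}

r = rng.binomial(1, chi, size=n)                      # R ~ Bern(chi)
y = np.vstack([rng.multivariate_normal(mu[ri], Sigma[ri]) for ri in r])

y_obs = y.copy()
y_obs[r == 0, 1] = np.nan                             # Y2 is missing when R = 0

# Regression reparameterisation within pattern r = 1:
# beta_1r = sigma_12r / sigma_11r and beta_0r = mu_2r - beta_1r * mu_1r,
# but only the r = 1 coefficients can be estimated from y_obs.
b11 = Sigma[1][0, 1] / Sigma[1][0, 0]
b01 = mu[1][1] - b11 * mu[1][0]
print(b01, b11)
```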

The extrapolation and observed data distributions, with associated parameters, are then

$$ f(y_{mis} \mid y_{obs}, \phi_E), \qquad \phi_E = (\beta_{00}, \beta_{10}, \sigma_{2 \cdot 1, 0}), $$

and

$$ f(y_{obs} \mid \phi_O), \qquad \phi_O = (\mu_{11}, \sigma_{111}, \beta_{01}, \beta_{11}, \sigma_{2 \cdot 1, 1}, \mu_{10}, \sigma_{110}). $$
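
A brief estimation sketch may help here, again under assumed illustrative values that are not from the text: every element of $\phi_O$ can be computed from the observed data alone, using the complete cases ($R = 1$) for the pattern-1 parameters and the observed $Y_1$ values for the pattern-0 parameters.

```python
# Estimating phi_O from observed data only (assumed true values below are
# purely illustrative). Y2 is never seen in pattern r = 0, so its model there
# plays no role in this estimation.
import numpy as np

rng = np.random.default_rng(1)
n, chi = 50_000, 0.7
r = rng.binomial(1, chi, size=n)

mu1 = {1: 0.0, 0: 1.0}          # mu_{11}, mu_{10}
s11 = {1: 1.0, 0: 1.5}          # sigma_{111}, sigma_{110}
b01, b11, s21 = 0.5, 0.8, 0.6   # beta_{01}, beta_{11}, sigma_{2.1,1}

y1 = np.array([rng.normal(mu1[ri], np.sqrt(s11[ri])) for ri in r])
y2 = np.where(r == 1, b01 + b11 * y1 + rng.normal(0, np.sqrt(s21), n), np.nan)

cc = r == 1                                # complete cases (R = 1)
C = np.cov(y1[cc], y2[cc])                 # sample covariance matrix
b11_hat = C[0, 1] / C[0, 0]                # slope of Y2 on Y1 in pattern 1
b01_hat = y2[cc].mean() - b11_hat * y1[cc].mean()
resid = y2[cc] - (b01_hat + b11_hat * y1[cc])

print(y1[cc].mean(), y1[cc].var(), b01_hat, b11_hat, resid.var())  # pattern 1
print(y1[~cc].mean(), y1[~cc].var())                               # pattern 0
```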

It can be shown that, in this specific example, the observed data distribution does not depend on the parameters indexing the extrapolation distribution, $\phi_{E,NI} = (\beta_{00}, \beta_{10}, \sigma_{2 \cdot 1, 0})$. It is possible to set $\beta_{00} = \beta_{01}$, $\beta_{10} = \beta_{11}$ and $\sigma_{2 \cdot 1, 0} = \sigma_{2 \cdot 1, 1}$ to yield a Missing At Random (MAR) assumption. Hence, a function that maps the identified parameters and a set of sensitivity parameters $\Delta$ to the space of unidentified parameters can be used to quantify departures from MAR. For example, assume we impose

$$ \beta_{00} = \beta_{01} + \Delta, $$

then assigning a point mass prior at $\Delta = 0$ implies MAR, while fixing $\Delta \neq 0$ or placing any type of informative prior on this parameter implies a Missing Not At Random (MNAR) assumption.
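
The following hedged sketch, with assumed illustrative values for the identified parameters, shows how the mapping $\beta_{00} = \beta_{01} + \Delta$ propagates to the implied marginal mean of $Y_2$ through the mixture $E[Y_2] = \chi E[Y_2 \mid R = 1] + (1 - \chi) E[Y_2 \mid R = 0]$.

```python
# Delta-based sensitivity analysis sketch: identified pattern r = 1 quantities
# are held fixed at assumed illustrative values, the unidentified intercept is
# set to beta_00 = beta_01 + Delta, and we track the implied marginal mean of
# Y2 as Delta moves away from the MAR value of zero.
chi = 0.7                      # P(R = 1)
mu11, mu10 = 0.0, 1.0          # E[Y1 | R = 1] and E[Y1 | R = 0] (identified)
b01, b11 = 0.5, 0.8            # intercept and slope of Y2 on Y1 in pattern 1

for delta in [0.0, 0.5, 1.0]:  # Delta = 0 recovers MAR
    b00, b10 = b01 + delta, b11          # keep the MAR slope, shift intercept
    ey2_r1 = b01 + b11 * mu11            # E[Y2 | R = 1], identified
    ey2_r0 = b00 + b10 * mu10            # E[Y2 | R = 0], driven by Delta
    ey2 = chi * ey2_r1 + (1 - chi) * ey2_r0
    print(f"Delta = {delta:.1f}: E[Y2] = {ey2:.3f}")
```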

Conclusions

To summarise, PMMs have the advantage of making explicit which full data parameters index the distribution of the missing data but cannot be identified from the observed data, which makes inference more transparent. A potential downside is their practical implementation, which becomes more difficult as the number of patterns and unidentified parameters grows.

References

Daniels, Michael J., and Joseph W. Hogan. 2008. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC.
