A general problem associated with the implementation of Inverse Probability Weighting (IPW) methods is that the information contained in some of the available data is ignored by focussing only on the complete cases (Schafer and Graham (2002)). This has provided room to extend these methods to make more efficient use of the available information through the incorporation of an “augmentation” term, which led to the development of the so-called Augmented Inverse Probability Weighting (AIPW) methods. These approaches extend IPW methods by creating predictions from a model to recover the information in the incomplete units and applying IPW to the residuals from the model (Little and Rubin (2019)).
Considering the IPW Generalised Estimating Equation

$$\sum_{i=1}^{n} \frac{R_i}{\hat{\pi}_i}\, U(Y_i, X_i; \theta) = 0,$$

where $\hat{\pi}_i$ is an estimate of the probability of being a complete unit, obtained for example using logistic regressions of the missingness indicator $R_i$ on the vectors of the covariate and auxiliary variables $X_i$ and $V_i$, respectively, and $U(Y_i, X_i; \theta)$ is a complete data estimating function for the parameter of interest $\theta$. A problem of this IPW estimator is that it has poor small sample properties when the estimated propensity score $\hat{\pi}_i$ gets close to zero for some observations, since the corresponding weights $1/\hat{\pi}_i$ become very large and lead to high variance in the estimator. AIPW methods can provide estimators of $\theta$ which are more efficient than their nonaugmented IPW versions. In general, AIPW estimating functions provide a method for constructing estimators of $\theta$ based on two terms:
The usual IPW term
An augmentation term
The basis for the first term is a complete data unbiased estimating function for $\theta$, whereas the basis for the second term is some function of the observed data chosen so that it has conditional mean of zero given the complete data (Molenberghs et al. (2014)).
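Putting the two pieces together, and using the notation introduced above (with $h(\cdot)$ denoting an arbitrary, user-chosen function of the always-observed variables), a generic AIPW estimating equation can be sketched as

$$\sum_{i=1}^{n}\left[\frac{R_i}{\hat{\pi}_i}\, U(Y_i, X_i; \theta) + \left(1 - \frac{R_i}{\hat{\pi}_i}\right) h(X_i, V_i)\right] = 0,$$

where the first summand is the IPW term and the second is the augmentation term: since $E(R_i \mid Y_i, X_i, V_i) = \pi_i$ under MAR and a correctly specified missingness model, the augmentation term indeed has conditional mean zero given the complete data.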
Doubly Robust Estimators
An important class of AIPW methods is known as doubly robust estimators, which have desirable robustness properties (Robins, Rotnitzky, and van der Laan (2000); Robins and Rotnitzky (2001)). The key feature of these estimators is that they relax the assumption that the model for the missingness probabilities is correctly specified, although they require additional assumptions on the model for the partially observed data. For example, doubly robust estimators for a population mean parameter $\mu$ could be obtained as follows (a code sketch of these steps is given after the list):
Fit a logistic regression model for the probability $\pi_i$ of observing $Y_i$ as a function of $X_i$ and $V_i$ to derive the individual weights $\hat{w}_i = 1/\hat{\pi}_i$.
Fit a generalised linear model for the outcome $Y_i$ of the responders as a function of $X_i$ and $V_i$, using the weights $\hat{w}_i$, and let $\hat{Y}_i$ denote the fitted value for subject $i$.
Take the sample average of the fitted values $\hat{Y}_i$ of both respondents and nonrespondents as an estimate of the population mean, $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}\hat{Y}_i$.
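As a concrete illustration of these three steps, here is a minimal sketch in Python using simulated data (the data generating process, variable names, and the use of statsmodels are my own illustrative choices, not part of the original formulation; a continuous outcome fitted by weighted least squares is assumed):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000

# Simulated full data: covariate X, auxiliary variable V, continuous outcome Y
X = rng.normal(size=n)
V = 0.5 * X + rng.normal(size=n)
Y = 1.0 + 2.0 * X + 1.5 * V + rng.normal(size=n)  # true population mean is 1.0

# MAR missingness: the probability of observing Y depends only on (X, V)
pi_true = 1.0 / (1.0 + np.exp(-(0.3 + 0.8 * X - 0.5 * V)))
R = rng.binomial(1, pi_true)  # R = 1 if Y is observed

design = sm.add_constant(np.column_stack([X, V]))

# Step 1: logistic regression of the missingness indicator on (X, V)
pi_hat = sm.Logit(R, design).fit(disp=0).predict(design)
w_hat = 1.0 / pi_hat

# Step 2: outcome regression fitted on the responders only, weighted by 1 / pi_hat
resp = R == 1
outcome_fit = sm.WLS(Y[resp], design[resp], weights=w_hat[resp]).fit()

# Step 3: average the fitted values over ALL units (respondents and nonrespondents)
mu_dr = outcome_fit.predict(design).mean()

print(f"Doubly robust estimate: {mu_dr:.3f}")
print(f"Complete-case mean:     {Y[resp].mean():.3f}  (biased under MAR)")
```

In this sketch both working models happen to be correctly specified; either one could be deliberately misspecified to explore the double robustness property discussed next.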
Doubly robust estimators require the specification of two models: one for the missingness probability and another for the distribution of the incomplete data. When the augmentation term is selected and modelled correctly according to the distribution of the complete data, the resulting estimator of the parameter of interest is consistent even if the model of missingness is misspecified. On the other hand, if the model of missingness is correctly specified, the augmentation term no longer needs to be correctly specified to yield consistent estimators (Scharfstein, Daniels, and Robins (2003); Bang and Robins (2005)). Doubly robust estimators therefore yield an unbiased estimating function for the parameter of interest if either the model for the incomplete data or the model for the missingness mechanism has been correctly specified.
Example
Suppose the full data consist of a single outcome variable $Y$ and an additional variable $V$, and that the objective is to estimate the population outcome mean $\mu = E(Y)$. When $Y$ is partially observed (while $V$ is always fully observed), individuals may fall into one of two missingness patterns $r \in \{0, 1\}$, namely $r = 1$ if both variables are observed or $r = 0$ if $Y$ is missing. Let $R_i = 1$ if $Y_i$ is observed and $R_i = 0$ otherwise, so that the observed data for each individual can be summarised as $(R_i, R_i Y_i, V_i)$. Assuming that missingness only depends on $V$, that is

$$P(R = 1 \mid Y, V) = P(R = 1 \mid V) = \pi(V),$$
then the missing data mechanism is Missing At Random (MAR). Under these conditions, consider the consistent IPW complete case estimating equation

$$\sum_{i=1}^{n} \frac{R_i}{\pi(V_i)}\,(Y_i - \mu) = 0,$$
which weights the contribution of each complete case by the inverse of $\pi(V_i)$, typically estimated via logistic regression. A general problem with this type of estimator is that it discards all the available data among the non-completers and is therefore inefficient. However, it is possible to augment the simple IPW complete case estimating equation to improve efficiency. The optimal estimator for $\mu$ within this class is the solution to the estimating equation

$$\sum_{i=1}^{n} \left[\frac{R_i}{\pi(V_i)}\,(Y_i - \mu) - \frac{R_i - \pi(V_i)}{\pi(V_i)}\, E(Y_i - \mu \mid V_i)\right] = 0,$$
which leads to the estimator

$$\hat{\mu}_{AIPW} = \frac{1}{n} \sum_{i=1}^{n} \left[\frac{R_i Y_i}{\pi(V_i)} - \frac{R_i - \pi(V_i)}{\pi(V_i)}\, E(Y_i \mid V_i)\right].$$
The conditional expectation $E(Y \mid V)$ is not known and must be estimated from the data. Under a Missing At Random (MAR) assumption we have that $E(Y \mid V) = E(Y \mid V, R = 1)$, that is, the conditional expectation of $Y$ given $V$ is the same as that among the completers. Thus, we can specify a model $m(V; \xi)$ for $E(Y \mid V)$, indexed by the parameter $\xi$, that can be estimated from the completers. If $Y$ is continuous, a simple choice is to estimate $m(V; \xi)$ by OLS regression of $Y$ on $V$ among the completers. The AIPW estimator for $\mu$ then becomes

$$\hat{\mu}_{AIPW} = \frac{1}{n} \sum_{i=1}^{n} \left[\frac{R_i Y_i}{\hat{\pi}(V_i)} - \frac{R_i - \hat{\pi}(V_i)}{\hat{\pi}(V_i)}\, m(V_i; \hat{\xi})\right].$$
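This estimator is straightforward to compute. Below is a minimal sketch in Python on simulated data (the data generating process, variable names, and the use of statsmodels are illustrative assumptions of mine, not part of the example above):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 5000

# Simulated data for the example: V always observed, Y missing at random given V
V = rng.normal(size=n)
Y = 2.0 + 1.5 * V + rng.normal(size=n)              # true mean mu = 2.0
pi_true = 1.0 / (1.0 + np.exp(-(0.2 + 1.0 * V)))    # P(R = 1 | V)
R = rng.binomial(1, pi_true)                        # R = 1 if Y is observed

Z = sm.add_constant(V)

# Propensity model pi(V): logistic regression of R on V
pi_hat = sm.Logit(R, Z).fit(disp=0).predict(Z)

# Outcome model m(V; xi): OLS of Y on V fitted on the completers only
m_hat = sm.OLS(Y[R == 1], Z[R == 1]).fit().predict(Z)

# AIPW estimator: IPW term minus the augmentation term
# (values of Y with R = 0 are multiplied by zero, so they are never used)
mu_aipw = np.mean(R * Y / pi_hat - (R - pi_hat) / pi_hat * m_hat)

# Simple IPW complete-case estimator, i.e. the solution of the unaugmented equation
mu_ipw = np.sum(R * Y / pi_hat) / np.sum(R / pi_hat)

print(f"AIPW estimate: {mu_aipw:.3f}")
print(f"IPW estimate:  {mu_ipw:.3f}   (true mean: 2.000)")
```

Deliberately misspecifying either the propensity model or the outcome model (but not both) in this sketch is a simple way to check the double robustness property numerically.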
It can be shown that this estimator is more efficient than the simple IPW complete case estimator for $\mu$ and that it has a double robustness property. This ensures that $\hat{\mu}_{AIPW}$ is a consistent estimator of $\mu$ if either
the model for the missingness probability $\pi(V)$ is correctly specified, or
the model $m(V; \xi)$ for the conditional expectation $E(Y \mid V)$ is correctly specified.
To see a full derivation of the double robustness property, I put here a link to some nice paper; a brief sketch of the argument is also given below.
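For intuition, here is a compressed version of the standard argument (my own sketch, using the notation above and writing $\pi(V)$ and $m(V)$ for the limits of the two fitted models). Each summand of $\hat{\mu}_{AIPW}$ can be rewritten as

$$\frac{R Y}{\pi(V)} - \frac{R - \pi(V)}{\pi(V)}\, m(V) \;=\; m(V) + \frac{R}{\pi(V)}\,\bigl(Y - m(V)\bigr).$$

If the missingness model is correct, then $E(R \mid Y, V) = \pi(V)$ under MAR, so by iterated expectations the second term has mean $E\{Y - m(V)\}$ and the whole expression has mean $E\{m(V)\} + E\{Y - m(V)\} = \mu$, whatever $m(V)$ is. If instead the outcome model is correct, then $E\{Y - m(V) \mid R, V\} = E(Y \mid V) - m(V) = 0$ under MAR, so the second term has mean zero and the expression has mean $E\{m(V)\} = E\{E(Y \mid V)\} = \mu$, whatever $\pi(V)$ is. In either case the estimating function is unbiased for $\mu$.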
Conclusions
Like all weighting methods, such as IPW, AIPW methods are semiparametric methods that aim to achieve robustness and good performance over broad classes of population distributions. However, semiparametric estimators can be less efficient and less powerful than Maximum Likelihood or Bayesian estimators under a well specified parametric model. With missing data, the results of Rubin (1976) show that likelihood-based methods perform uniformly well over any Missing At Random (MAR) missingness distribution, without the user needing to specify that distribution. In contrast, semiparametric methods that relax assumptions about the data must in turn assume a specific form for the distribution of missingness. It has been argued that, for these semiparametric methods to gain a substantial advantage over well-specified likelihood methods, the parametric model has to be grossly misspecified (Meng (2000)).
References
Bang, Heejung, and James M Robins. 2005. “Doubly Robust Estimation in Missing Data and Causal Inference Models.” Biometrics 61 (4): 962–73.
Little, Roderick JA, and Donald B Rubin. 2019. Statistical Analysis with Missing Data. Vol. 793. John Wiley & Sons.
Meng, Xiao-Li. 2000. “Missing Data: Dial M for ???” Journal of the American Statistical Association 95 (452): 1325–30.
Molenberghs, Geert, Garrett Fitzmaurice, Michael G Kenward, Anastasios Tsiatis, and Geert Verbeke. 2014. Handbook of Missing Data Methodology. Chapman & Hall/CRC.
Robins, James M, and Andrea Rotnitzky. 2001. “Comment on the Bickel and Kwon Article, ‘Inference for Semiparametric Models: Some Questions and an Answer’.” Statistica Sinica 11 (4): 920–36.
Robins, James M, Andrea Rotnitzky, and Mark van der Laan. 2000. “On Profile Likelihood: Comment.” Journal of the American Statistical Association 95 (450): 477–82.
Rubin, Donald B. 1976. “Inference and Missing Data.” Biometrika 63 (3): 581–92.
Schafer, Joseph L, and John W Graham. 2002. “Missing Data: Our View of the State of the Art.” Psychological Methods 7 (2): 147.
Scharfstein, Daniel O, Michael J Daniels, and James M Robins. 2003. “Incorporating Prior Beliefs about Selection Bias into the Analysis of Randomized Trials with Missing Outcomes.” Biostatistics 4 (4): 495–512.