Inverse Probability Weighting

In certain cases, it is possible to reduce biases from case deletion by the application of weights. After incomplete cases are removed, the remaining complete cases can be weighted so that their distribution more closely resembles that of the full sample with respect to auxiliary variables. Weighting methods can eliminate bias due to differential response related to the variables used to model the response probabilities, but they cannot correct for biases related to variables that are unused or unmeasured (Little and Rubin (2019)). Robins, Rotnitzky, and Zhao (1994) introduced Inverse Probability Weighting (IPW) as a weighted regression approach that requires an explicit model for the missingness but relaxes some of the parametric assumptions in the data model. Their method is an extension of Generalized Estimating Equations (GEE), a popular technique for modeling marginal or population-averaged relationships between a response variable and predictors (Zeger, Liang, and Albert (1988)).

Let \(y_i=(y_{i1},\ldots,y_{iK})\) denote a vector of variables for unit \(i\) subject to missing values, with \(y_i\) being fully observed for \(i=1,\ldots,n_r\) units and partially observed for \(i=n_r+1,\ldots,n\) units. Define \(m_i=1\) if \(y_i\) is incomplete and \(m_i=0\) if complete. Let \(x_i=(x_{i1},\ldots,x_{ip})\) denote a vector of fully observed covariates and suppose the interest is in estimating the mean of the distribution of \(y_i\) given \(x_i\), having the form \(g(x_i,\beta)\), where \(g(\cdot)\) is a possibly non-linear regression function indexed by a parameter \(\beta\) of dimension \(d\). Let also \(z_i=(z_{i1},\ldots,z_{iq})\) be a vector of fully observed auxiliary variables that are potentially predictive of missingness but are not included in the model for \(y_i \mid x_i\). When there are no missing data, a consistent estimate of \(\beta\) is given by the solution to the following GEE, under mild regularity conditions (Liang and Zeger (1986)),

\[ \sum_{i=1}^n D_i(x_i,\beta)(y_i-g(x_i,\beta))=0, \]

where \(D_i(x_i,\beta)\) is a suitably chosen \((d\times K)\) matrix of known functions of \(x_i\). With missing data, the equation is applied only to the \(n_r\) complete cases, which yields consistent estimates provided that

\[ p(m_i=1 \mid x_i,y_i,z_i,\phi)=p(m_i=1\mid x_i,\phi), \]

that is, missingness does not depend on \(y_i\) or \(z_i\) after conditioning on \(x_i\). IPW GEE methods (Robins and Rotnitzky (1995)) replace the equation with

\[ \sum_{i=1}^{n_r} w_i(\hat{\alpha})D_i(x_i,\beta)(y_i-g(x_i,\beta))=0, \]

where \(w_i(\hat{\alpha})=\frac{1}{p(x_i,z_i \mid \hat{\alpha})}\), with \(p(x_i,z_i \mid \hat{\alpha})\) being an estimate of the probability of being a complete unit, obtained for example via a logistic regression of \(m_i\) on \(x_i\) and \(z_i\). If the logistic regression is correctly specified, IPW GEE yields a consistent estimator of \(\beta\) provided that

\[ p(m_i=1 \mid x_i,y_i,z_i,\phi)=p(m_i=1\mid x_i,z_i,\phi). \]
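
The weighted estimating equation above can be sketched in a few lines of Python for the simplest possible case. This is only an illustration under stated assumptions, not the procedure of the cited papers: the outcome is scalar, the mean is linear, \(g(x_i,\beta)=\beta_0+\beta_1x_i\), with \(D_i=(1,x_i)^T\), and both \(x_i\) and \(z_i\) are binary, so that the response probabilities can be estimated by simple cell response rates (equivalent to a saturated logistic model) instead of a general logistic regression. The function names, the data-generating values and the seed are all hypothetical.

```python
import random


def solve_weighted_gee(x, y, w):
    """Solve sum_i w_i * (1, x_i)^T * (y_i - b0 - b1 * x_i) = 0 for (b0, b1)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    return (swxx * swy - swx * swxy) / det, (sw * swxy - swx * swy) / det


def simulate_gee(n=20000, seed=2):
    rng = random.Random(seed)
    # Assumed probabilities of being a complete case, depending on x and z only
    true_pi = {(0, 0): 0.8, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.9}
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        z = rng.randint(0, 1)
        # z is independent of x, so E[y | x] = 1.75 + 2 * x
        y = 1.0 + 2.0 * x + 1.5 * z + rng.gauss(0.0, 1.0)
        complete = rng.random() < true_pi[(x, z)]
        rows.append((x, z, y, complete))

    # Estimate the response probability in each (x, z) cell by its response rate
    n_cell, r_cell = {}, {}
    for x, z, _, complete in rows:
        n_cell[(x, z)] = n_cell.get((x, z), 0) + 1
        r_cell[(x, z)] = r_cell.get((x, z), 0) + int(complete)

    cc = [(x, z, y) for x, z, y, complete in rows if complete]
    xs = [row[0] for row in cc]
    ys = [row[2] for row in cc]
    # Inverse probability weights for the complete cases
    w_ipw = [n_cell[(x, z)] / r_cell[(x, z)] for x, z, _ in cc]

    unweighted = solve_weighted_gee(xs, ys, [1.0] * len(cc))
    weighted = solve_weighted_gee(xs, ys, w_ipw)
    return unweighted, weighted


unweighted, weighted = simulate_gee()
print("complete cases, unweighted (b0, b1):", unweighted)
print("complete cases, IPW        (b0, b1):", weighted)
```

In this setup missingness depends on the auxiliary variable, which also predicts the outcome, so the unweighted complete-case fit should drift away from the values 1.75 and 2 used to generate the data, while the weighted fit should stay close to them.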


Suppose the full data consists of a single outcome variable \(y\) and an additional variable \(z\) and that the objective is to estimate the population outcome mean \(\mu=\text{E}[y]\). If data were fully observed for \(i=1,\ldots,n\) individuals, an obvious estimator of \(\mu\) would be the sample outcome mean

\[ \bar{y}=\frac{1}{n}\sum_{i=1}^ny_i, \]

which is equivalent to the solution to the estimating equation \(\sum_{i=1}^n(y_i-\mu)=0\). When \(y\) is partially observed (while \(z\) is always fully observed), individuals may fall into one of two missingness patterns \(r=(r_{y},r_{z})\), namely \(r=(1,1)\) if both variables are observed or \(r=(0,1)\) if \(y\) is missing. Let \(c=1\) if \(r=(1,1)\) and \(c=0\) otherwise, so that the observed data can be summarised as \((c,cy,z)\). Assuming that missingness only depends on \(z\), that is

\[ p(c=1 \mid y,z)=p(c=1 \mid z)=\pi(z), \]

then the missing data mechanism is Missing At Random (MAR). Under these conditions, the sample mean of the complete cases \(\bar{y}_{cc}=\frac{\sum_{i=1}^nc_iy_i}{\sum_{i=1}^nc_i}\), i.e. the solution to the equation \(\sum_{i=1}^nc_i(y_i-\mu)=0\), is in general not a consistent estimator of \(\mu\). To correct for this, the IPW complete case estimating equation

\[ \sum_{i=1}^n\frac{c_i}{\pi(z_i)}(y_i-\mu)=0, \]

can be used to weight the contribution of each complete case by the inverse of \(\pi(z_i)\). The solution of the equation corresponds to the IPW estimator

\[ \mu_{ipw}=\left(\sum_{i=1}^n \frac{c_i}{\pi(z_i)} \right)^{-1} \sum_{i=1}^n \frac{c_iy_i}{\pi(z_i)}, \]

which is a consistent estimator of \(\mu\) under MAR, provided that \(\pi(z)>0\). For those interested, a proof of this result is available here. In most situations \(\pi(z_i)\) is not known and must be estimated from the data, typically by positing a model \(p(c=1 \mid z, \alpha)\), indexed by a parameter \(\alpha\) that is estimated by \(\hat{\alpha}\), for example a logistic regression

\[ \text{logit}(\pi)=\alpha_0 + \alpha_1z. \]

Of course, if the model for \(\pi(z)\) is misspecified, \(\mu_{ipw}\) can be an inconsistent estimator. In addition, IPW methods typically use data only from the completers, discarding all the partially observed values, which is clearly inefficient.
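
To make the comparison between the complete-case mean and the IPW mean concrete, the following Python sketch simulates data in which missingness of \(y\) depends only on \(z\), fits the logistic model for \(\pi(z)\) by Newton-Raphson, and computes both estimators. Everything specific here is an assumption made for illustration: the function names (`expit`, `fit_logistic`, `simulate_mean`), the data-generating model (true mean of \(y\) equal to 2), the sample size and the seed.

```python
import math
import random


def expit(t):
    """Inverse logit function."""
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(z, c, n_iter=15):
    """Fit logit(pi) = a0 + a1 * z by Newton-Raphson (single covariate)."""
    a0, a1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, ci in zip(z, c):
            p = expit(a0 + a1 * zi)
            r = ci - p           # contribution to the score
            v = p * (1.0 - p)    # contribution to the information matrix
            g0 += r
            g1 += r * zi
            h00 += v
            h01 += v * zi
            h11 += v * zi * zi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        a0 += (h11 * g0 - h01 * g1) / det
        a1 += (h00 * g1 - h01 * g0) / det
    return a0, a1


def simulate_mean(n=20000, seed=1):
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 + zi + rng.gauss(0.0, 1.0) for zi in z]  # true mean of y is 2
    # MAR: the probability of being a complete case depends on z only
    c = [1 if rng.random() < expit(0.5 + zi) else 0 for zi in z]

    a0, a1 = fit_logistic(z, c)
    # Weight c_i / pi_hat(z_i): zero for incomplete cases, so their y is never used
    w = [ci / expit(a0 + a1 * zi) for zi, ci in zip(z, c)]

    cc_mean = sum(ci * yi for ci, yi in zip(c, y)) / sum(c)
    ipw_mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return cc_mean, ipw_mean


cc_mean, ipw_mean = simulate_mean()
print("complete-case mean:", cc_mean)
print("IPW mean:          ", ipw_mean)
```

Because units with larger \(z\) are both more likely to be observed and have larger \(y\) in this setup, the complete-case mean should overestimate the true value, whereas the IPW mean should land close to it. The logistic model is correctly specified here by construction, which is exactly the condition the text warns may fail in practice.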


Thus, IPW estimators can correct for the bias of unweighted estimators due to the dependence of the missingness mechanism on \(z_i\) (Schafer and Graham (2002)). The basic intuition of IPW methods is that each subject’s contribution to the weighted Complete Case Analysis (CCA) is replicated \(w_i\) times, in order to account once for herself and \((w_i-1)\) times for those subjects with the same responses and covariates who are missing. These models are called semiparametric because, apart from requiring the regression equation to have a specific form, they do not specify any probability distribution for the response variable (Molenberghs et al. (2014)). Older GEE methods can accommodate missing values only if they are Missing Completely At Random (MCAR), while more recent methods allow them to be MAR or even Missing Not At Random (MNAR), provided that a model for the missingness is correctly specified (Robins, Rotnitzky, and Zhao (1995); Rotnitzky, Robins, and Scharfstein (1998)).
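
The replication intuition can be checked on a tiny, entirely hypothetical data set with a binary auxiliary variable, where the response probability in each stratum is estimated by the observed fraction of complete cases. The toy numbers and the helper name `response_rates` are assumptions for this example; exact fractions are used so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Toy data: (stratum z, complete-case indicator c, outcome y or None if missing)
data = [
    ("a", 1, 10), ("a", 1, 12), ("a", 0, None), ("a", 0, None),
    ("b", 1, 20), ("b", 1, 22), ("b", 0, None), ("b", 0, None),
    ("b", 0, None), ("b", 0, None),
]


def response_rates(rows):
    """Estimate pi(z) as the fraction of complete cases in each stratum."""
    totals, completes = {}, {}
    for z, c, _ in rows:
        totals[z] = totals.get(z, 0) + 1
        completes[z] = completes.get(z, 0) + c
    return {z: Fraction(completes[z], totals[z]) for z in totals}


pi_hat = response_rates(data)

# Each complete case receives weight 1 / pi_hat(z)
weighted_cases = [(1 / pi_hat[z], y) for z, c, y in data if c == 1]

total_weight = sum(w for w, _ in weighted_cases)
ipw_mean = sum(w * y for w, y in weighted_cases) / total_weight
cc_mean = Fraction(sum(y for _, y in weighted_cases), len(weighted_cases))

print("weights:", [str(w) for w, _ in weighted_cases])
print("sum of weights:", total_weight, "sample size:", len(data))
print("complete-case mean:", cc_mean, "IPW mean:", ipw_mean)
```

In stratum "a" half of the subjects respond, so each respondent stands for herself and one missing subject; in stratum "b" one third respond, so each respondent stands for herself and two missing subjects. The weights therefore add up to the full sample size, and the weighted mean gives each stratum its share of the full sample rather than its share of the complete cases.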


Liang, Kung-Yee, and Scott L Zeger. 1986. “Longitudinal Data Analysis Using Generalized Linear Models.” Biometrika 73 (1): 13–22.
Little, Roderick JA, and Donald B Rubin. 2019. Statistical Analysis with Missing Data. Vol. 793. John Wiley & Sons.
Molenberghs, Geert, Garrett Fitzmaurice, Michael G Kenward, Anastasios Tsiatis, and Geert Verbeke. 2014. Handbook of Missing Data Methodology. Chapman and Hall/CRC.
Robins, James M, and Andrea Rotnitzky. 1995. “Semiparametric Efficiency in Multivariate Regression Models with Missing Data.” Journal of the American Statistical Association 90 (429): 122–29.
Robins, James M, Andrea Rotnitzky, and Lue Ping Zhao. 1994. “Estimation of Regression Coefficients When Some Regressors Are Not Always Observed.” Journal of the American Statistical Association 89 (427): 846–66.
———. 1995. “Analysis of Semiparametric Regression Models for Repeated Outcomes in the Presence of Missing Data.” Journal of the American Statistical Association 90 (429): 106–21.
Rotnitzky, Andrea, James M Robins, and Daniel O Scharfstein. 1998. “Semiparametric Regression for Repeated Outcomes with Nonignorable Nonresponse.” Journal of the American Statistical Association 93 (444): 1321–39.
Schafer, Joseph L, and John W Graham. 2002. “Missing Data: Our View of the State of the Art.” Psychological Methods 7 (2): 147.
Zeger, Scott L, Kung-Yee Liang, and Paul S Albert. 1988. “Models for Longitudinal Data: A Generalized Estimating Equation Approach.” Biometrics, 1049–60.