Available Case Analysis

Apr 27, 2016 rubric

Complete case analysis (CCA) can be particularly inefficient for data sets with a large number of partially observed variables. An alternative approach for conducting univariate analyses is known as Available Case Analysis (ACA), which uses all the available cases, separately for each variable under examination, to estimate the quantities of interest.

The main drawback of ACA is that the sample used to perform the analysis varies from variable to variable according to the patterns of missing data. This generates problems of comparability across variables if the missingness mechanism is not missing completely at random (MCAR), i.e. if the missing data probabilities depend on the variables under study. While estimates of means and variances can be easily computed, measures of covariation need to be adjusted. In particular, for estimating sample covariances, this approach is known as pairwise deletion or pairwise inclusion.
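As a minimal sketch of the univariate case, assuming missing values are coded as NaN in a NumPy array, the per-variable estimates could be computed as follows. The function name `available_case_summary` and the toy data are illustrative and not part of the original text.

```python
import numpy as np

def available_case_summary(data):
    """Per-variable sample size, mean and variance from all observed values.

    Assumption: missing entries are coded as NaN.
    """
    data = np.asarray(data, dtype=float)
    n_obs = np.sum(~np.isnan(data), axis=0)       # units observed for each variable
    means = np.nanmean(data, axis=0)              # mean over observed units only
    variances = np.nanvar(data, axis=0, ddof=1)   # sample variance over observed units
    return n_obs, means, variances

# Hypothetical toy data: 5 units, 2 variables, NaN marks a missing value.
toy = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [4.0, np.nan],
    [np.nan, 4.0],
])

n_obs, means, variances = available_case_summary(toy)

# For comparison: the number of units CCA would retain.
n_complete = int(np.sum(~np.isnan(toy).any(axis=1)))
```

In this toy example each variable is summarised on a different subset of units (4 and 3 units, respectively), whereas CCA would keep only the 2 fully observed units. This is exactly the source of the comparability problem described above.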

Pairwise measures of covariation

One possible approach to estimate pairwise measures of covariation for $y_{j}$ and $y_{k}$ is to use only those units $i = 1, \dots, n_{a c}$ for which both variables are observed (Little and Rubin (2019)). For example, one can compute pairwise sample covariances as:

$s_{j k}^{a c} = \frac{\sum_{i \in I_{a c}} (y_{i j} - {\bar{y}}_{j}^{a c}) (y_{i k} - {\bar{y}}_{k}^{a c})}{(n_{a c} - 1)},$

where $I_{a c}$ is the set of $n_{a c}$ units with both $y_{j}$ and $y_{k}$ observed, while the sample means ${\bar{y}}_{j}^{a c}$ and ${\bar{y}}_{k}^{a c}$ are calculated over this set of units. We can also estimate the sample correlation

$r_{j k}^{⋆} = \frac{s_{j k}^{a c}}{\sqrt{s_{j}^{2} s_{k}^{2}}},$

where $s_{j}^{2}$ and $s_{k}^{2}$ are the sample variances computed over the sets of observed units $I_{j}$ and $I_{k}$, respectively. A problem with this type of correlation estimate is that it can lie outside the range $[- 1, 1]$, because the numerator and the denominator are computed on different sets of units. This is typically addressed by computing pairwise correlations (Wilks (1932)), where the variances are also estimated from the set of units with both variables observed, $I_{a c}$, i.e.

$r_{j k}^{a c} = \frac{s_{j k}^{a c}}{\sqrt{s_{j}^{2, a c} s_{k}^{2, a c}}} .$
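The difference between the two correlation estimates can be illustrated with a short sketch. It assumes missing values are coded as NaN; the function name `pairwise_estimates` and the toy vectors are hypothetical and chosen only to make the out-of-range behaviour of $r_{j k}^{⋆}$ visible.

```python
import numpy as np

def pairwise_estimates(yj, yk):
    """Return (s_jk_ac, r_star, r_ac) for two partially observed variables.

    Assumption: missing entries are coded as NaN.
    """
    yj = np.asarray(yj, dtype=float)
    yk = np.asarray(yk, dtype=float)
    obs_j = ~np.isnan(yj)            # I_j: units with y_j observed
    obs_k = ~np.isnan(yk)            # I_k: units with y_k observed
    both = obs_j & obs_k             # I_ac: units with both observed
    n_ac = int(both.sum())
    if n_ac < 2:
        raise ValueError("need at least two units with both variables observed")

    # Pairwise covariance: means and cross-products over I_ac only.
    dj = yj[both] - yj[both].mean()
    dk = yk[both] - yk[both].mean()
    s_jk_ac = np.sum(dj * dk) / (n_ac - 1)

    # r_star: variances computed on I_j and I_k separately.
    r_star = s_jk_ac / np.sqrt(np.var(yj[obs_j], ddof=1) * np.var(yk[obs_k], ddof=1))
    # r_ac: variances computed on the common set I_ac.
    r_ac = s_jk_ac / np.sqrt(np.var(yj[both], ddof=1) * np.var(yk[both], ddof=1))
    return s_jk_ac, r_star, r_ac

nan = np.nan
# Hypothetical data: the two jointly observed units are extreme,
# while the units observed on only one variable all sit at the mean.
yj = [-2.0, 2.0, 0.0, 0.0, 0.0, 0.0, nan, nan, nan, nan]
yk = [-2.0, 2.0, nan, nan, nan, nan, 0.0, 0.0, 0.0, 0.0]

s_jk_ac, r_star, r_ac = pairwise_estimates(yj, yk)
```

Here the covariance is driven by the two jointly observed units, while the variances used for $r_{j k}^{⋆}$ are shrunk by the additional units, so $r_{j k}^{⋆}$ exceeds 1. The pairwise correlation $r_{j k}^{a c}$ stays within range because every term is computed on the same units.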

In addition, we could also replace the sample means ${\bar{y}}_{j}^{a c}$ and ${\bar{y}}_{k}^{a c}$, evaluated on the common set of units $I_{a c}$, with ${\bar{y}}_{j}$ and ${\bar{y}}_{k}$, which are evaluated on the sets of units $I_{j}$ and $I_{k}$, respectively. This leads to the following estimates for the sample covariances (Matthai (1951)):

$s_{j k}^{⋆} = \frac{\sum_{i \in I_{a c}} (y_{i j} - {\bar{y}}_{j}) (y_{i k} - {\bar{y}}_{k})}{(n_{a c} - 1)}.$
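A minimal sketch of this variant, under the same assumption that missing values are coded as NaN, differs from the pairwise covariance only in where the means are computed. The function name `cov_star` and the toy vectors are illustrative.

```python
import numpy as np

def cov_star(yj, yk):
    """Covariance with cross-products over I_ac but means over I_j and I_k.

    Assumption: missing entries are coded as NaN.
    """
    yj = np.asarray(yj, dtype=float)
    yk = np.asarray(yk, dtype=float)
    both = ~np.isnan(yj) & ~np.isnan(yk)   # I_ac: both variables observed
    n_ac = int(both.sum())
    if n_ac < 2:
        raise ValueError("need at least two units with both variables observed")

    mean_j = np.nanmean(yj)                # mean over all units in I_j
    mean_k = np.nanmean(yk)                # mean over all units in I_k
    return np.sum((yj[both] - mean_j) * (yk[both] - mean_k)) / (n_ac - 1)

nan = np.nan
# Hypothetical data: 6 units, 3 of which have both variables observed.
yj = [1.0, 2.0, 3.0, 4.0, nan, 5.0]
yk = [2.0, nan, 6.0, 5.0, 4.0, nan]

s_star = cov_star(yj, yk)
```

On these toy data the means are 3 and 4.25 (from 5 and 4 observed units), while the cross-products use only the 3 jointly observed units.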

Pairwise AC estimates aim at recovering information from partially observed units that is lost by CCA. However, when considered together, the estimates suffer from inconsistencies that undermine the validity of these methods. For example, pairwise correlation matrices may not be positive definite. Because parameters are estimated from different sets of units, different approaches can be used to obtain estimates of the measures of uncertainty (Schafer and Graham (2002)).
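The lack of positive definiteness can be checked numerically by inspecting the eigenvalues of the pairwise correlation matrix. The sketch below assumes NaN-coded missing values; the function name `pairwise_corr_matrix` and the deliberately extreme toy data, in which each pair of variables is observed on a different block of units, are hypothetical.

```python
import numpy as np

def pairwise_corr_matrix(data):
    """Correlation matrix where each entry uses the units observed on that pair.

    Assumption: missing entries are coded as NaN.
    """
    data = np.asarray(data, dtype=float)
    p = data.shape[1]
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            both = ~np.isnan(data[:, j]) & ~np.isnan(data[:, k])
            r = np.corrcoef(data[both, j], data[both, k])[0, 1]
            R[j, k] = R[k, j] = r
    return R

nan = np.nan
# Hypothetical data: three blocks of units, each observing a different pair.
toy = np.array([
    [1.0, 1.0, nan], [2.0, 2.0, nan], [3.0, 3.0, nan],   # y1, y2 observed
    [1.0, nan, 1.0], [2.0, nan, 2.0], [3.0, nan, 3.0],   # y1, y3 observed
    [nan, 1.0, 3.0], [nan, 2.0, 2.0], [nan, 3.0, 1.0],   # y2, y3 observed
])

R = pairwise_corr_matrix(toy)
eigenvalues = np.linalg.eigvalsh(R)   # ascending order, R is symmetric
is_positive_definite = bool(eigenvalues[0] > 0)
```

In this example the pairwise correlations are 1, 1 and -1, a combination that no single joint distribution can produce, and the smallest eigenvalue of the resulting matrix is negative.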

Conclusions

AC estimates make use of all the available evidence in the data and may be more efficient than CCA when the missingness mechanism is MCAR and correlations are modest (Kim and Curry (1977)). However, when correlations are more substantial, ACA may become even less efficient than CCA (Haitovsky (1968), Azen and Van Guilder (1981)).

References

Azen, S, and M Van Guilder. 1981. “Conclusions Regarding Algorithms for Handling Incomplete Data.” 1981 Proceedings of the Statistical Computing Section, 53–56.

Haitovsky, Yoel. 1968. “Missing Data in Regression Analysis.” Journal of the Royal Statistical Society: Series B (Methodological) 30 (1): 67–82.

Kim, Jae-On, and James Curry. 1977. “The Treatment of Missing Data in Multivariate Analysis.” Sociological Methods & Research 6 (2): 215–40.

Little, Roderick JA, and Donald B Rubin. 2019. Statistical Analysis with Missing Data. Vol. 793. John Wiley & Sons.

Matthai, Abraham. 1951. “Estimation of Parameters from Incomplete Data with Application to Design of Sample Surveys.” Sankhyā: The Indian Journal of Statistics, 145–52.

Schafer, Joseph L, and John W Graham. 2002. “Missing Data: Our View of the State of the Art.” Psychological Methods 7 (2): 147.

Wilks, Samuel S. 1932. “Moments and Distributions of Estimates of Population Parameters from Fragmentary Samples.” The Annals of Mathematical Statistics 3 (3): 163–95.
