A possible approach to analyse missing data is to use methods based on the likelihood function under specific modelling assumptions. In this section, I review maximum likelihood methods based on fully observed data alone.
Maximum Likelihood Methods for Complete Data
Let $Y$ denote the set of data, which are assumed to be generated according to a certain probability density function $f(Y \mid \theta)$ indexed by the set of parameters $\theta$, which lies in the parameter space $\Theta$ (i.e. the set of values of $\theta$ for which $f(Y \mid \theta)$ is a proper density function). The likelihood function, denoted $L(\theta \mid Y)$, is defined as any function of $\theta$ that is proportional to $f(Y \mid \theta)$. Note that, in contrast to the density function, which is defined as a function of the data $Y$ given the values of the parameters $\theta$, the likelihood is defined as a function of the parameters $\theta$ for fixed data $Y$. In addition, the loglikelihood function, denoted $\ell(\theta \mid Y)$, is defined as the natural logarithm of $L(\theta \mid Y)$.
Univariate Normal Example
The joint density function of $n$ independent and identically distributed units $y = (y_1, \ldots, y_n)$ from a Normal distribution with mean $\mu$ and variance $\sigma^2$ is

$$
f(y \mid \mu, \sigma^2) = \left(2\pi\sigma^2\right)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2\right),
$$

and therefore the loglikelihood is

$$
\ell(\mu, \sigma^2 \mid y) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2,
$$

which is considered as a function of $\theta = (\mu, \sigma^2)$ for fixed data $y$.
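As a quick numerical check of this expression, the sketch below (a minimal illustration with simulated data; the sample size and parameter values are arbitrary choices) evaluates the closed-form loglikelihood and compares it with the sum of log-densities returned by scipy.stats.norm.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
y = rng.normal(loc=2.0, scale=1.5, size=100)  # simulated sample (illustrative values)

def normal_loglik(mu, sigma2, y):
    """Closed-form Normal loglikelihood l(mu, sigma^2 | y)."""
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((y - mu) ** 2) / (2 * sigma2)

# Evaluate at an arbitrary parameter value and compare with scipy's log-density sum
mu, sigma2 = 1.8, 2.0
print(normal_loglik(mu, sigma2, y))
print(norm.logpdf(y, loc=mu, scale=np.sqrt(sigma2)).sum())  # should match
```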
Multivariate Normal Example
If the sample has dimension $K$, e.g. we have a set of independent and identically distributed variables $Y = (y_{ik})$, for $i = 1, \ldots, n$ units and $k = 1, \ldots, K$ variables, which comes from a Multivariate Normal distribution with mean vector $\mu = (\mu_1, \ldots, \mu_K)$ and covariance matrix $\Sigma = (\sigma_{kl})$, for $k, l = 1, \ldots, K$, then its density function is

$$
f(Y \mid \mu, \Sigma) = \prod_{i=1}^{n}(2\pi)^{-K/2}\lvert\Sigma\rvert^{-1/2}\exp\left(-\frac{1}{2}(y_i - \mu)\Sigma^{-1}(y_i - \mu)^{T}\right),
$$

where $\lvert\Sigma\rvert$ denotes the determinant of the matrix $\Sigma$ and the superscript $T$ denotes the transpose of a matrix or vector, while $y_i = (y_{i1}, \ldots, y_{iK})$ denotes the row vector of observed values for unit $i$. The loglikelihood of $\theta = (\mu, \Sigma)$ is

$$
\ell(\mu, \Sigma \mid Y) = -\frac{nK}{2}\log(2\pi) - \frac{n}{2}\log\lvert\Sigma\rvert - \frac{1}{2}\sum_{i=1}^{n}(y_i - \mu)\Sigma^{-1}(y_i - \mu)^{T}.
$$
MLE Estimation
Finding the value of $\theta$ that is most likely to have generated the data $Y$, which corresponds to maximising the likelihood, i.e. Maximum Likelihood Estimation (MLE), is a standard approach to making inference about $\theta$. Suppose a specific value $\hat{\theta}$ of the parameter is such that $L(\hat{\theta} \mid Y) \geq L(\theta \mid Y)$ for any other value of $\theta$. This implies that the observed data are at least as likely under $\hat{\theta}$ as under any other value of $\theta$, i.e. $\hat{\theta}$ is the value best supported by the data. More specifically, a maximum likelihood estimate of $\theta$ is a value $\hat{\theta}$ that maximises the likelihood $L(\theta \mid Y)$ or, equivalently, that maximises the loglikelihood $\ell(\theta \mid Y)$. In general, when the likelihood is differentiable and bounded from above, the MLE can typically be found by differentiating $L(\theta \mid Y)$ or $\ell(\theta \mid Y)$ with respect to $\theta$, setting the result equal to zero, and solving for $\theta$. The resulting equation,

$$
\frac{\partial\,\ell(\theta \mid Y)}{\partial\,\theta} = 0,
$$

is known as the likelihood equation, and the derivative of the loglikelihood is known as the score function. When $\theta$ consists of a set of $d$ components, the likelihood equation corresponds to a set of $d$ simultaneous equations, obtained by differentiating $\ell(\theta \mid Y)$ with respect to each component of $\theta$.
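When no closed-form solution is available, the same idea can be applied numerically. The sketch below (a minimal illustration with simulated Normal data; the starting values and true parameters are arbitrary) minimises the negative loglikelihood with a generic optimiser and compares the result with the analytic solution derived in the following example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=200)  # simulated data (illustrative values)

def neg_loglik(theta, y):
    """Negative Normal loglikelihood; theta = (mu, log sigma^2) for unconstrained optimisation."""
    mu, log_sigma2 = theta
    sigma2 = np.exp(log_sigma2)
    n = len(y)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum((y - mu) ** 2) / (2 * sigma2)

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]), args=(y,), method="BFGS")
mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma2_hat)       # numerical MLE
print(y.mean(), y.var(ddof=0))  # analytic MLE: sample mean and (biased) sample variance
```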
Univariate Normal Example
Recall that, for a Normal sample with $n$ units, the loglikelihood is indexed by the set of parameters $\theta = (\mu, \sigma^2)$ and has the form

$$
\ell(\mu, \sigma^2 \mid y) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2.
$$
The MLE can be derived by first differentiating $\ell(\mu, \sigma^2 \mid y)$ with respect to $\mu$ and setting the result equal to zero, that is

$$
\frac{\partial\,\ell(\mu, \sigma^2 \mid y)}{\partial\,\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu) = 0.
$$

After simplifying, we retrieve the solution

$$
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y},
$$

which corresponds to the sample mean of the observations. Next, we differentiate $\ell(\mu, \sigma^2 \mid y)$ with respect to $\sigma^2$ and set the result equal to zero, that is

$$
\frac{\partial\,\ell(\mu, \sigma^2 \mid y)}{\partial\,\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \mu)^2 = 0.
$$

We then simplify and rearrange to get

$$
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \mu)^2.
$$

Finally, we replace $\mu$ in the expression above with the value $\hat{\mu} = \bar{y}$ found before and obtain the solution

$$
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2,
$$
which, however, is a biased estimator of $\sigma^2$ and is therefore often replaced with the unbiased estimator $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$. In particular, given a population parameter $\theta$, an estimator $\hat{\theta}$ of $\theta$ is said to be unbiased when $E[\hat{\theta}] = \theta$. This is the case, for example, of the sample mean $\bar{y}$, which is an unbiased estimator of the population mean $\mu$:

$$
E[\bar{y}] = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i\right] = \frac{1}{n}\sum_{i=1}^{n}E[y_i] = \frac{1}{n}\,n\mu = \mu.
$$

However, this is not true for the estimator $\hat{\sigma}^2$. This can be seen by first rewriting the expression of the estimator as

$$
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2,
$$

and then by computing the expectation of this quantity:

$$
E[\hat{\sigma}^2] = \frac{1}{n}\sum_{i=1}^{n}E[y_i^2] - E[\bar{y}^2] = \left(\sigma^2 + \mu^2\right) - \left(\frac{\sigma^2}{n} + \mu^2\right) = \frac{n-1}{n}\sigma^2.
$$

The above result is obtained by plugging in the expression for the variance of a generic variable $x$ and retrieving the expression for $E[x^2]$ as a function of the variance and $E[x]$. More specifically, given that

$$
\text{Var}(x) = E[x^2] - E[x]^2 \quad \Longleftrightarrow \quad E[x^2] = \text{Var}(x) + E[x]^2,
$$

we know that for $x = y_i$, $E[y_i^2] = \sigma^2 + \mu^2$, and similarly, for $x = \bar{y}$, $E[\bar{y}^2] = \frac{\sigma^2}{n} + \mu^2$. Hence, we can see that $\hat{\sigma}^2$ is biased by a factor of $\frac{n-1}{n}$. Thus, an unbiased estimator for $\sigma^2$ is obtained by multiplying $\hat{\sigma}^2$ by $\frac{n}{n-1}$, which gives the unbiased estimator $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$.
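The bias factor can also be checked empirically. The sketch below (simulated data; the true values, sample size and number of replications are arbitrary choices) repeatedly draws small Normal samples and compares the average of the MLE $\hat{\sigma}^2$ (divisor $n$) with that of the unbiased estimator $s^2$ (divisor $n-1$).

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma2_true, n = 0.0, 4.0, 10  # illustrative values
reps = 20000

samples = rng.normal(mu_true, np.sqrt(sigma2_true), size=(reps, n))
var_mle = samples.var(axis=1, ddof=0)       # divisor n   (biased MLE)
var_unbiased = samples.var(axis=1, ddof=1)  # divisor n-1 (unbiased s^2)

print(var_mle.mean())             # approx. (n-1)/n * sigma2_true = 3.6
print(var_unbiased.mean())        # approx. sigma2_true = 4.0
print((n - 1) / n * sigma2_true)  # theoretical expectation of the MLE
```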
Multivariate Normal Example
The same procedure can be applied to an independent and identically distributed multivariate sample $Y = (y_{ik})$, for $i = 1, \ldots, n$ units and $k = 1, \ldots, K$ variables (Anderson (1962), Rao (1973), Gelman et al. (2013)). It can be shown that maximising the loglikelihood $\ell(\mu, \Sigma \mid Y)$ with respect to $\mu$ and $\Sigma$ yields the MLEs

$$
\hat{\mu} = \bar{y} \quad \text{and} \quad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^{T}(y_i - \bar{y}),
$$

where $\bar{y} = (\bar{y}_1, \ldots, \bar{y}_K)$ is the row vector of sample means and $\hat{\Sigma}$ is the sample covariance matrix with $(k,l)$-th element $\hat{\sigma}_{kl} = \frac{1}{n}\sum_{i=1}^{n}(y_{ik} - \bar{y}_k)(y_{il} - \bar{y}_l)$. In addition, in general, given a function $g(\theta)$ of the parameter $\theta$, if $\hat{\theta}$ is an MLE of $\theta$, then $g(\hat{\theta})$ is an MLE of $g(\theta)$.
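A minimal sketch of these formulas with numpy is given below (simulated trivariate data; the mean vector, covariance matrix and sample size are illustrative choices). Note that np.cov uses the $n-1$ divisor by default, so recovering the MLE requires the bias=True option. The last lines use the invariance property to obtain the MLE of the correlation between the first two variables.

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true = np.array([1.0, -0.5, 2.0])  # illustrative values
Sigma_true = np.array([[2.0, 0.6, 0.3],
                       [0.6, 1.0, 0.2],
                       [0.3, 0.2, 1.5]])
Y = rng.multivariate_normal(mu_true, Sigma_true, size=500)  # n x K data matrix

n = Y.shape[0]
mu_hat = Y.mean(axis=0)                        # row vector of sample means
Sigma_hat = (Y - mu_hat).T @ (Y - mu_hat) / n  # MLE: divisor n

print(mu_hat)
print(Sigma_hat)
print(np.allclose(Sigma_hat, np.cov(Y, rowvar=False, bias=True)))  # same as np.cov with bias=True

# Invariance property: the MLE of the correlation between variables 1 and 2
# is the corresponding function of the MLEs of the covariances.
rho_hat = Sigma_hat[0, 1] / np.sqrt(Sigma_hat[0, 0] * Sigma_hat[1, 1])
print(rho_hat)
```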
Conditional Distribution of a Bivariate Normal
Consider an independent and identically distributed sample formed by two variables $(y_{i1}, y_{i2})$, each measured on $i = 1, \ldots, n$ units, which come from a Bivariate Normal distribution with mean vector $\mu = (\mu_1, \mu_2)$ and covariance matrix

$$
\Sigma =
\begin{pmatrix}
\sigma_{11} & \rho\sqrt{\sigma_{11}\sigma_{22}} \\
\rho\sqrt{\sigma_{11}\sigma_{22}} & \sigma_{22}
\end{pmatrix},
$$

where $\rho$ is the correlation parameter between the two variables, so that the covariance is $\sigma_{12} = \rho\sqrt{\sigma_{11}\sigma_{22}}$. Thus, intuitive MLEs for these parameters are

$$
\hat{\mu}_k = \bar{y}_k, \quad \hat{\sigma}_{kl} = \frac{1}{n}\sum_{i=1}^{n}(y_{ik} - \bar{y}_k)(y_{il} - \bar{y}_l), \quad \hat{\rho} = \frac{\hat{\sigma}_{12}}{\sqrt{\hat{\sigma}_{11}\hat{\sigma}_{22}}},
$$

where $\bar{y}_k = \frac{1}{n}\sum_{i=1}^{n} y_{ik}$, for $k, l = 1, 2$. By the properties of the Bivariate Normal distribution (Ord and Stuart (1994)), the marginal distribution of $y_{i1}$ and the conditional distribution of $y_{i2} \mid y_{i1}$ are

$$
y_{i1} \sim \text{Normal}\left(\mu_1, \sigma_{11}\right) \quad \text{and} \quad y_{i2} \mid y_{i1} \sim \text{Normal}\left(\mu_2 + \beta(y_{i1} - \mu_1), \; \sigma_{22}(1 - \rho^2)\right),
$$

where $\beta = \frac{\sigma_{12}}{\sigma_{11}}$ is the parameter that quantifies the linear dependence between the two variables. The MLEs of $\beta$ and $\sigma_{22}(1 - \rho^2)$ can also be derived from the likelihood based on the conditional distribution of $y_{i2} \mid y_{i1}$, and they have strong connections with the least squares estimates derived in a multiple linear regression framework.
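This connection can be checked numerically. In the sketch below (simulated bivariate data; all numbers are illustrative), the MLE of $\beta = \sigma_{12}/\sigma_{11}$ computed from the MLE covariances coincides with the least squares slope of the regression of $y_2$ on $y_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.0, 1.0])  # illustrative parameter values
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
Y = rng.multivariate_normal(mu, Sigma, size=1000)
y1, y2 = Y[:, 0], Y[:, 1]

n = len(y1)
s11 = np.sum((y1 - y1.mean()) ** 2) / n                 # MLE of sigma_11
s12 = np.sum((y1 - y1.mean()) * (y2 - y2.mean())) / n   # MLE of sigma_12
beta_hat = s12 / s11                                    # MLE of the regression slope

# Least squares slope from regressing y2 on y1 (with an intercept)
X = np.column_stack([np.ones(n), y1])
beta_ls = np.linalg.lstsq(X, y2, rcond=None)[0][1]

print(beta_hat, beta_ls)  # the two slopes coincide
```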
Multiple Linear Regression
Suppose the data consist of $i = 1, \ldots, n$ units measured on an outcome variable $y_i$ and a set of $K$ covariates $x_i = (x_{i1}, \ldots, x_{iK})$, and assume that the distribution of $y_i$ given $x_i$ is Normal with mean $\mu_i = \beta_0 + \sum_{k=1}^{K}\beta_k x_{ik}$ and variance $\sigma^2$. The loglikelihood of $\theta = (\beta, \sigma^2)$ given the observed data is given by

$$
\ell(\beta, \sigma^2 \mid y, x) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{k=1}^{K}\beta_k x_{ik}\right)^2.
$$

Maximising this expression with respect to $\beta$, the MLEs are found to be equal to the least squares estimates of the intercept and regression coefficients. Using matrix notation, with $Y$ the $n \times 1$ vector of outcome values and $X$ the $n \times (K+1)$ matrix of covariate values (including the constant term), the MLEs are:

$$
\hat{\beta} = \left(X^{T}X\right)^{-1}X^{T}Y \quad \text{and} \quad \hat{\sigma}^2 = \frac{\left(Y - X\hat{\beta}\right)^{T}\left(Y - X\hat{\beta}\right)}{n},
$$
where the numerator of the fraction for $\hat{\sigma}^2$ is known as the Residual Sum of Squares (RSS). Because the denominator of $\hat{\sigma}^2$ is equal to $n$, the MLE of $\sigma^2$ does not correct for the loss of the $K+1$ degrees of freedom used in estimating the location parameters. Thus, the RSS should instead be divided by $n - K - 1$ to obtain an unbiased estimator. An extension of standard multiple linear regression is the so-called weighted multiple linear regression, in which the regression variance is assumed to be equal to $\sigma^2 / w_i$, for known positive weights $w_i$. Thus, the variable $y_i$ is Normally distributed with mean $\beta_0 + \sum_{k=1}^{K}\beta_k x_{ik}$ and variance $\sigma^2 / w_i$, and the loglikelihood is

$$
\ell(\beta, \sigma^2 \mid y, x, w) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) + \frac{1}{2}\sum_{i=1}^{n}\log(w_i) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} w_i\left(y_i - \beta_0 - \sum_{k=1}^{K}\beta_k x_{ik}\right)^2.
$$
Maximising this function yields MLEs given by the weighted least squares estimates

$$
\hat{\beta} = \left(X^{T}WX\right)^{-1}X^{T}WY \quad \text{and} \quad \hat{\sigma}^2 = \frac{\left(Y - X\hat{\beta}\right)^{T}W\left(Y - X\hat{\beta}\right)}{n},
$$

where $W = \text{diag}(w_1, \ldots, w_n)$.
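Below is a minimal sketch of both sets of estimates computed directly from the matrix formulas (simulated data; the design, weights and true coefficients are arbitrary choices for illustration).

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])  # design matrix with constant term
beta_true = np.array([1.0, 2.0, -0.5])                      # illustrative coefficients
w = rng.uniform(0.5, 2.0, size=n)                           # known positive weights
y = X @ beta_true + rng.normal(scale=1.0 / np.sqrt(w))      # variance sigma^2 / w_i with sigma^2 = 1

# Ordinary least squares / MLE under constant variance
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
rss = np.sum((y - X @ beta_ols) ** 2)
sigma2_mle = rss / n                 # MLE (divisor n)
sigma2_unbiased = rss / (n - K - 1)  # unbiased (divisor n - K - 1)

# Weighted least squares with W = diag(w_1, ..., w_n)
W = np.diag(w)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

print(beta_ols)
print(sigma2_mle, sigma2_unbiased)
print(beta_wls)
```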
Generalised Linear Models
Consider the previous example, where we had an outcome variable $y_i$ and a set of $K$ covariates $x_i = (x_{i1}, \ldots, x_{iK})$, each measured on $i = 1, \ldots, n$ units. A more general class of models, compared with the Normal model, assumes that, given $x_i$, the values of $y_i$ are an independent sample from a regular exponential family distribution

$$
f(y_i \mid x_i, \beta, \phi) = \exp\left(\frac{y_i\,\delta(x_i, \beta) - b\left(\delta(x_i, \beta)\right)}{\phi} + c(y_i, \phi)\right),
$$

where $\delta(\cdot)$ and $b(\cdot)$ are known functions that determine the distribution of $y_i$, and $c(\cdot)$ is a known function indexed by a scale parameter $\phi$. The mean of $y_i$ is assumed to relate to a linear combination of the covariates via

$$
E[y_i \mid x_i, \beta, \phi] = g^{-1}\left(\beta_0 + \sum_{k=1}^{K}\beta_k x_{ik}\right),
$$
where $\beta = (\beta_0, \ldots, \beta_K)$ and $g(\cdot)$ is a known one-to-one function, which is called the link function because it “links” the expectation of $y_i$ to a linear combination of the covariates. The canonical link function is

$$
g_c(\cdot) = \left(\frac{d\,b(\delta)}{d\,\delta}\right)^{-1}(\cdot),
$$

which is obtained by setting $g$ equal to the inverse of the derivative of $b(\delta)$ with respect to its argument. Examples of canonical links include the identity link for the Normal distribution, the log link for the Poisson distribution, and the logit link for the Binomial distribution.
The loglikelihood of $\theta = (\beta, \phi)$ given the observed data $(y, x)$ is

$$
\ell(\beta, \phi \mid y, x) = \sum_{i=1}^{n}\left(\frac{y_i\,\delta(x_i, \beta) - b\left(\delta(x_i, \beta)\right)}{\phi} + c(y_i, \phi)\right),
$$

which, for non-Normal cases, does not have an explicit maximum; numerical maximisation can be achieved using iterative algorithms.
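As an illustration of such a numerical fit, the sketch below (simulated data with a Poisson outcome and the canonical log link; all values are arbitrary choices) maximises the Poisson loglikelihood with a generic optimiser rather than a dedicated GLM routine.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(5)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # constant term plus one covariate
beta_true = np.array([0.5, 0.8])                        # illustrative coefficients
y = rng.poisson(np.exp(X @ beta_true))                  # Poisson outcome, canonical log link

def neg_loglik(beta, X, y):
    """Negative Poisson loglikelihood with log link: eta = X beta, mean = exp(eta)."""
    eta = X @ beta
    return -np.sum(y * eta - np.exp(eta) - gammaln(y + 1))

res = minimize(neg_loglik, x0=np.zeros(X.shape[1]), args=(X, y), method="BFGS")
print(res.x)      # numerical MLE of the regression coefficients
print(beta_true)  # values used to simulate the data
```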
References
Anderson, Theodore Wilbur. 1962. An Introduction to Multivariate Statistical Analysis. New York: Wiley.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. Chapman & Hall/CRC.
Ord, Keith, and Alan Stuart. 1994. Kendall’s Advanced Theory of Statistics: Distribution Theory. London: Edward Arnold.
Rao, Calyampudi Radhakrishna. 1973. Linear Statistical Inference and Its Applications. 2nd ed. New York: Wiley.