Comparing Two Populations - STAN

Feb 1, 2020 R, one sample t-test, STAN

This tutorial will focus on the use of Bayesian estimation to explore differences between two populations. BUGS (Bayesian inference Using Gibbs Sampling) is an algorithm and supporting language (resembling R) dedicated to performing the Gibbs sampling implementation of Markov Chain Monte Carlo (MCMC) method. Dialects of the BUGS language are implemented within three main projects:

OpenBUGS - written in component pascal.
JAGS - (Just Another Gibbs Sampler) - written in C++.
STAN - a dedicated Bayesian modelling framework written in C++ and implementing Hamiltonian MCMC samplers.

Whilst the above programs can be used stand-alone, they do offer the rich data pre-processing and graphical capabilities of R, and thus, they are best accessed from within R itself. As such there are multiple packages devoted to interfacing with the various software implementations:

R2OpenBUGS - interfaces with OpenBUGS
R2jags - interfaces with JAGS
rstan - interfaces with STAN

The BUGS/JAGS/STAN languages and algorithms are very powerful and flexible. However, the cost of this power and flexibility is complexity and the need for a firm understanding of the model you wish to fit as well as the priors to be used. The algorithms requires the following inputs.

Within the model:
1. The likelihood function relating the response to the predictors.
2. The definition of the priors.
Chain properties:
1. The number of chains.
2. The length of chains (number of iterations).
3. The burn-in length (number of initial iterations to ignore).
4. The thinning rate (number of iterations to count on before storing a sample).
The initial estimates to start an MCMC chain. If there are multiple chains, these starting values can differ between chains.
The list of model parameters and derivatives to monitor (and return the posterior distributions of)

This tutorial will demonstrate how to fit models in STAN (Gelman, Lee, and Guo (2015)) using the package rstan (Stan Development Team (2018)) as interface, which also requires to load some other packages.

Data generation

We will start by generating a random data set. Note, I am creating two versions of the predictor variable (a numeric version and a factorial version).

> set.seed(123)
> nA <- 60  #sample size from Population A
> nB <- 40  #sample size from Population B
> muA <- 105  #population mean of Population A
> muB <- 77.5  #population mean of Population B
> sigma <- 3  #standard deviation of both populations (equally varied)
> yA <- rnorm(nA, muA, sigma)  #Population A sample
> yB <- rnorm(nB, muB, sigma)  #Population B sample
> y <- c(yA, yB)
> x <- factor(rep(c("A", "B"), c(nA, nB)))  #categorical listing of the populations
> xn <- as.numeric(x)  #numerical version of the population category for means parameterization. # Should not start at 0.
> data <- data.frame(y, x, xn)  # dataset

Let inspect the first few rows of the dataset using the command head

> head(data)
         y x xn
1 103.3186 A  1
2 104.3095 A  1
3 109.6761 A  1
4 105.2115 A  1
5 105.3879 A  1
6 110.1452 A  1

We can also perform some exploratory data analysis - in this case, a boxplot of the response for each level of the predictor.

> boxplot(y ~ x, data)

The One Sample t-test

A t-test is essentially just a simple regression model in which the categorical predictor is represented by a binary variable in which one level is coded as \(0\) and the other \(1\). For the model itself, the observed response \(y_i\) are assumed to be drawn from a normal distribution with a given mean \(\mu\) and standard deviation \(\sigma\). The expected values are themselves determined by the linear predictor \(\mu_i=\beta_0+\beta_1x_i\), where \(\beta_0\) represents the mean of the first treatment group and \(\beta_1\) represents the difference between the mean of the first group and the mean of the second group (the effect of interest).

MCMC sampling requires priors on all parameters. We will employ weakly informative priors. Specifying “uninformative” priors is always a bit of a balancing act. If the priors are too vague (wide) the MCMC sampler can wander off into nonscence areas of likelihood rather than concentrate around areas of highest likelihood (desired when wanting the outcomes to be largely driven by the data). On the other hand, if the priors are too strong, they may have an influence on the parameters. In such a simple model, this balance is very forgiving - it is for more complex models that prior choice becomes more important. For this simple model, we will go with zero-centered Gaussian (normal) priors with relatively large standard deviations (\(1000\)) for both the intercept and the treatment effect and a wide half-cauchy (scale=\(25\)) for the standard deviation (Gelman and others (2006)).

\[y_i \sim \text{Normal}(\mu_i, \sigma), \]

where \(\mu_i=\beta_0+\beta_1x_i\).

Priors are defined as:

\[ \beta_j \sim \text{Normal}(0,1000), \;\;\; \text{and} \;\;\; \sigma \sim \text{Cauchy}(0,25), \]

for \(j=0,1\).

Fitting the model in STAN

Broadly, there are two ways of parameterising (expressing the unknown (to be estimated) components of a model) a model. Either we can estimate the means of each group (Means parameterisation) or we can estimate the mean of one group and the difference between this group and the other group(s) (Effects parameterisation). The latter is commonly used for frequentist null hypothesis testing as its parameters are more consistent with the null hypothesis of interest (that the difference between the two groups equals zero).

Effects parameterisation

\[ y_i = \beta_0 + \beta_{j}x_i + \epsilon_i, \;\;\; \text{with} \;\;\; \epsilon_i \sim \text{Normal}(0,\sigma). \]

Each \(y_i\) is modelled by an intercept \(\beta_0\) (mean of group A) plus a difference parameter \(\beta_j\) (difference between mean of group A and group B) multiplied by an indicator of which group the observation came from (\(x_i\)), plus a residual drawn from a normal distribution with mean \(0\) and standard deviation \(\sigma\). Actually, there are as many \(\beta_j\) parameters as there are groups but one of them (typically the first) is set to be equal to zero (to avoid over-parameterization). Expected values of \(y\) are modelled assuming they are drawn from a normal distribution whose mean is determined by a linear combination of effect parameters and whose variance is defined by the degree of variability in this mean. The parameters are: \(\beta_0\), \(\beta_1\) and \(\sigma\).

Means parameterisation

\[ y_i = \beta_{j} + \epsilon_i, \;\;\; \text{with} \;\;\; \epsilon_i \sim \text{Normal}(0,\sigma). \]

Each \(y_i\) is modelled as the mean \(\beta_j\) of each group (\(j=1,2\)) plus a residual drawn from a normal distribution with a mean of zero and a standard deviation of \(\sigma\). Actually, \(\boldsymbol \beta\) is a set of \(j\) coefficients corresponding to the \(j\) dummy coded factor levels. Expected values of \(y\) are modelled assuming they are drawn from a normal distribution whose mean is determined by a linear combination of means parameters and whose variance is defined by the degree of variability in this mean. The parameters are: \(\beta_1\), \(\beta_2\) and \(\sigma\).

Whilst the STAN language broadly resembles BUGS/JAGS, there are numerous important differences. Some of these differences are to support translation to c++ for compilation (such as declaring variables). Others reflect leveraging of vectorization to speed up run time. Here are some important notes about STAN:

All variables must be declared
Variables declared in the parameters block will be collected
Anything in the transformed block will be collected as samples. Also, checks will be made every loop

Now I will demonstrate fitting the models with STAN. Note, I am using the refresh=0 option so as to suppress the larger regular output in the interest of keeping output to what is necessary for this tutorial. When running outside of a tutorial context, the regular verbose output is useful as it provides a way to gauge progress.

Effects Parameterisation

> stanString = " 
+ data {
+  int n;
+  vector [n] y;
+  vector [n] x;
+  }
+  parameters {
+  real <lower=0, upper=100> sigma;
+  real beta0;
+  real beta;
+  }
+  transformed parameters {
+  }
+  model {
+  vector [n] mu;
+  
+  //Priors
+  beta0 ~ normal(0,1000);
+  beta ~ normal(0,1000);
+  sigma ~ cauchy(0,25);
+  
+  mu = beta0 + beta*x;
+  //Likelihood
+  y ~ normal(mu, sigma);
+  }
+  generated quantities {
+  vector [2] Group_means;
+  real CohensD;
+  //Other Derived parameters 
+  //# Group means (note, beta is a vector)
+  Group_means[1] = beta0;
+  Group_means[2] = beta0+beta;
+  
+  CohensD = beta /sigma;  
+  }
+  
+  "
> ## write the model to a text file
> writeLines(stanString, con = "ttestModel.stan")

Means Parameterisation

> stanString.means = "  
+  data {
+  int n;
+  int nX;
+  vector [n] y;
+  matrix [n,nX] x;
+  }
+  parameters {
+  real <lower=0, upper=100> sigma;
+  vector [nX] beta;
+  }
+  transformed parameters {
+  }
+  model {
+  vector [n] mu;
+  
+  //Priors
+  beta ~ normal(0,1000);
+  sigma ~ cauchy(0,25);
+  
+  mu = x*beta;
+  //Likelihood
+  y ~ normal(mu, sigma);
+  }
+  generated quantities {
+  vector [2] Group_means;
+  real CohensD;
+  
+  //Other Derived parameters 
+  Group_means[1] = beta[1];
+  Group_means[2] = beta[1]+beta[2];
+  
+  CohensD = beta[2] /sigma;  
+  }
+  
+  "
> ## write the model to a text file
> writeLines(stanString.means, con = "ttestModelMeans.stan")

Arrange the data as a list (as required by STAN).

> data.list <- with(data, list(y = y, x = (xn - 1), n = nrow(data)))
> X <- model.matrix(~x, data)
> data.list.means = with(data, list(y = y, x = X, n = nrow(data), nX = ncol(X)))

Define the initial values for the chain. Reasonable starting points can be gleaned from the data themselves.

> inits <- list(beta0 = mean(data$y), beta = c(NA, diff(tapply(data$y,
+     data$x, mean))), sigma = sd(data$y/2))
> inits.means <- list(beta = tapply(data$y, data$x, mean), sigma = sd(data$y/2))

Define the nodes (parameters and derivatives) to monitor.

> params <- c("beta0", "beta", "sigma", "Group_means", "CohensD")
> params.means <- c("beta", "sigma", "Group_means","CohensD")

Define the chain parameters.

> burnInSteps = 500  # the number of initial samples to discard
> nChains = 2  # the number of independed sampling chains to perform 
> thinSteps = 1  # the thinning rate
> nIter = 2000

Start the STAN model (check the model, load data into the model, specify the number of chains and compile the model). Load the rstan package.

> library(rstan)

When using the stan function (rtsan package), it is not necessary to provide initial values. However, if they are to be supplied, the inital values must be provided as a list of the same length as the number of chains.

Effects Parameterisation

> data.stan = stan(file = "ttestModel.stan", 
+   data = data.list, 
+   pars = params,
+   iter = nIter,
+   warmup = burnInSteps, 
+   chains = nChains, 
+   thin = thinSteps, 
+   init = "random", #or inits=list(inits,inits)
+   refresh = 0)
> 
> #print results
> print(data.stan)
Inference for Stan model: ttestModel.
2 chains, each with iter=2000; warmup=500; thin=1; 
post-warmup draws per chain=1500, total post-warmup draws=3000.

                  mean se_mean   sd    2.5%     25%     50%     75%   97.5%
beta0           105.20    0.01 0.36  104.47  104.95  105.20  105.44  105.91
beta            -27.32    0.01 0.57  -28.43  -27.72  -27.33  -26.93  -26.22
sigma             2.79    0.00 0.21    2.41    2.64    2.77    2.92    3.23
Group_means[1]  105.20    0.01 0.36  104.47  104.95  105.20  105.44  105.91
Group_means[2]   77.88    0.01 0.45   77.01   77.59   77.87   78.18   78.76
CohensD          -9.85    0.02 0.75  -11.36  -10.35   -9.86   -9.35   -8.36
lp__           -150.78    0.04 1.25 -154.05 -151.31 -150.44 -149.88 -149.34
               n_eff Rhat
beta0           1802    1
beta            1731    1
sigma           2187    1
Group_means[1]  1802    1
Group_means[2]  2826    1
CohensD         2238    1
lp__            1272    1

Samples were drawn using NUTS(diag_e) at Mon Feb 10 14:10:29 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Means Parameterisation

> data.stan.means = stan(file = "ttestModelMeans.stan", 
+   data = data.list.means, 
+   pars = params.means,
+   iter = nIter,
+   warmup = burnInSteps, 
+   chains = nChains, 
+   thin = thinSteps, 
+   init = "random", #or inits=list(inits.means,inits.means)
+   refresh = 0)
> 
> #print results
> print(data.stan.means)
Inference for Stan model: ttestModelMeans.
2 chains, each with iter=2000; warmup=500; thin=1; 
post-warmup draws per chain=1500, total post-warmup draws=3000.

                  mean se_mean   sd    2.5%     25%     50%     75%   97.5%
beta[1]         105.21    0.01 0.37  104.51  104.96  105.20  105.44  105.92
beta[2]         -27.33    0.01 0.58  -28.47  -27.71  -27.31  -26.93  -26.23
sigma             2.78    0.00 0.20    2.43    2.64    2.76    2.90    3.22
Group_means[1]  105.21    0.01 0.37  104.51  104.96  105.20  105.44  105.92
Group_means[2]   77.88    0.01 0.44   77.02   77.59   77.88   78.17   78.77
CohensD          -9.88    0.02 0.74  -11.35  -10.40   -9.89   -9.40   -8.40
lp__           -150.74    0.03 1.26 -153.85 -151.33 -150.42 -149.83 -149.33
               n_eff Rhat
beta[1]         1439    1
beta[2]         1654    1
sigma           1955    1
Group_means[1]  1439    1
Group_means[2]  3595    1
CohensD         2056    1
lp__            1397    1

Samples were drawn using NUTS(diag_e) at Mon Feb 10 14:11:08 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Notes

If inits="random" the stan function will randomly generate initial values between \(-2\) and \(2\) on the unconstrained support. The optional additional parameter init_r can be set to some value other than \(2\) to change the range of the randomly generated inits. Other available options include: set inits="0" to initialize all parameters to zero on the unconstrained support; set inital values by providing a list equal in length to the number of chains; set initial values by providing a function that returns a list for specifying the initial values of parameters for a chain.
In addition to the mean and quantiles of each of the sample nodes, the stan function will calculate.
1. The effective sample size for each sample - if n.eff for a node is substantially less than the number of iterations, then it suggests poor mixing.
2. The Potential scale reduction factor or Rhat values for each sample - these are a convergence diagnostic (values of \(1\) indicate full convergence, values greater than \(1.01\) are indicative of non-convergence.

The total number samples collected is \(3000\). That is, there are \(3000\) samples collected from the multidimensional posterior distribution and thus, \(3000\) samples collected from the posterior distributions of each parameter. The effective number of samples column indicates the number of independent samples represented in the total. It is clear that for all parameters the chains were well mixed.

MCMC diagnostics

Again, prior to examining the summaries, we should have explored the convergence diagnostics. There are numerous ways of working with STAN model fits (for exploring diagnostics and summarisation).

extract the mcmc samples and convert them into a mcmc.list to leverage the various mcmcplots routines
use the numerous routines that come with the rstan package
use the routines that come with the bayesplot package

We will explore all of these.

mcmcplots

First, we need to convert the rtsan object into an mcmc.list object to apply the functions in the mcmcplots package.

> library(mcmcplots)
> s = as.array(data.stan.means)
> mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc))

Next we look at density and trace plots.

> denplot(mcmc, parms = c("Group_means", "CohensD"))

> traplot(mcmc, parms = c("Group_means", "CohensD"))

These plots show no evidence that the chains have not reasonably traversed the entire multidimensional parameter space.

rstan

MCMC diagnostic measures that can be directly applied to rstan objects via the rstan package include: traceplots, autocorrelation, effective sample size and Rhat diagnostics.

> #traceplots
> stan_trace(data.stan.means, pars = c("Group_means", "CohensD"))

> 
> #autocorrelation
> stan_ac(data.stan.means, pars = c("Group_means", "CohensD"))

> 
> #rhat
> stan_rhat(data.stan.means, pars = c("Group_means", "CohensD"))

> 
> #ess
> stan_ess(data.stan.means, pars = c("Group_means", "CohensD"))

Note:

Rhat values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than \(1.05\). If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or overly vague priors that permit sampling in otherwise nonscence parameter space.
ESS indicates the number samples (or proportion of samples that the sampling algorithm) deamed effective. The sampler rejects samples on the basis of certain criterion and when it does so, the previous sample value is used. Hence while the MCMC sampling chain may contain \(1000\) samples, if there are only \(10\) effective samples (\(1\)%), the estimated properties are not likely to be reliable.
bayesplot

Another alternative is to use the package bayesplot, which provides a range of standardised diagnostic measures for assessing MCMC convergence and issues, which can be directly applied to the rstan object.

> library(bayesplot)
> 
> #density and trace plots
> mcmc_combo(as.array(data.stan.means), regex_pars = "Group_means|CohensD")

Model validation

Residuals are not computed directly within rstan. However, we can calculate them manually form the posteriors.

> library(ggplot2)
> mcmc = as.matrix(data.stan.means)[, c("beta[1]", "beta[2]")]
> # generate a model matrix
> newdata = data.frame(x = data$x)
> Xmat = model.matrix(~x, newdata)
> ## get median parameter estimates
> coefs = apply(mcmc, 2, median)
> fit = as.vector(coefs %*% t(Xmat))
> resid = data$y - fit
> ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))

There is no evidence that the mcmc chain did not converge on a stable posterior distribution. We are now in a position to examine the summaries of the parameters.

Parameter estimates

A quick look at posterior summaries can be obtained through the command summary which can be directly applied to our rstan object.

> summary(data.stan.means)
$summary
                      mean     se_mean        sd        2.5%        25%
beta[1]         105.205981 0.009650332 0.3660680  104.512893  104.95847
beta[2]         -27.327670 0.014175541 0.5765494  -28.471465  -27.70858
sigma             2.779295 0.004564259 0.2017951    2.425126    2.63877
Group_means[1]  105.205981 0.009650332 0.3660680  104.512893  104.95847
Group_means[2]   77.878310 0.007293288 0.4372737   77.017179   77.58974
CohensD          -9.883908 0.016318680 0.7398548  -11.345145  -10.39596
lp__           -150.744310 0.033765022 1.2622380 -153.847632 -151.32845
                       50%         75%       97.5%    n_eff      Rhat
beta[1]         105.197887  105.442341  105.923970 1438.928 1.0006369
beta[2]         -27.313058  -26.929462  -26.228003 1654.222 0.9996207
sigma             2.761057    2.904130    3.220382 1954.702 1.0008448
Group_means[1]  105.197887  105.442341  105.923970 1438.928 1.0006369
Group_means[2]   77.881198   78.173471   78.765424 3594.677 0.9997923
CohensD          -9.893648   -9.396558   -8.403284 2055.526 1.0013095
lp__           -150.420841 -149.826519 -149.327836 1397.489 1.0006469

$c_summary
, , chains = chain:1

                stats
parameter               mean        sd        2.5%         25%         50%
  beta[1]         105.194598 0.3722763  104.485138  104.943830  105.189222
  beta[2]         -27.316749 0.5909082  -28.503315  -27.700926  -27.303076
  sigma             2.787113 0.2017944    2.439039    2.649487    2.769964
  Group_means[1]  105.194598 0.3722763  104.485138  104.943830  105.189222
  Group_means[2]   77.877849 0.4452879   76.953676   77.589838   77.884335
  CohensD          -9.851471 0.7306980  -11.291742  -10.351622   -9.856804
  lp__           -150.774304 1.3031195 -154.143552 -151.358639 -150.446011
                stats
parameter                75%       97.5%
  beta[1]         105.430335  105.928130
  beta[2]         -26.900706  -26.189346
  sigma             2.905763    3.220038
  Group_means[1]  105.430335  105.928130
  Group_means[2]   78.167639   78.777570
  CohensD          -9.358039   -8.394201
  lp__           -149.844014 -149.328052

, , chains = chain:2

                stats
parameter               mean        sd        2.5%         25%         50%
  beta[1]         105.217363 0.3595164  104.544008  104.970466  105.208509
  beta[2]         -27.338592 0.5618086  -28.444722  -27.716894  -27.323423
  sigma             2.771476 0.2015598    2.417028    2.631247    2.750654
  Group_means[1]  105.217363 0.3595164  104.544008  104.970466  105.208509
  Group_means[2]   77.878771 0.4292579   77.031912   77.589743   77.878030
  CohensD          -9.916344 0.7477366  -11.431004  -10.435551   -9.924630
  lp__           -150.714316 1.2196850 -153.673281 -151.305580 -150.383196
                stats
parameter                75%       97.5%
  beta[1]         105.450568  105.916257
  beta[2]         -26.963106  -26.265061
  sigma             2.898644    3.219905
  Group_means[1]  105.450568  105.916257
  Group_means[2]   78.179664   78.753253
  CohensD          -9.430001   -8.422613
  lp__           -149.795340 -149.327597

The Group A is typically \(27.3\) units greater than Group B. The \(95\)% confidence interval for the difference between Group A and B does not overlap with \(0\) implying a significant difference between the two groups.

Graphical summaries

A nice graphic is often a great accompaniment to a statistical analysis. Although there are no fixed assumptions associated with graphing (in contrast to statistical analyses), we often want the graphical summaries to reflect the associated statistical analyses. After all, the sample is just one perspective on the population(s). What we are more interested in is being able to estimate and depict likely population parameters/trends. Thus, whilst we could easily provide a plot displaying the raw data along with simple measures of location and spread, arguably, we should use estimates that reflect the fitted model. In this case, it would be appropriate to plot the credibility interval associated with each group. We do this by loading functions in the package broom and dplyr.

> library(broom)
> library(dplyr)
> mcmc = as.matrix(data.stan.means)
> ## Calculate the fitted values
> newdata = data.frame(x = levels(data$x))
> Xmat = model.matrix(~x, newdata)
> coefs = mcmc[, c("beta[1]", "beta[2]")]
> fit = coefs %*% t(Xmat)
> newdata = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))
> newdata
  x  estimate std.error  conf.low conf.high
1 A 105.20598 0.3660680 104.52503 105.93588
2 B  77.87831 0.4372737  76.99792  78.74455
> 
> ggplot(newdata, aes(y = estimate, x = x)) + geom_pointrange(aes(ymin = conf.low,
+     ymax = conf.high)) + scale_y_continuous("Y") + scale_x_discrete("X") +
+     theme_classic()

If you wanted to represent sample data on the figure in such a simple example (single predictor) we could simply over- (or under-) lay the raw data.

> ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = data, aes(y = y,
+     x = x), color = "gray") + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
+     scale_y_continuous("Y") + scale_x_discrete("X") + theme_classic()

A more general solution would be to add the partial residuals to the figure. Partial residuals are the fitted values plus the residuals. In this simple case, that equates to exactly the same as the raw observations since \(\text{resid}=\text{obs}−\text{fitted}\) and the fitted values depend only on the single predictor we are interested in.

> ## Calculate partial residuals fitted values
> fdata = rdata = data
> fMat = rMat = model.matrix(~x, fdata)
> fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
> resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
> rdata = rdata %>% mutate(partial.resid = resid + fit)
> ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata, aes(y = partial.resid),
+     color = "gray") + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
+     scale_y_continuous("Y") + scale_x_discrete("X") + theme_classic()

Effect sizes

We can compute summaries for our effect size of interest (e.g. Cohen’s or the percentage ES) by post-processing our posterior distributions.

> mcmc = as.matrix(data.stan.means)
> ## Cohen's D
> cohenD = mcmc[, "beta[2]"]/mcmc[, "sigma"]
> tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval")
# A tibble: 1 x 5
  term  estimate std.error conf.low conf.high
  <chr>    <dbl>     <dbl>    <dbl>     <dbl>
1 var1     -9.88     0.740    -11.3     -8.38
> 
> # Percentage change (relative to Group A)
> ES = 100 * mcmc[, "beta[2]"]/mcmc[, "beta[1]"]
> 
> # Probability that the effect is greater than 10% (a decline of >10%)
> sum(-1 * ES > 10)/length(ES)
[1] 1

Probability statements

Any sort of probability statements of interest about our effect size can be computed in a relatively easy way by playing around with the posteriors.

> mcmc = as.matrix(data.stan.means)
> 
> # Percentage change (relative to Group A)
> ES = 100 * mcmc[, "beta[2]"]/mcmc[, "beta[1]"]
> hist(ES)

> 
> # Probability that the effect is greater than 10% (a decline of >10%)
> sum(-1 * ES > 10)/length(ES)
[1] 1
> 
> # Probability that the effect is greater than 25% (a decline of >25%)
> sum(-1 * ES > 25)/length(ES)
[1] 0.978

Finite population standard deviations

Estimates for the variability associated with between and within group differences can also be easily obtained.

# A tibble: 2 x 5
  term     estimate std.error conf.low conf.high
  <chr>       <dbl>     <dbl>    <dbl>     <dbl>
1 sd.x        19.3     0.408     18.5      20.1 
2 sd.resid     2.75    0.0207     2.74      2.79
# A tibble: 2 x 5
  term     estimate std.error conf.low conf.high
  <chr>       <dbl>     <dbl>    <dbl>     <dbl>
1 sd.x         87.5     0.238     87.1      87.8
2 sd.resid     12.5     0.238     12.2      12.9

Unequally varied populations

We can also generate data assuming two populations with different variances, e.g. between male and female subgroups.

> set.seed(123)
> n1 <- 60  #sample size from population 1
> n2 <- 40  #sample size from population 2
> mu1 <- 105  #population mean of population 1
> mu2 <- 77.5  #population mean of population 2
> sigma1 <- 3  #standard deviation of population 1
> sigma2 <- 2  #standard deviation of population 2
> n <- n1 + n2  #total sample size
> y1 <- rnorm(n1, mu1, sigma1)  #population 1 sample
> y2 <- rnorm(n2, mu2, sigma2)  #population 2 sample
> y <- c(y1, y2)
> x <- factor(rep(c("A", "B"), c(n1, n2)))  #categorical listing of the populations
> xn <- rep(c(0, 1), c(n1, n2))  #numerical version of the population category
> data2 <- data.frame(y, x, xn)  # dataset
> head(data2)  #print out the first six rows of the data set
         y x xn
1 103.3186 A  0
2 104.3095 A  0
3 109.6761 A  0
4 105.2115 A  0
5 105.3879 A  0
6 110.1452 A  0

Start by defining the model

\[ y_i = \beta_0 + \beta_1x_i + \epsilon, \]

where \(\epsilon_1 \sim \text{Normal}(0,\sigma_1)\) for \(x_1=0\) (females), and \(\epsilon_2 \sim \text{Normal}(0,\sigma_2)\) for \(x_2=1\) (males). In STAN code, the model becomes:

> stanStringv3 = " 
+  data {
+  int n;
+  vector [n] y;
+  vector [n] x;
+  int<lower=1,upper=2> xn[n];
+  }
+  parameters {
+  vector <lower=0, upper=100>[2] sigma;
+  real beta0;
+  real beta;
+  }
+  transformed parameters {
+  }
+  model {
+  vector [n] mu;
+  //Priors
+  beta0 ~ normal(0,1000);
+  beta ~ normal(0,1000);
+  sigma ~ cauchy(0,25);
+ 
+  mu = beta0 + beta*x;
+  //Likelihood
+  for (i in 1:n) y[i] ~ normal(mu[i], sigma[xn[i]]);
+  }
+  generated quantities {
+  vector [2] Group_means;
+  real CohensD;
+  real CLES;
+ 
+  Group_means[1] = beta0;
+  Group_means[2] = beta0+beta;
+  CohensD = beta /(sum(sigma)/2);
+  CLES = normal_cdf(beta /sum(sigma),0,1);  
+  }
+  
+  "
> 
> ## write the model to a text file 
> writeLines(stanStringv3,con="ttestModelv3.stan")

We specify priors directly on \(\sigma_1\) and \(\sigma_2\) using Cauchy distributions with a scale of \(25\). Next, arrange the data as a list (as required by STAN) and define the MCMC parameters.

> data2.list <- with(data, list(y = y, x = (xn - 1), xn = xn, n = nrow(data)))
> paramsv3 <- c("beta0","beta","sigma","Group_means","CohensD", "CLES")
> burnInSteps = 500
> nChains = 2
> thinSteps = 1
> nIter = 2000

Finally, fit the model in STAN and print the results.

> data.stanv3 = stan(file = "ttestModelv3.stan", 
+   data = data2.list, 
+   pars = paramsv3,
+   iter = nIter,
+   warmup = burnInSteps, 
+   chains = nChains, 
+   thin = thinSteps, 
+   init = "random", #or inits=list(inits,inits)
+   refresh = 0)
> 
> #print results
> print(data.stanv3)
Inference for Stan model: ttestModelv3.
2 chains, each with iter=2000; warmup=500; thin=1; 
post-warmup draws per chain=1500, total post-warmup draws=3000.

                  mean se_mean   sd    2.5%     25%     50%     75%   97.5%
beta0           105.21    0.01 0.36  104.51  104.97  105.21  105.44  105.92
beta            -27.34    0.01 0.57  -28.45  -27.71  -27.35  -26.96  -26.21
sigma[1]          2.79    0.01 0.27    2.31    2.60    2.77    2.97    3.38
sigma[2]          2.88    0.01 0.34    2.31    2.63    2.84    3.07    3.65
Group_means[1]  105.21    0.01 0.36  104.51  104.97  105.21  105.44  105.92
Group_means[2]   77.86    0.01 0.44   77.00   77.57   77.86   78.15   78.75
CohensD          -9.70    0.02 0.76  -11.23  -10.23   -9.69   -9.17   -8.26
CLES              0.00    0.00 0.00    0.00    0.00    0.00    0.00    0.00
lp__           -150.30    0.04 1.42 -153.88 -151.02 -149.99 -149.25 -148.53
               n_eff Rhat
beta0           2426    1
beta            2359    1
sigma[1]        2166    1
sigma[2]        2547    1
Group_means[1]  2426    1
Group_means[2]  3478    1
CohensD         2468    1
CLES            1875    1
lp__            1277    1

Samples were drawn using NUTS(diag_e) at Mon Feb 10 14:11:55 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

References

Gelman, Andrew, Daniel Lee, and Jiqiang Guo. 2015. “Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization.” Journal of Educational and Behavioral Statistics 40 (5): 530–43.

Gelman, Andrew, and others. 2006. “Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper).” Bayesian Analysis 1 (3): 515–34.

Stan Development Team. 2018. “RStan: The R Interface to Stan.” http://mc-stan.org/.

tutorials STAN population differences