# Nested Anova (STAN)

This tutorial will focus on the use of Bayesian estimation to fit nested ANOVA models. BUGS (Bayesian inference Using Gibbs Sampling) is an algorithm and supporting language (resembling R) dedicated to performing the Gibbs sampling implementation of the Markov Chain Monte Carlo (MCMC) method. Dialects of the BUGS language are implemented within three main projects:

1. OpenBUGS - written in Component Pascal.

2. JAGS - (Just Another Gibbs Sampler) - written in C++.

3. STAN - a dedicated Bayesian modelling framework written in C++ and implementing Hamiltonian MCMC samplers.

Whilst the above programs can be used stand-alone, they do not offer the rich data pre-processing and graphical capabilities of R, and thus they are best accessed from within R itself. As such, there are multiple packages devoted to interfacing with the various software implementations:

• R2OpenBUGS - interfaces with OpenBUGS

• R2jags - interfaces with JAGS

• rstan - interfaces with STAN

This tutorial will demonstrate how to fit models in STAN (Gelman, Lee, and Guo (2015)) using the package rstan (Stan Development Team (2018)) as the interface, which also requires loading some additional packages.
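As a sketch of a typical session start-up (the exact set of helper packages beyond rstan is an assumption; plyr- and ggplot2-style functions are used later in this tutorial):

```r
# load the interface to STAN plus helpers used throughout this tutorial
# (package choice beyond rstan is an assumption; install any that are missing)
library(rstan)    # interface to STAN
library(plyr)     # ddply()/catcolwise(), used for site-level summaries
library(ggplot2)  # plotting

# common rstan session options: use available cores and cache compiled models
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
```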

# Overview

## Introduction

When single sampling units are selected amongst highly heterogeneous conditions, it is unlikely that these single units will adequately represent the populations and repeated sampling is likely to yield very different outcomes. For example, if we were investigating the impacts of fuel reduction burning across a highly heterogeneous landscape, our ability to replicate adequately might be limited by the number of burn sites available.

Alternatively, sub-replicates within each of the sampling units (e.g. sites) can be collected (and averaged) so as to provide better representation of each of the units and ultimately reduce the unexplained variability of the test of treatments. In essence, the sub-replicates are the replicates of an additional nested factor whose levels are nested within the main treatment factor. A nested factor refers to a factor whose levels are unique within each level of the factor it is nested within, with each level represented only once. For example, the fuel reduction burn study design could consist of three burnt sites and three un-burnt (control) sites each containing four quadrats (replicates of site and sub-replicates of the burn treatment). Each site represents a unique level of a random factor (any given site cannot be both burnt and un-burnt) that is nested within the fire treatment (burnt or not).

A nested design can be thought of as a hierarchical arrangement of factors (hence the alternative name hierarchical designs) whereby a treatment is progressively sub-replicated. As an additional example, imagine an experiment designed to compare the leaf toughness of a number of tree species. Working down the hierarchy, five individual trees were randomly selected within (nested within) each species, three branches were randomly selected within each tree, two leaves were randomly selected within each branch and the force required to shear the leaf material in half (transversely) was measured in four random locations along each leaf. Clearly any given leaf can only be from a single branch, tree and species. Each level of sub-replication is introduced to further reduce the amount of unexplained variation and thereby increase the power of the test for the main treatment effect. Additionally, it is possible to investigate which scale has the greatest (or least, etc.) degree of variability - the level of the species, individual tree, branch, leaf, leaf region etc.
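The hierarchy described above can be sketched in R with `gl()`; this is purely illustrative (two species are assumed here so the example is small), but it shows how each nested factor's levels are unique within the level above:

```r
# sketch of the leaf toughness hierarchy: species > tree > branch > leaf > region
# (illustrative only; counts follow the example in the text, with 2 species assumed)
nSpecies <- 2; nTrees <- 5; nBranches <- 3; nLeaves <- 2; nRegions <- 4
n <- nSpecies * nTrees * nBranches * nLeaves * nRegions   # 240 observations

species <- gl(nSpecies, n / nSpecies, n, labels = paste0("sp", 1:nSpecies))
tree    <- gl(nSpecies * nTrees, n / (nSpecies * nTrees), n)
branch  <- gl(nSpecies * nTrees * nBranches, nLeaves * nRegions, n)
leaf    <- gl(nSpecies * nTrees * nBranches * nLeaves, nRegions, n)

# every tree belongs to exactly one species, every branch to one tree, etc.
head(data.frame(species, tree, branch, leaf))
```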

• Nested factors are typically random factors, of which the levels are randomly selected to represent all possible levels (e.g. sites). When the main treatment effect (often referred to as Factor A) is a fixed factor, such designs are referred to as a mixed model nested ANOVA, whereas when Factor A is random, the design is referred to as a Model II nested ANOVA.

• Fixed nested factors are also possible. For example, specific dates (corresponding to particular times during a season) could be nested within season. When all factors are fixed, the design is referred to as a Model I ANOVA.

• Fully nested designs (the topic of this chapter) differ from other multi-factor designs in that all factors within (below) the main treatment factor are nested and thus interactions are un-replicated and cannot be tested. Indeed, interaction effects (interaction between Factor A and site) are assumed to be zero.

## Linear models (frequentist)

The linear models for two and three factor nested design are:

$y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \epsilon_{ijk},$

$y_{ijkl} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(j(i))} + \epsilon_{ijkl},$

where $$\mu$$ is the overall mean, $$\alpha$$ is the effect of Factor A, $$\beta$$ is the effect of Factor B, $$\gamma$$ is the effect of Factor C and $$\epsilon$$ is the random unexplained or residual component.

## Linear models (Bayesian)

So called “random effects” are modelled differently from “fixed effects”: rather than estimating their individual effects, we estimate the variability due to them. Since technically all variables in a Bayesian framework are random, some prefer to use the terms ‘fixed effects’ and ‘varying effects.’ As random factors typically represent “random” selections of levels (such as a set of randomly selected sites), incorporated in order to account for the dependency structure in the data (observations within sites are more likely to be correlated to one another and are therefore not strictly independent), we are not overly interested in the individual differences between levels of the ‘varying’ (random) factor. Instead (in addition to imposing a separate correlation structure within each nest), we want to know how much variability is attributed to this level of the design. The linear models for two and three factor nested designs are:

$y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \epsilon_{ijk}, \;\;\; \epsilon_{ijk} \sim N(0, \sigma^2), \;\;\; \beta_{j(i)} \sim N(0, \sigma^2_{B})$

$y_{ijkl} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(j(i))} + \epsilon_{ijkl}, \;\;\; \epsilon_{ijkl} \sim N(0, \sigma^2), \;\;\; \beta_{j(i)} \sim N(0, \sigma^2_{B}), \;\;\; \gamma_{k(j(i))} \sim N(0, \sigma^2_C)$

where $$\mu$$ is the overall mean, $$\alpha$$ is the effect of Factor A, $$\beta$$ is the variability of Factor B (nested within Factor A), $$\gamma$$ is the variability of Factor C (nested within Factor B) and $$\epsilon$$ is the random unexplained or residual component, which is assumed to be normally distributed with a mean of zero and a constant variance ($$\sigma^2$$). The subscripts are iterators: for example, $$i$$ indexes the levels of Factor A. Thus the first formula can be read as: the $$k$$-th observation of $$y$$ is drawn from a normal distribution (with a specific level of variability) whose mean is determined by the overall mean ($$\mu$$) plus the effect of the $$i$$-th treatment (Factor A) plus the random deviation of the $$j$$-th nest (Factor B) within that treatment.
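To make the two-factor model concrete, the following sketch simulates data from the first formula; all parameter values (number of treatments, sites per treatment, effect sizes, standard deviations) are assumptions chosen purely for illustration:

```r
# simulate y_ijk = mu + alpha_i + beta_j(i) + eps_ijk with
# beta ~ N(0, sigma_B^2) and eps ~ N(0, sigma^2) (illustrative values)
set.seed(1)
a <- 3; b <- 4; n <- 10            # treatments, sites per treatment, reps per site
mu <- 40                            # overall (first treatment) mean
alpha   <- c(0, 5, 10)              # fixed treatment effects (alpha_1 = 0)
sigma_B <- 3; sigma <- 1            # between-site and residual sd

A    <- gl(a, b * n)                # treatment of each observation
site <- gl(a * b, n)                # site (nested within treatment)
beta <- rnorm(a * b, 0, sigma_B)    # one random deviation per site
y    <- mu + alpha[A] + beta[site] + rnorm(a * b * n, 0, sigma)

# treatment means vary around mu + alpha_i through the site deviations
round(tapply(y, A, mean), 1)
```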

## Null hypotheses

Separate null hypotheses are associated with each of the factors, however, nested factors are typically only added to absorb some of the unexplained variability and thus, specific hypotheses tests associated with nested factors are of lesser biological importance. Hence, rather than estimate the effects of random effects, we instead estimate how much variability they contribute.

Factor A: the main treatment effect (fixed)

• $$H_0(A): \mu_1=\mu_2=\ldots=\mu_i=\mu$$ (the population group means are all equal). That is, that the mean of population $$1$$ is equal to that of population $$2$$ and so on, and thus all population means are equal to one another - no effect of the factor on the response. If the effect of the $$i$$-th group is the difference between the $$i$$-th group mean and the mean of the first group ($$\alpha_i=\mu_i-\mu_1$$) then the $$H_0$$ can alternatively be written as:

• $$H_0(A) : \alpha_1=\alpha_2=\ldots=\alpha_i=0$$ (the effect of each group equals zero). If one or more of the $$\alpha_i$$ are different from zero (the response mean for this treatment differs from the overall response mean), there is evidence that the null hypothesis is not true indicating that the factor does affect the response variable.

Factor A: the main treatment effect (random)

• $$H_0(A) : \sigma^2_{\alpha}=0$$ (population variance equals zero). There is no added variance due to all possible levels of A.

Factor B: the nested effect (random)

• $$H_0(B) : \sigma^2_{\beta}=0$$ (population variance equals zero). There is no added variance due to all possible levels of B within the (set or all possible) levels of A.

Factor B: the nested effect (fixed)

• $$H_0(B): \mu_{1(1)}=\mu_{2(1)}=\ldots=\mu_{j(i)}=\mu$$ (the population group means of B (within A) are all equal).

• $$H_0(B): \beta_{1(1)}=\beta_{2(1)}=\ldots=\beta_{j(i)}=0$$ (the effect of each chosen B group equals zero).

## Analysis of variance

Analysis of variance sequentially partitions the total variability in the response variable into components explained by each of the factors (starting with the factors lowest down in the hierarchy - the most deeply nested) and the components unexplained by each factor. Explained variability is calculated by subtracting the amount unexplained by the factor from the amount unexplained by a reduced model that does not contain the factor. When the null hypothesis for a factor is true (no effect or added variability), the ratio of explained to unexplained components for that factor (F-ratio) should follow a theoretical F-distribution with an expected value close to 1. The appropriate unexplained residuals and therefore the appropriate F-ratios for each factor differ according to the different null hypotheses associated with different combinations of fixed and random factors in a nested linear model (see Table below).

| Factor | df | MS | F-ratio (B random) | Var comp (B random) | F-ratio (B fixed) | Var comp (B fixed) |
|--------|-----|------|--------------------|---------------------|-------------------|--------------------|
| A | a-1 | MS A | (MS A)/(MS B'(A)) | ((MS A) - (MS B'(A)))/nb | (MS A)/(MS res) | ((MS A) - (MS res))/nb |
| B'(A) | (b-1)a | MS B'(A) | (MS B'(A))/(MS res) | ((MS B'(A)) - (MS res))/n | (MS B'(A))/(MS res) | ((MS B'(A)) - (MS res))/n |
| Res | (n-1)ba | MS res | | | | |

The corresponding R syntax is given below.

> #A fixed/random, B random (balanced)
> summary(aov(y~A+Error(B), data))
> VarCorr(lme(y~A, random=~1|B, data))
>
> #A fixed/random, B random (unbalanced)
> anova(lme(y~A, random=~1|B, data), type='marginal')
>
> #A fixed/random, B fixed (balanced)
> summary(aov(y~A+B, data))
>
> #A fixed/random, B fixed (unbalanced)
> A3 <- ifelse(data.nest$A=='a3',1,0)
> data.nest.list <- with(data.nest, list(y=y, A2=A2, A3=A3, B=as.numeric(Sites),
+   n=nrow(data.nest), nB=length(levels(Sites))))

Define the nodes (parameters and derivatives) to monitor and the chain parameters.

> params <- c("alpha0","alpha2","alpha3","sigma","sigma_B")
> burnInSteps = 3000
> nChains = 2
> numSavedSteps = 3000
> thinSteps = 1
> nIter = burnInSteps+ceiling((numSavedSteps * thinSteps)/nChains)

Now run the STAN code via the rstan interface.

> data.nest.rstan.c2 <- stan(data = data.nest.list, file = "full2Model.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)

SAMPLING FOR MODEL 'full2Model' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
Chain 1:
Chain 1:  Elapsed Time: 4.735 seconds (Warm-up)
Chain 1:                3.205 seconds (Sampling)
Chain 1:                7.94 seconds (Total)
Chain 1:

SAMPLING FOR MODEL 'full2Model' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
Chain 2:
Chain 2:  Elapsed Time: 4.246 seconds (Warm-up)
Chain 2:                3.033 seconds (Sampling)
Chain 2:                7.279 seconds (Total)
Chain 2:
>
> print(data.nest.rstan.c2, par = c("alpha0", "alpha2", "alpha3", "sigma", "sigma_B"))
Inference for Stan model: full2Model.
2 chains, each with iter=4500; warmup=3000; thin=1;
post-warmup draws per chain=1500, total post-warmup draws=3000.

         mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
alpha0  42.14    0.18 5.28 32.35 38.73 41.95 45.37 53.07   821    1
alpha2  27.75    0.23 7.18 13.76 23.20 27.86 32.36 42.00   977    1
alpha3  40.82    0.26 7.41 26.03 36.15 40.72 45.54 55.23   784    1
sigma    5.04    0.01 0.30  4.49  4.82  5.03  5.24  5.68  1365    1
sigma_B 11.61    0.10 2.87  7.58  9.70 11.15 13.02 18.04   885    1

Samples were drawn using NUTS(diag_e) at Fri Dec 11 09:11:46 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
>
> data.nest.rstan.c2.df <- as.data.frame(extract(data.nest.rstan.c2))
> head(data.nest.rstan.c2.df)
    alpha0   alpha2   alpha3    sigma   sigma_B      lp__
1 39.70299 33.61554 38.67780 5.984734  8.194675 -361.6994
2 47.68870 17.00806 25.96021 5.075493 10.263042 -354.7169
3 41.26990 29.39069 46.25732 5.113701  8.065909 -359.5336
4 38.02844 38.40668 49.88790 5.127820 13.155391 -355.3177
5 40.43790 18.58691 46.44096 4.763335 12.664118 -355.2254
6 46.51676 26.35108 36.54911 4.474970 11.370100 -360.1367

## Matrix parameterisation

> rstanString2="
+ data{
+    int n;
+    int nB;
+    int nA;
+    vector [n] y;
+    matrix [n,nA] X;
+    int B[n];
+    vector [nA] a0;
+    matrix [nA,nA] A0;
+ }
+
+ parameters{
+    vector [nA] alpha;
+    real<lower=0> sigma;
+    vector [nB] beta;
+    real<lower=0> sigma_B;
+ }
+
+ model{
+    real mu[n];
+
+    // Priors
+    //alpha ~ normal( 0 , 100 );
+    alpha ~ multi_normal(a0,A0);
+    beta ~ normal( 0 , sigma_B );
+    sigma_B ~ cauchy( 0 , 25 );
+    sigma ~ cauchy( 0 , 25 );
+
+    for ( i in 1:n ) {
+       mu[i] = dot_product(X[i],alpha) + beta[B[i]];
+    }
+    y ~ normal( mu , sigma );
+ }
+
+ "
>
> ## write the model to a text file
> writeLines(rstanString2, con = "matrixModel.stan")

Arrange the data as a list (as required by STAN). As input, STAN will need to be supplied with: the response variable, the predictor matrix, the number of predictors and the total number of observations. This all needs to be contained within a list object.

> X <- model.matrix(~A, data.nest)
> nA <- ncol(X)
> data.nest.list <- with(data.nest, list(y=y, X=X, B=as.numeric(Sites),
+   n=nrow(data.nest), nB=length(levels(Sites)), nA=nA,
+   a0=rep(0,nA), A0=diag(100000,nA)))

Define the nodes (parameters and derivatives) to monitor and the chain parameters.
> params <- c("alpha","sigma","sigma_B")
> burnInSteps = 3000
> nChains = 2
> numSavedSteps = 3000
> thinSteps = 1
> nIter = burnInSteps+ceiling((numSavedSteps * thinSteps)/nChains)

Now run the STAN code via the rstan interface.

> data.nest.rstan.m <- stan(data = data.nest.list, file = "matrixModel.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)

SAMPLING FOR MODEL 'matrixModel' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
Chain 1:
Chain 1:  Elapsed Time: 8.255 seconds (Warm-up)
Chain 1:                4.682 seconds (Sampling)
Chain 1:                12.937 seconds (Total)
Chain 1:

SAMPLING FOR MODEL 'matrixModel' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
Chain 2:
Chain 2:  Elapsed Time: 8.74 seconds (Warm-up)
Chain 2:                5.977 seconds (Sampling)
Chain 2:                14.717 seconds (Total)
Chain 2:
>
> print(data.nest.rstan.m, par = c("alpha", "sigma", "sigma_B"))
Inference for Stan model: matrixModel.
2 chains, each with iter=4500; warmup=3000; thin=1;
post-warmup draws per chain=1500, total post-warmup draws=3000.

          mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
alpha[1] 42.16    0.20 5.65 31.37 38.49 42.17 45.58 52.75   771    1
alpha[2] 27.40    0.30 7.96 11.55 22.56 27.49 32.23 42.92   704    1
alpha[3] 41.11    0.26 7.92 25.22 36.27 41.25 46.15 56.31   908    1
sigma     5.03    0.01 0.31  4.46  4.83  5.01  5.23  5.67  1848    1
sigma_B  11.71    0.09 2.71  7.72  9.89 11.28 13.00 18.46   926    1

Samples were drawn using NUTS(diag_e) at Fri Dec 11 09:12:48 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
>
> data.nest.rstan.m.df <- as.data.frame(extract(data.nest.rstan.m))
> head(data.nest.rstan.m.df)
   alpha.1  alpha.2  alpha.3    sigma   sigma_B      lp__
1 33.94608 30.18616 45.50708 5.005223 14.545831 -357.5156
2 38.68898 23.87072 38.06314 5.022923  9.121265 -354.3067
3 42.52249 31.77733 25.79672 5.520427 13.070609 -364.6733
4 42.87262 32.61743 34.14891 4.884315 12.579670 -356.7221
5 40.65653 25.51010 47.30654 4.735719 16.704468 -361.4840
6 49.58963 17.14219 35.23524 4.610314 11.579346 -355.0129

## Hierarchical parameterisation

> rstanString3="
+ data{
+    int n;
+    int nA;
+    int nSites;
+    vector [n] y;
+    matrix [nSites,nA] X;
+    matrix [n,nSites] Z;
+ }
+
+ parameters{
+    vector[nA] beta;
+    vector[nSites] gamma;
+    real<lower=0> sigma;
+    real<lower=0> sigma_S;
+
+ }
+
+ model{
+    vector [n] mu_site;
+    vector [nSites] mu;
+
+    // Priors
+    beta ~ normal( 0 , 1000 );
+    gamma ~ normal( 0 , 1000 );
+    sigma ~ cauchy( 0 , 25 );
+    sigma_S ~ cauchy( 0 , 25 );
+
+    mu_site = Z*gamma;
+    y ~ normal( mu_site , sigma );
+    mu = X*beta;
+    gamma ~ normal(mu, sigma_S);
+ }
+
+ "
>
> ## write the model to a text file
> writeLines(rstanString3, con = "hierarchicalModel.stan")

Arrange the data as a list (as required by STAN). As input, STAN will need to be supplied with: the response variable, the predictor matrix, the number of predictors and the total number of observations. This all needs to be contained within a list object.

> dt.A <- ddply(data.nest,~Sites,catcolwise(unique))
> X <- model.matrix(~A, dt.A)
> Z <- model.matrix(~Sites-1, data.nest)
> data.nest.list <- list(y=data.nest$y, X=X, Z=Z, n=nrow(data.nest),
+   nSites=nrow(X),nA=ncol(X))

Define the nodes (parameters and derivatives) to monitor and the chain parameters.

> params <- c("beta","sigma","sigma_S")
> burnInSteps = 3000
> nChains = 2
> numSavedSteps = 3000
> thinSteps = 1
> nIter = burnInSteps+ceiling((numSavedSteps * thinSteps)/nChains)
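As a quick sanity check, the total number of iterations per chain implied by these settings can be computed directly; it corresponds to the warmup plus the post-warmup draws that each chain must contribute:

```r
# each chain runs warmup + ceiling(saved * thin / chains) iterations
burnInSteps   <- 3000
nChains       <- 2
numSavedSteps <- 3000
thinSteps     <- 1
nIter <- burnInSteps + ceiling((numSavedSteps * thinSteps) / nChains)
nIter  # 4500: 3000 warmup + 1500 post-warmup draws per chain
```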

Now run the STAN code via the rstan interface.

> data.nest.rstan.h <- stan(data = data.nest.list, file = "hierarchicalModel.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)

SAMPLING FOR MODEL 'hierarchicalModel' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 1:
Chain 1:
Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
Chain 1:
Chain 1:  Elapsed Time: 0.761 seconds (Warm-up)
Chain 1:                0.317 seconds (Sampling)
Chain 1:                1.078 seconds (Total)
Chain 1:

SAMPLING FOR MODEL 'hierarchicalModel' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2:
Chain 2:
Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
Chain 2:
Chain 2:  Elapsed Time: 0.718 seconds (Warm-up)
Chain 2:                0.333 seconds (Sampling)
Chain 2:                1.051 seconds (Total)
Chain 2:
>
> print(data.nest.rstan.h, par = c("beta", "sigma", "sigma_S"))
Inference for Stan model: hierarchicalModel.
2 chains, each with iter=4500; warmup=3000; thin=1;
post-warmup draws per chain=1500, total post-warmup draws=3000.

mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
beta[1] 42.29    0.11 5.32 31.76 38.75 42.34 45.78 52.66  2552    1
beta[2] 27.53    0.15 7.52 12.26 22.77 27.29 32.41 42.24  2637    1
beta[3] 40.96    0.15 7.68 26.05 36.02 40.77 46.02 56.51  2719    1
sigma    5.05    0.01 0.30  4.51  4.84  5.04  5.23  5.68  2990    1
sigma_S 11.61    0.06 2.79  7.77  9.58 11.14 12.95 18.55  2473    1

Samples were drawn using NUTS(diag_e) at Fri Dec 11 09:13:20 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
>
> data.nest.rstan.h.df <- as.data.frame(extract(data.nest.rstan.h))
> head(data.nest.rstan.h.df)
    beta.1   beta.2   beta.3    sigma   sigma_S      lp__
1 40.18971 36.18208 40.68880 4.860507  8.009102 -354.4892
2 31.50034 41.79442 50.65135 4.933646 12.951002 -364.8473
3 37.78913 35.81777 47.67498 5.196665  9.753561 -355.2075
4 46.70426 24.76463 44.75701 4.907866 11.683086 -359.7312
5 44.43432 24.81058 38.15300 5.117181 10.045637 -352.9194
6 44.37251 19.98415 33.45183 4.833427  9.668305 -353.2098
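With the posterior draws in a data frame, summaries beyond `print()` are straightforward, e.g. medians and 95% credible intervals for each parameter. The sketch below works on simulated draws standing in for the extracted data frame (the column names and values are placeholders, not the fitted output):

```r
# summarise MCMC draws as 2.5%, 50% and 97.5% quantiles per parameter;
# 'draws' is a placeholder for a data frame produced by as.data.frame(extract(fit))
set.seed(2)
draws <- data.frame(beta.1 = rnorm(3000, 42, 5.3),   # fake posterior draws
                    sigma  = rnorm(3000, 5, 0.3))
t(sapply(draws, quantile, probs = c(0.025, 0.5, 0.975)))
```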

If you want to include finite-population standard deviations in the model you can use the following code.

> rstanString4="
+ data{
+    int n;
+    int nA;
+    int nSites;
+    vector [n] y;
+    matrix [nSites,nA] X;
+    matrix [n,nSites] Z;
+ }
+
+ parameters{
+    vector[nA] beta;
+    vector[nSites] gamma;
+    real<lower=0> sigma;
+    real<lower=0> sigma_S;
+
+ }
+
+ model{
+     vector [n] mu_site;
+     vector [nSites] mu;
+
+     // Priors
+     beta ~ normal( 0 , 1000 );
+     gamma ~ normal( 0 , 1000 );
+     sigma ~ cauchy( 0 , 25 );
+     sigma_S ~ cauchy( 0 , 25 );
+
+     mu_site = Z*gamma;
+     y ~ normal( mu_site , sigma );
+     mu = X*beta;
+     gamma ~ normal(mu, sigma_S);
+ }
+
+ generated quantities {
+     vector [n] mu_site;
+     vector [nSites] mu;
+     vector [n] y_err;
+     real sd_y;
+     vector [nSites] mu_site_err;
+     real sd_site;
+     real sd_A;
+
+     mu_site = Z*gamma;
+     y_err = mu_site - y;
+     sd_y = sd(y_err);
+
+     mu = X*beta;
+     mu_site_err = mu - gamma;
+     sd_site = sd(mu_site_err);
+
+     sd_A = sd(beta);
+ }
+
+ "
>
> ## write the model to a text file
> writeLines(rstanString4, con = "SDModel.stan")
>
> #data list
> dt.A <- ddply(data.nest,~Sites,catcolwise(unique))
> X<-model.matrix(~A, dt.A)
> Z<-model.matrix(~Sites-1, data.nest)
> data.nest.list <- list(y=data.nest$y, X=X, Z=Z, n=nrow(data.nest),
+   nSites=nrow(X),nA=ncol(X))
>
> #parameters and chain details
> params <- c('beta','sigma','sigma_S','sd_A','sd_site','sd_y')
> adaptSteps = 1000
> burnInSteps = 3000
> nChains = 2
> numSavedSteps = 3000
> thinSteps = 1
> nIter = burnInSteps+ceiling((numSavedSteps * thinSteps)/nChains)
>
> data.nest.rstan.SD <- stan(data = data.nest.list, file = "SDModel.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)

SAMPLING FOR MODEL 'SDModel' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
Chain 1:
Chain 1:  Elapsed Time: 0.696 seconds (Warm-up)
Chain 1:                0.316 seconds (Sampling)
Chain 1:                1.012 seconds (Total)
Chain 1:

SAMPLING FOR MODEL 'SDModel' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
Chain 2:
Chain 2:  Elapsed Time: 0.731 seconds (Warm-up)
Chain 2:                0.32 seconds (Sampling)
Chain 2:                1.051 seconds (Total)
Chain 2:
>
> print(data.nest.rstan.SD, par = c('beta','sigma','sigma_S','sd_A','sd_site','sd_y'))
Inference for Stan model: SDModel.
2 chains, each with iter=4500; warmup=3000; thin=1;
post-warmup draws per chain=1500, total post-warmup draws=3000.

         mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
beta[1] 42.24    0.11 5.36 31.51 38.96 42.28 45.54 52.77  2361    1
beta[2] 27.42    0.15 7.52 11.92 23.03 27.50 32.20 41.74  2508    1
beta[3] 40.91    0.15 7.59 26.02 36.09 41.04 45.56 56.39  2707    1
sigma    5.04    0.01 0.31  4.46  4.82  5.01  5.23  5.69  3342    1
sigma_S 11.61    0.05 2.69  7.72  9.68 11.13 13.03 18.14  2421    1
sd_A    10.20    0.12 4.48  2.69  7.22  9.64 12.73 20.54  1445    1
sd_site 10.68    0.03 1.11  9.24  9.96 10.43 11.14 13.51  1139    1
sd_y     4.99    0.00 0.10  4.85  4.92  4.98  5.05  5.22  1337    1

Samples were drawn using NUTS(diag_e) at Fri Dec 11 09:13:52 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
>
> data.nest.rstan.SD.df <- as.data.frame(extract(data.nest.rstan.SD))
> head(data.nest.rstan.SD.df)
    beta.1   beta.2   beta.3    sigma   sigma_S      sd_A   sd_site     sd_y      lp__
1 40.43674 29.25310 45.21689 4.663609  9.517728  8.193149  9.983161 4.929392 -352.2536
2 49.89501 15.10486 29.68431 4.579810  9.530417 17.470868 10.813172 4.996209 -356.7218
3 37.77238 35.19504 46.26184 5.175441 11.006254  5.790620  9.896594 5.267506 -361.6852
4 36.21905 33.77648 49.31786 4.644749  9.820680  8.357426 11.617789 5.059987 -360.1793
5 43.52928 24.90145 40.77302 5.411379  7.547366 10.054018  9.158198 5.074078 -357.3132
6 46.50217 23.16053 34.67047 4.614323 10.378371 11.671189 10.579336 4.939073 -354.1919

# Data generation - second example

Now imagine a similar experiment in which we intend to measure a response ($$y$$) to one of three treatments (levels "a1," "a2" and "a3"). As with the previous design, we decided to establish a nested design in which there are sub-replicates ($$1$$m Quadrats) within each Site. In the current design, we have decided to further sub-replicate: within each of the $$5$$ Quadrats, we are going to randomly place $$2\times10$$cm pit traps. Now we have Sites nested within Treatments, Quadrats nested within Sites AND Pits nested within Quadrats. The latter of these (the Pits) are the observations ($$y$$). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details is optional and can be safely skipped.
> set.seed(123)
> nTreat <- 3
> nSites <- 15
> nSitesPerTreat <- nSites/nTreat
> nQuads <- 5
> nPits <- 2
> site.sigma <- 10   # sd between sites within treatment
> quad.sigma <- 10
> sigma <- 7.5
> n <- nSites * nQuads * nPits
> sites <- gl(n=nSites, n/nSites, n, lab=paste("site",1:nSites))
> A <- gl(nTreat, n/nTreat, n, labels=c('a1','a2','a3'))
> a.means <- c(40,70,80)
>
> #A <- gl(nTreat,nSites/nTreat,nSites,labels=c('a1','a2','a3'))
> a <- gl(nTreat,1,nTreat,labels=c('a1','a2','a3'))
> a.X <- model.matrix(~a, expand.grid(a))
> a.eff <- as.vector(solve(a.X,a.means))
> site.means <- rnorm(nSites,a.X %*% a.eff,site.sigma)
>
> A <- gl(nTreat,nSites/nTreat,nSites,labels=c('a1','a2','a3'))
> A.X <- model.matrix(~A, expand.grid(A))
> #a.X <- model.matrix(~A, expand.grid(A=gl(nTreat,nSites/nTreat,nSites,labels=c('a1','a2','a3'))))
> site.means <- rnorm(nSites,A.X %*% a.eff,site.sigma)
>
> SITES <- gl(nSites,(nSites*nQuads)/nSites,nSites*nQuads,labels=paste('site',1:nSites))
> sites.X <- model.matrix(~SITES-1)
> quad.means <- rnorm(nSites*nQuads,sites.X %*% site.means,quad.sigma)
>
> #SITES <- gl(nSites,1,nSites,labels=paste('site',1:nSites))
> #sites.X <- model.matrix(~SITES-1)
> #quad.means <- rnorm(nSites*nQuads,sites.X %*% site.means,quad.sigma)
>
> QUADS <- gl(nSites*nQuads,n/(nSites*nQuads),n,labels=paste('quad',1:(nSites*nQuads)))
> quads.X <- model.matrix(~QUADS-1)
> #quads.eff <- as.vector(solve(quads.X,quad.means))
> #pit.means <- rnorm(n,quads.eff %*% t(quads.X),sigma)
> pit.means <- rnorm(n,quads.X %*% quad.means,sigma)
>
> PITS <- gl(nPits*nSites*nQuads,1, n, labels=paste('pit',1:(nPits*nSites*nQuads)))
> data.nest1 <- data.frame(Pits=PITS,Quads=QUADS,Sites=rep(SITES,each=2), A=rep(A,each=nQuads*nPits),y=pit.means)
> #data.nest1 <- data.nest1[order(data.nest1$A,data.nest1$Sites,data.nest1$Quads),]
> head(data.nest1)  # print out the first six rows of the data set
   Pits  Quads  Sites  A        y
1 pit 1 quad 1 site 1 a1 61.79607
2 pit 2 quad 1 site 1 a1 56.24699
3 pit 3 quad 2 site 1 a1 42.40885
4 pit 4 quad 2 site 1 a1 52.06672
5 pit 5 quad 3 site 1 a1 73.71286
6 pit 6 quad 3 site 1 a1 62.50529
>
> ggplot(data.nest1, aes(y=y, x=1)) + geom_boxplot() + facet_grid(.~Quads)

## Exploratory data analysis

Normality and Homogeneity of variance

> library(plyr)   # provides ddply() and numcolwise()
> # Effects of treatment
> boxplot(y~A, ddply(data.nest1, ~A+Sites, numcolwise(mean, na.rm=T)))

>
> #Site effects
> boxplot(y~Sites, ddply(data.nest1, ~A+Sites+Quads,numcolwise(mean, na.rm=T)))

>
> # Quadrat effects
> boxplot(y~Quads, ddply(data.nest1, ~A+Sites+Quads+Pits, numcolwise(mean, na.rm=T)))

Conclusions:

• there is no evidence that the response variable is consistently non-normal across all populations - each boxplot is approximately symmetrical.

• there is no evidence that variance (as estimated by the height of the boxplots) differs between the populations. More importantly, there is no evidence of a relationship between mean and variance - the height of the boxplots does not increase with increasing position along the $$y$$-axis. Hence there is no evidence of non-homogeneity of variance.

• it is a little difficult to assess normality/homogeneity of variance of quadrats since there are only two pits per quadrat. Nevertheless, there is no suggestion that variance increases with increasing mean.

Obvious violations could be addressed by:

• transforming the scale of the response variable (to address normality, etc.). Note that transformations should be applied to the entire response variable (not just those populations that are skewed).

# Model fitting

## Frequentist for comparison

> library(nlme)
> d.lme <- lme(y ~ A, random=~1|Sites/Quads,data=data.nest1)
> summary(d.lme)
Linear mixed-effects model fit by REML
Data: data.nest1
AIC      BIC   logLik
1137.994 1155.937 -562.997

Random effects:
Formula: ~1 | Sites
(Intercept)
StdDev:    10.38248

Formula: ~1 | Quads %in% Sites
(Intercept) Residual
StdDev:    8.441615 7.161178

Fixed effects:  y ~ A
Value Std.Error DF  t-value p-value
(Intercept) 41.38646   5.04334 75 8.206160  0.0000
Aa2         21.36271   7.13236 12 2.995181  0.0112
Aa3         39.14584   7.13236 12 5.488483  0.0001
Correlation:
(Intr) Aa2
Aa2 -0.707
Aa3 -0.707  0.500

Standardized Within-Group Residuals:
Min          Q1         Med          Q3         Max
-2.11852493 -0.54600763 -0.03428569  0.53382444  2.26256381

Number of Observations: 150
Number of Groups:
15               75
>
> anova(d.lme)
numDF denDF  F-value p-value
(Intercept)     1    75 446.9152  <.0001
A               2    12  15.1037   5e-04
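The denominator degrees of freedom in the ANOVA table follow directly from the nesting: the treatment effect is tested against the variation among sites (not among pits), and the residual term reflects the pits within quadrats. A quick check of the arithmetic, using the counts from the design:

```r
# Degrees of freedom implied by the nested design
nTreat <- 3
nSites <- 15
nQuads <- 75          # 15 sites x 5 quadrats
n      <- 150         # 75 quadrats x 2 pits
df_A     <- nSites - nTreat   # 12: treatments are tested against sites
df_resid <- n - nQuads        # 75: pits within quadrats
```

These match the `denDF` column reported by `anova(d.lme)` above (12 for the treatment effect, 75 for the intercept/residual level).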

## Full effect parameterisation

> rstanString="
+ data{
+    int n;
+    int nSite;
+    int nQuad;
+    vector [n] y;
+    int A2[n];
+    int A3[n];
+    int Site[n];
+    int Quad[n];
+ }
+
+ parameters{
+   real alpha0;
+   real alpha2;
+   real alpha3;
+   real<lower=0> sigma;
+   vector [nSite] beta_Site;
+   real<lower=0> sigma_Site;
+   vector [nQuad] beta_Quad;
+   real<lower=0> sigma_Quad;
+ }
+
+ model{
+     real mu[n];
+
+     // Priors
+     alpha0 ~ normal( 0 , 100 );
+     alpha2 ~ normal( 0 , 100 );
+     alpha3 ~ normal( 0 , 100 );
+     beta_Site ~ normal( 0 , sigma_Site );
+     beta_Quad ~ normal( 0 , sigma_Quad );
+     sigma_Site ~ cauchy( 0 , 25 );
+     sigma_Quad ~ cauchy( 0 , 25 );
+     sigma ~ cauchy( 0 , 25 );
+
+     for ( i in 1:n ) {
+         mu[i] = alpha0 + alpha2*A2[i] + alpha3*A3[i] +
+                 beta_Site[Site[i]] + beta_Quad[Quad[i]];
+     }
+     y ~ normal( mu , sigma );
+ }
+
+ "
>
> ## write the model to a text file
> writeLines(rstanString, con = "fullModel2.stan")
>
> A2 <- ifelse(data.nest1$A=='a2', 1, 0)
> A3 <- ifelse(data.nest1$A=='a3', 1, 0)
> data.nest.list <- with(data.nest1, list(y=y, A2=A2, A3=A3,
+    Site=as.numeric(Sites), Quad=as.numeric(Quads),
+    n=nrow(data.nest1), nSite=length(levels(Sites)),
+    nQuad=length(levels(Quads))))
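The `ifelse()` calls above turn the three-level factor into two dummy (indicator) variables: "a2" and "a3" get their own columns, while "a1" is absorbed into the intercept. A tiny self-contained illustration of the same coding on a toy factor:

```r
# Dummy (treatment) coding of a three-level factor: a2 and a3 become
# indicator columns; the baseline level a1 is carried by the intercept
A  <- gl(3, 2, labels = c('a1', 'a2', 'a3'))   # a1 a1 a2 a2 a3 a3
A2 <- ifelse(A == 'a2', 1, 0)                  # 0 0 1 1 0 0
A3 <- ifelse(A == 'a3', 1, 0)                  # 0 0 0 0 1 1
```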
>
> params <- c('alpha0','alpha2','alpha3','sigma','sigma_Site','sigma_Quad')
> burnInSteps = 3000
> nChains = 2
> numSavedSteps = 3000
> thinSteps = 1
> nIter = burnInSteps + ceiling((numSavedSteps * thinSteps)/nChains)
>
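These settings translate directly into the iteration counts that STAN reports: each chain runs the warmup iterations plus its share of the desired saved draws. Spelled out:

```r
# How the MCMC settings determine per-chain iterations and total draws
burnInSteps   <- 3000
numSavedSteps <- 3000
thinSteps     <- 1
nChains       <- 2
nIter <- burnInSteps + ceiling((numSavedSteps * thinSteps) / nChains)  # iterations per chain
totalDraws <- nChains * (nIter - burnInSteps)                          # post-warmup draws kept
```

This is why the sampler log below shows `4500` iterations per chain and the `print()` output reports 3000 total post-warmup draws.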
> data.nest1.rstan.f <- stan(data = data.nest.list, file = "fullModel2.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)

SAMPLING FOR MODEL 'fullModel2' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 1:
Chain 1:
Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
Chain 1:
Chain 1:  Elapsed Time: 6.208 seconds (Warm-up)
Chain 1:                5.699 seconds (Sampling)
Chain 1:                11.907 seconds (Total)
Chain 1:

SAMPLING FOR MODEL 'fullModel2' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2:
Chain 2:
Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
Chain 2:
Chain 2:  Elapsed Time: 6.936 seconds (Warm-up)
Chain 2:                4.237 seconds (Sampling)
Chain 2:                11.173 seconds (Total)
Chain 2:
>
> print(data.nest1.rstan.f, par = c('alpha0','alpha2','alpha3','sigma','sigma_Site', 'sigma_Quad'))
Inference for Stan model: fullModel2.
2 chains, each with iter=4500; warmup=3000; thin=1;
post-warmup draws per chain=1500, total post-warmup draws=3000.

mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
alpha0     41.38    0.15 5.37 30.55 37.98 41.41 44.77 52.25  1350    1
alpha2     21.39    0.19 7.75  6.39 16.32 21.37 26.38 36.77  1589    1
alpha3     39.26    0.22 8.14 23.39 34.32 39.16 44.34 55.86  1331    1
sigma       7.29    0.01 0.60  6.21  6.86  7.25  7.66  8.56  1980    1
sigma_Site 11.34    0.07 2.94  6.85  9.33 10.90 12.94 18.37  1899    1
sigma_Quad  8.52    0.03 1.13  6.38  7.74  8.45  9.27 10.77  1405    1

Samples were drawn using NUTS(diag_e) at Fri Dec 11 09:14:51 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
>
> data.nest1.rstan.f.df <- as.data.frame(extract(data.nest1.rstan.f))
> head(data.nest1.rstan.f.df)
    alpha0   alpha2   alpha3    sigma sigma_Site sigma_Quad      lp__
1 45.17073 19.74204 35.44779 7.074432   8.002146   8.342248 -603.1710
2 36.74594 18.49752 48.94935 7.008208  10.685535   9.273360 -611.0495
3 38.31165 21.47771 49.82054 7.815280   8.589523   8.578517 -605.6948
4 38.67027 31.15159 39.55512 6.697657  13.424702   9.653050 -609.0281
5 34.00744 17.64337 45.82909 7.422725  11.481667  10.765264 -627.5098
6 40.61961 25.48412 38.63614 7.260439   8.503316   7.401149 -594.0790
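With the draws in a data frame, posterior summaries are just column-wise quantiles. The sketch below uses simulated stand-in draws so that it is self-contained; with the real object you would pass a column such as `data.nest1.rstan.f.df$alpha2` instead.

```r
# Hypothetical stand-in for one column of posterior draws (the real call
# would use e.g. data.nest1.rstan.f.df$alpha2)
set.seed(1)
draws <- rnorm(3000, mean = 21.4, sd = 7.8)
# Posterior median and 95% credible interval
quantile(draws, probs = c(0.025, 0.5, 0.975))
```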

## Matrix parameterisation

> rstanString2="
+ data{
+    int n;
+    int nSite;
+    int nQuad;
+    int nA;
+    vector [n] y;
+    matrix [n,nA] X;
+    int Site[n];
+    int Quad[n];
+    vector [nA] a0;
+    matrix [nA,nA] A0;
+ }
+
+ parameters{
+   vector [nA] alpha;
+   real<lower=0> sigma;
+   vector [nSite] beta_Site;
+   real<lower=0> sigma_Site;
+   vector [nQuad] beta_Quad;
+   real<lower=0> sigma_Quad;
+ }
+
+ model{
+     real mu[n];
+
+     // Priors
+     alpha ~ multi_normal( a0 , A0 );
+     beta_Site ~ normal( 0 , sigma_Site );
+     beta_Quad ~ normal( 0 , sigma_Quad );
+     sigma_Site ~ cauchy( 0 , 25 );
+     sigma_Quad ~ cauchy( 0 , 25 );
+     sigma ~ cauchy( 0 , 25 );
+
+     for ( i in 1:n ) {
+         mu[i] = dot_product(X[i], alpha) +
+                 beta_Site[Site[i]] + beta_Quad[Quad[i]];
+     }
+     y ~ normal( mu , sigma );
+ }
+
+ "
>
> ## write the model to a text file
> writeLines(rstanString2, con = "matrixModel2.stan")
>
> X <- model.matrix(~A, data.nest1)
> nA <- ncol(X)
> data.nest.list <- with(data.nest1, list(y=y, X=X,
+    Site=as.numeric(Sites), Quad=as.numeric(Quads),
+    n=nrow(data.nest1), nSite=length(levels(Sites)),
+    nQuad=length(levels(Quads)), nA=nA,
+    a0=rep(0,nA), A0=diag(100000,nA)))
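In this parameterisation, the fixed-effect part of `mu[i]` is simply the dot product of row `i` of the design matrix `X` with the coefficient vector `alpha`. A self-contained sketch with illustrative (assumed) coefficient values, not the fitted estimates:

```r
# Row i of the design matrix times alpha gives the fixed-effect part of
# mu[i]; the alpha values are illustrative only
X <- model.matrix(~A, data.frame(A = gl(3, 2, labels = c('a1','a2','a3'))))
alpha <- c(40, 21, 39)               # intercept, a2 effect, a3 effect (assumed)
mu_fixed <- as.vector(X %*% alpha)   # 40 40 61 61 79 79
```

This mirrors the `dot_product(X[i], alpha)` line in the Stan model's likelihood loop.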
>
> params <- c('alpha','sigma','sigma_Site','sigma_Quad')
> burnInSteps = 3000
> nChains = 2
> numSavedSteps = 3000
> thinSteps = 1
> nIter = burnInSteps + ceiling((numSavedSteps * thinSteps)/nChains)
>
> data.nest1.rstan.m2 <- stan(data = data.nest.list, file = "matrixModel2.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)

SAMPLING FOR MODEL 'matrixModel2' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 1:
Chain 1:
Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
Chain 1:
Chain 1:  Elapsed Time: 9.665 seconds (Warm-up)
Chain 1:                12.864 seconds (Sampling)
Chain 1:                22.529 seconds (Total)
Chain 1:

SAMPLING FOR MODEL 'matrixModel2' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2:
Chain 2:
Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
Chain 2:
Chain 2:  Elapsed Time: 11.383 seconds (Warm-up)
Chain 2:                7.552 seconds (Sampling)
Chain 2:                18.935 seconds (Total)
Chain 2:
>
> print(data.nest1.rstan.m2, par = c('alpha','sigma','sigma_Site', 'sigma_Quad'))
Inference for Stan model: matrixModel2.
2 chains, each with iter=4500; warmup=3000; thin=1;
post-warmup draws per chain=1500, total post-warmup draws=3000.

mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
alpha[1]   41.35    0.15 5.97 28.78 37.74 41.51 45.02 53.05  1643 1.00
alpha[2]   21.37    0.25 8.25  5.77 16.01 21.27 26.55 37.46  1117 1.01
alpha[3]   39.19    0.22 8.42 22.50 33.77 39.03 44.54 55.95  1497 1.00
sigma       7.30    0.02 0.60  6.22  6.88  7.26  7.66  8.57  1465 1.00
sigma_Site 11.51    0.09 3.23  6.77  9.35 11.00 13.13 19.06  1240 1.00
sigma_Quad  8.52    0.03 1.15  6.41  7.72  8.47  9.26 10.92  1424 1.00

Samples were drawn using NUTS(diag_e) at Fri Dec 11 09:16:06 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
>
> data.nest1.rstan.m2.df <- as.data.frame(extract(data.nest1.rstan.m2))
> head(data.nest1.rstan.m2.df)
   alpha.1  alpha.2  alpha.3    sigma sigma_Site sigma_Quad      lp__
1 43.11339 25.20017 38.48582 6.425179   7.402032   7.991526 -610.6453
2 42.06806 21.99089 43.87533 8.086784  13.840811   9.164063 -640.4495
3 48.75651 18.25023 31.56387 7.253844  12.568228  11.137956 -620.1871
4 47.97542 27.88142 33.31643 6.983402  18.774030   9.553736 -617.8734
5 37.97879 37.97889 39.74896 7.316680  16.485006   9.229776 -629.8107
6 31.03042 26.85564 57.37791 6.391101  17.110902   7.842159 -596.1107

# References

Gelman, Andrew, Daniel Lee, and Jiqiang Guo. 2015. “Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization.” Journal of Educational and Behavioral Statistics 40 (5): 530–43.
Gelman, Andrew, and others. 2006. “Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper).” Bayesian Analysis 1 (3): 515–34.
Stan Development Team. 2018. “RStan: The R Interface to Stan.” http://mc-stan.org/.