Frequently in field trials, blocking is used to control for spatial variability among experimental units. When blocking is effective, we use the designed statistical model for analysis and to make inferences about treatment effects. However, spatial variability sometimes occurs on a scale too fine to be captured by blocking, or occurs most distinctly within blocks rather than between blocks. In such cases we say that blocking fails, and we might attempt to identify alternative models to account for field variability.
What criteria should we use to identify better models?
Suppose, for the sake of discussion, we have a randomized complete block (RCB) trial with 6 treatments in 4 replicates. We write a statistical model as
\[ y_{ij}= \mu + \tau_i +\rho_j + e_{ij} \] for treatments \(\tau_1, \dots, \tau_T\) (here \(T=6\)) and replicates \(\rho_1, \dots, \rho_R\) (here \(R=4\)). For convenience, we write this in matrix form as
\[ \mathbf{y} = \mathbf{X} \mathbf{\beta} + \mathbf{e} \] where \(\mathbf{y} = \left\{ y_1, y_2, \dots, y_{TR}\right\}\), \(\mathbf{\beta} = \left\{\mu, \rho_1, \dots, \rho_R, \tau_1, \dots, \tau_T \right\}\), and \(\mathbf{X}\) is a matrix of \(0,1\) dummy variables indicating which members of \(\mathbf{\beta}\) correspond to members of \(\mathbf{y}\). Least squares estimates \(\widehat{\mathbf{\beta}}\) are found by solving
\[ \widehat{\mathbf{\beta}} = \mathbf{X}^- \mathbf{y} \] where \(\mathbf{X}^-\) denotes a generalized inverse of \(\mathbf{X}\), and predicted values are given by
\[ \widehat{\mathbf{y}} = \mathbf{X} \widehat{\mathbf{\beta}} \]
Most important for this discussion, we write the model Residual Sum of Squares as
\[ RSS = \sum\left( \mathbf{y} - \widehat{\mathbf{y}} \right)^2 = \sum\left( \mathbf{y} -\mathbf{X} \widehat{\mathbf{\beta}} \right)^2 \]
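As a concrete illustration, here is a minimal sketch of these computations in Python with NumPy, for the hypothetical 6-treatment, 4-replicate trial. The response values are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
T, R = 6, 4                       # treatments, replicates
n = T * R

# Treatment and replicate index for each of the n plots
trt = np.repeat(np.arange(T), R)
rep = np.tile(np.arange(R), T)

# 0/1 dummy design matrix X: intercept, replicate columns, treatment columns
X = np.column_stack([
    np.ones(n),                                     # mu
    (rep[:, None] == np.arange(R)).astype(float),   # rho_1 .. rho_R
    (trt[:, None] == np.arange(T)).astype(float),   # tau_1 .. tau_T
])

# Simulated response: arbitrary effects plus noise (illustration only)
beta_true = np.concatenate([[50.0], rng.normal(0, 2, R), rng.normal(0, 3, T)])
y = X @ beta_true + rng.normal(0, 1, n)

# Least squares via a generalized (Moore-Penrose) inverse, since this
# over-parameterized X is not of full column rank
beta_hat = np.linalg.pinv(X) @ y
y_hat = X @ beta_hat
RSS = np.sum((y - y_hat) ** 2)
print(f"RSS = {RSS:.3f}")
```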
We typically call treatment means derived from \(\widehat{\mathbf{\beta}}\) least squares means, because \(\widehat{\mathbf{\beta}}\) minimizes the residual sum of squares. However, these means may also be referred to as maximum likelihood estimates, since this same \(\widehat{\mathbf{\beta}}\) will maximize the likelihood function (in the fixed effects model), where the likelihood is given by
\[
L (\mathbf{y}; \mathbf{\beta}, \sigma^2) = \left(2 \pi \sigma^2\right)^{-n/2} e^{- \frac{\sum\left( \mathbf{y} -\mathbf{X} \widehat{\mathbf{\beta}} \right)^2}{2 \sigma^2}}
\] It is computationally more convenient to work with the log of the likelihood function (logLik), given by
\[ \ell (\mathbf{y}; \mathbf{\beta}, \sigma^2) = -\frac{n}{2} \ln\left(2 \pi \right)-\frac{n}{2}\ln\left(\sigma^2\right)- \frac{1} {2 \sigma^2} \sum \left( \mathbf{y} -\mathbf{X} \widehat{\mathbf{\beta}}\right)^2 \]
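Continuing the NumPy sketch above, we can evaluate logLik at the least squares solution, plugging in the maximum likelihood variance estimate \(\widehat{\sigma}^2 = RSS/n\):

```python
# Log-likelihood at the least squares fit (continues the sketch above)
sigma2_hat = RSS / n              # ML estimate of sigma^2
logLik = (-n / 2 * np.log(2 * np.pi)
          - n / 2 * np.log(sigma2_hat)
          - RSS / (2 * sigma2_hat))
print(f"logLik = {logLik:.3f}")   # negative, as noted below
```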
The likelihood will typically become very small as the number of observations \(n\) increases, since it distributes probability over a wide range of possible \(\mathbf{y}\), so logLik will generally be negative.
While likelihood ratio tests can be used to compare models, it is more common to use information criteria to guide model selection. The two most common are AIC and BIC.
Akaike (1974) proposed a metric relating log-likelihood to the Kullback-Leibler information number (Akaike 1981), which he called An Information Criterion, though it is more commonly read as the Akaike Information Criterion. The criterion adjusts logLik to account for the number of parameters \(k\) in a model (i.e., the number of members of \(\mathbf{\beta}\)).
We write AIC as
\[ AIC = 2k - 2\,\ell (\mathbf{y}; \mathbf{\beta}, \sigma^2) \] which in the case of least squares estimates can be simplified (dropping terms constant across models) to
\[ AIC = 2k + n\ln\left(RSS/n\right) \]
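Continuing the sketch, both forms give the same model ranking. Here \(k\) counts the estimable parameters, plus one for \(\sigma^2\), following the convention of most software:

```python
# AIC from the quantities above (continues the sketch)
k = np.linalg.matrix_rank(X) + 1       # estimable effects plus sigma^2
AIC = 2 * k - 2 * logLik
# Equivalent, up to the model-independent constant n*(ln(2*pi) + 1):
AIC_rss = 2 * k + n * np.log(RSS / n) + n * (np.log(2 * np.pi) + 1)
print(f"AIC = {AIC:.3f}  (RSS form: {AIC_rss:.3f})")
```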
Thus, statistical models that reduce residual error will also tend to reduce AIC, while models with more parameters \(k\) will increase it. Since we want to select a model that reduces error but does not overfit the data, the smallest AIC represents the best compromise between underfitting and overfitting.
Hence the rule of thumb: smaller (or more negative) AIC is better.
Schwarz (1978) proposed an alternate adjustment to the log-likelihood, termed the Bayesian Information Criterion. We write BIC as
\[ BIC = k\ln\left(n\right) + n\ln\left(RSS/n\right) \] BIC is frequently related to Bayes factors, which (roughly) are ratios of the likelihoods of two alternative models or hypotheses; in log terms, this is the difference between two BIC values. This leads us to rules of thumb (adapted from Jeffreys) for selecting or rejecting among alternative models:
| \(\Delta BIC\) | Evidence against simpler model |
|---|---|
| 0 to 2 | Not worth more than a bare mention |
| 2 to 6 | Positive |
| 6 to 10 | Strong |
| \(>10\) | Very strong |
Thus, we might choose to analyze data using a spatial model only if its BIC is at least \(6\) units smaller than that of the RCB (design) model.
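As a sketch of this rule in practice, we can compare the design model from the running example against a richer model with two hypothetical spatial covariates (invented here purely for illustration) and compute \(\Delta BIC\):

```python
# Delta-BIC comparison (continues the sketch; covariates are invented)
def bic(rss, n_obs, k_par):
    """Gaussian BIC, up to the same additive constant as AIC above."""
    return k_par * np.log(n_obs) + n_obs * np.log(rss / n_obs)

row_cov = rng.normal(size=n)           # stand-in spatial covariates
col_cov = rng.normal(size=n)
X2 = np.column_stack([X, row_cov, col_cov])

beta2 = np.linalg.pinv(X2) @ y
RSS2 = np.sum((y - X2 @ beta2) ** 2)

k1 = np.linalg.matrix_rank(X) + 1
k2 = np.linalg.matrix_rank(X2) + 1
delta_bic = bic(RSS, n, k1) - bic(RSS2, n, k2)
print(f"Delta BIC = {delta_bic:.2f}")  # adopt the spatial model only if >= 6
```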
The following is a common example used to illustrate spatial analysis (Littell et al. 1996).
The linked report, using the Automatic spatial model option, selects a cubic trend model; this is a marked improvement over the RCB analysis. This is not surprising: the blocking scheme may not be sufficient to capture spatial variability, as we see from the trial map.
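We do not have the report's data here, but the idea can be sketched with simulated data and statsmodels: a trial blocked by rows in which the real fertility gradient runs across columns, so a polynomial trend model beats the RCB model on BIC. All names and values below are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical 4-row x 6-column layout, treatments randomized within rows
layout = np.array([rng.permutation(T) for _ in range(R)])
df = pd.DataFrame({
    "row": np.repeat(np.arange(R), T),
    "col": np.tile(np.arange(T), R),
    "trt": layout.ravel(),
})
df["rep"] = df["row"]                  # blocks are rows
# Simulated yield with a smooth column trend the blocks cannot capture
df["y"] = 50 + df["trt"] + 0.5 * df["col"] ** 2 + rng.normal(0, 1, n)

rcb = smf.ols("y ~ C(trt) + C(rep)", data=df).fit()
trend = smf.ols("y ~ C(trt) + col + I(col**2) + I(col**3)", data=df).fit()
print(f"BIC (RCB)   = {rcb.bic:.1f}")
print(f"BIC (trend) = {trend.bic:.1f}")  # expect the trend model to win here
```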
We might further examine other diagnostics:
The spatial model improves on the RCB model by reducing non-normality (Shapiro-Wilk p = 0.171) and skewness (p = 0.913). Kurtosis is reduced but still significant (p = 0.04), and Levene's test does show heterogeneity, so we might yet choose to reject both the spatial and RCB models in favor of a rank-based analysis.
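Diagnostics of this kind can be reproduced with SciPy; the sketch below tests the residuals from the earlier simulated fit, not the values quoted from the linked report:

```python
from scipy import stats

# Residual diagnostics (continues the earlier sketch)
resid = y - y_hat
print("Shapiro-Wilk p:", stats.shapiro(resid).pvalue)       # normality
print("Skewness p:    ", stats.skewtest(resid).pvalue)
print("Kurtosis p:    ", stats.kurtosistest(resid).pvalue)

# Levene's test for homogeneity of variance across treatment groups
groups = [resid[trt == t] for t in range(T)]
print("Levene p:      ", stats.levene(*groups).pvalue)
```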
We might also consider the Residual Maps and Spatial Biplots to guide trial layouts for future experiments in this location.
Cochran and Cox (1957) use a trial involving nematode counts to illustrate the analysis of variance/covariance. These counts show a possible spatial pattern, so we will consider spatial models for the treatment response (Column 2 - Second); see also the related Knowledge Base page, Spatial Correlation.
We see, from the Diagnostics tab, that while the CST spatial model is a small improvement over IID, the improvement is not enough to justify rejecting the design model. Further, none of the spatial models improve BIC by any non-trivial margin.
Akaike, Hirotugu. 1974. “A New Look at the Statistical Model Identification.” IEEE Transactions on Automatic Control 19 (6): 716–23.
Akaike, Hirotugu. 1981. “Likelihood of a Model and Information Criteria.” Journal of Econometrics 16: 3–14.
Cochran, William G., and Gertrude M. Cox. 1957. Experimental Designs. 2nd ed. New York: John Wiley & Sons.
Littell, R. C., G. A. Milliken, W. W. Stroup, and R. D. Wolfinger. 1996. SAS System for Mixed Models. Cary, NC: SAS Institute.
Schwarz, Gideon. 1978. “Estimating the Dimension of a Model.” The Annals of Statistics 6 (2): 461–64.