We gathered a group for an exercise in experimental design. Our proposed experiment was to compare the effectiveness of different models of soft-tip darts. Our discussion included:
Experimental unit. One round of three throws was determined to be an experimental unit; a participant threw three darts of the same model at the bullseye. The model of dart would be independently randomized for each round.
Assessment. We decided the primary measure of throwing accuracy would be distance from the bullseye, measured in cm. We also measured an angle, from -9 to 10, with the 20-point segment at 0, as a measure of bias in throws. We also included the cumulative score of the three darts, with the assumption that this variable would show a uniform distribution. We expect distance and angle to be approximately normally distributed.
Blocking. We had 10 participants, so we decided to let each participant represent a block. Each would throw each of the models of darts in a randomized complete block design (a sketch of the randomization follows).
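As a minimal sketch, such a randomization might be generated in R as below; the seed, number of models, and labels are illustrative, not the actual trial layout.
set.seed(2024)  # illustrative seed, for reproducibility only
models <- paste0("model", 1:4)  # placeholder names for the dart models
blocks <- 10
rcbd <- data.frame(
  replicate = factor(rep(1:blocks, each = length(models))),
  # independently randomize the treatment order within each block
  treatment = factor(unlist(lapply(1:blocks, function(b) sample(models))))
)
head(rcbd)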
Since we have no preceding trials, we conducted a simple uniformity trial to estimate variance components. We labelled ‘red’ and ‘blue’ as treatments, throwing the two sets of starter darts provided. Each participant threw two rounds, with treatments independently randomized for each participant. We have two data tables: the first includes all three throws as samples, while the second records only round means. In the trial, we found that the darts had a tendency not to stay in the board; these throws were scored as missing values.
samples.dat <- data.frame(
plot=as.factor(c(101, 101, 101, 202, 202, 202, 301, 301, 301, 402, 402, 402, 501, 501, 501, 602, 602, 602, 701, 701, 701, 802, 802, 802, 901, 901, 901, 1002, 1002, 1002, 102, 102, 102, 201, 201, 201, 302, 302, 302, 401, 401, 401, 502, 502, 502, 601, 601, 601, 702, 702, 702, 801, 801, 801, 902, 902, 902, 1001, 1001, 1001)),
treatment=as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)),
replicate=as.factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10)),
subsample =as.factor(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)),
Bullseye=c(4, NA, NA, 7, NA, NA, 7, 15, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 19, 16, NA, 6, NA, NA, 13, 6, NA, 6, NA, NA, 7, 4, NA, 8, 9, NA, 5, NA, NA, 16, NA, NA, 7, NA, NA, 13, 13, NA, 11, 13, 8, 8, NA, NA, 2, NA, NA, 7, 7, NA),
Angle=c(8, NA, NA, -6, NA, NA, 1, -1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -5, -9, NA, -2, NA, NA, 2, -4, NA, 2, NA, NA, 7, 1, NA, -7, 0, NA, 3, NA, NA, 1, NA, NA, 5, NA, NA, 6, 3, NA, -3, -9, 1, 0, NA, NA, 1, NA, NA, 3, -8, NA)
)
means.dat <- data.frame(
plot=as.factor(c(101, 202, 301, 402, 501, 602, 701, 802, 901, 1002, 102, 201, 302, 401, 502, 601, 702, 801, 902, 1001)),
treatment=as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)),
replicate=as.factor(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)),
Bullseye=c(4, 7, 11, NA, NA, NA, 17.5, 6, 9.5, 6, 5.5, 8.5, 5, 16, 7, 13, 10.67, 8, 2, 7),
Angle=c(8, -6, 0, NA, NA, NA, -7, -2, -1, 2, 4, -3.5, 3, 1, 5, 4.5, -3.67, 0, 1, -2.5),
Score=c(2, 8, 6, NA, NA, NA, 38, 12, 32, 18, 16, 76, 4, 2, 6, 14, 29, 20, 25, 11)
)
Start with an analysis of variance of the distance from the bullseye. We can choose to analyze either data set:
summary(aov(Bullseye ~ replicate+treatment, data = means.dat))
## Df Sum Sq Mean Sq F value Pr(>F)
## replicate 9 198.47 22.052 2.223 0.172
## treatment 1 14.67 14.668 1.478 0.270
## Residuals 6 59.53 9.922
## 3 observations deleted due to missingness
summary(aov(Bullseye ~ replicate+treatment+Error(treatment:replicate), data = samples.dat))
## Warning in aov(Bullseye ~ replicate + treatment +
## Error(treatment:replicate), : Error() model is singular
##
## Error: treatment:replicate
## Df Sum Sq Mean Sq F value Pr(>F)
## replicate 9 268.79 29.87 2.064 0.195
## treatment 1 36.39 36.39 2.515 0.164
## Residuals 6 86.81 14.47
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 9 78.67 8.741
Note that in the second case, we need to test treatments and replicates against experimental error (the treatment x replicate interaction), and not residual error. In this case, residual error represents sampling error. In another exercise, we might use this to determine an appropriate number of samples from each experimental unit (sketched briefly below), but for now, we’ll work with the natural number of samples (dart games are played with three throws per round).
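As a rough illustration, assuming three throws per round and ignoring the imbalance from missing throws, the variance components can be estimated from the mean squares printed above:
ms.experimental <- 14.47  # treatment:replicate mean square, from the table above
ms.sampling <- 8.741      # within-round (Error: Within) mean square
var.sampling <- ms.sampling                     # sampling variance component
var.unit <- (ms.experimental - ms.sampling)/3   # per-round (experimental) component
var.unit; var.sampling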
Now, consider a simple paired means t-test:
\[ t = \frac{\delta_{a-b}}{\sqrt{2\sigma^2/r}} \] where \(\delta_{a-b}\) is the difference between means \(a\) and \(b\), and \(\sigma^2\) is the pooled variance. To declare that means \(a\) and \(b\) are significantly different, we require \[ t \ge t_{\alpha/2} \]
for some significance level \(\alpha\). In this exercise, we use \(\alpha = 0.1\), since the penalty for a Type I error with the darts is not great - none of the dart sets cost more than $20, so we would not lose a lot of money acting on the incorrect assumption that the most expensive model was ‘better’ in this trial.
We can rearrange the t-ratio to solve for the number of replicates \(r\), but first we should note that we also want to control for Type II error - that is, failing to detect a true difference. Denote this with a critical \(t_{\beta}\) and let \(\beta = 0.2\) for a power of 80%. This gives us the formula
\[ r \ge 2 \left[ t_{\alpha/2} + t_{\beta} \right]^2 \left( \frac{\sigma}{\delta} \right)^2 \] We could calculate \(\delta\) directly from the raw mean difference, but the data are unbalanced, so the raw means are contaminated by replicate effects. One quick alternative is to use the treatment coefficient from the linear model, which adjusts for replicates:
print(delta <- tail(coef(lm(formula = Bullseye ~ replicate + treatment, data = means.dat)),1))
## treatment2
## -2.047143
Thus, taking values from the AOV table,
print(sigma <- summary(aov(Bullseye ~ replicate+treatment, data = means.dat))[[1]][3,3])
## [1] 9.921945
print(error.df <- summary(aov(Bullseye ~ replicate+treatment, data = means.dat))[[1]][3,1])
## [1] 6
print(t.alpha <- qt(1-0.1/2,df=error.df))
## [1] 1.94318
print(t.beta <- qt(0.20,df=error.df))
## [1] -0.9057033
print(r <- (2*(t.alpha + t.beta)^2) *((sigma/2)^2))
## [1] 52.98106
This suggests at least 50 replicates would be required to detect a difference of ~2 cm. However, that many replicates would give many more error degrees of freedom, which changes the critical t-values. If we recalculate:
error.df <- ceiling(r-1)
print(t.alpha <- qt(1-0.1/2,df=error.df))
## [1] 1.674689
print(t.beta <- qt(0.20,df=error.df))
## [1] -0.8485883
print(r <- (2*(t.alpha + t.beta)^2) *((sigma/2)^2))
## [1] 33.59153
We can further refine the required replicates by iterating this calculation until \(r\) converges. ARM does this, as well as using a non-central distribution for \(t_{\beta}\). The ARM report is available here, along with the trial data.
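A minimal sketch of that iteration, reusing sigma and the current estimate of r from above (this repeats the same central-t calculation; ARM’s non-central refinement for \(t_{\beta}\) is not reproduced here):
repeat {
  error.df <- ceiling(r - 1)
  t.alpha <- qt(1-0.1/2, df=error.df)
  t.beta <- qt(0.20, df=error.df)
  r.new <- (2*(t.alpha + t.beta)^2) * ((sigma/2)^2)
  if (abs(r.new - r) < 1) break  # stop when the estimate stabilizes
  r <- r.new
}
ceiling(r.new)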
Open the final trial protocol, then select the Design tab under the Trial Settings dialog to calculate power for other combinations of variability (CV) and effect size (% Mean Difference). In ARM, we standardize these terms for simplicity.
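For reference, a sketch of how these standardized terms might be computed from the uniformity trial, assuming CV = 100*SD/mean and % mean difference = 100*delta/mean (these definitions are assumed here, not taken from ARM documentation):
grand.mean <- mean(means.dat$Bullseye, na.rm=TRUE)
print(cv <- 100*sqrt(sigma)/grand.mean)       # sigma holds the residual mean square
print(pct.diff <- 100*abs(delta)/grand.mean)  # delta is the treatment coefficient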
In the end, we decided to use only 10 replicates, given time constraints. We hoped that the differences among darts would be greater than measured in the uniformity trial. We also added an assessment (Bounce) to count the number of darts that did not stay in the board. Below are the data, again with and without samples. See also the ARM report and trial.
means.dat <- data.frame(
plot=as.factor(c(104, 201, 303, 401, 504, 603, 702, 803, 901, 1002, 103, 202, 304, 402, 503, 604, 703, 801, 903, 1001, 101, 203, 301, 404, 502, 601, 704, 802, 904, 1003, 102, 204, 302, 403, 501, 602, 701, 804, 902, 1004)),
treatment=as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4)),
replicate=as.factor(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)),
Bullseye=c(8.5, 8.5, NA, NA, NA, 5.5, 5, 6, NA, 15, 8.666667, 9, 11, 9, NA, 5, 6.5, 8, 7, 13, 4.666667, 10.666667, 12, 6.666667, 7.333333, 7, 7.666667, 5.333333, 6, NA, 5, 6.666667, 9.666667, 1.5, 6.333333, 6.5, 7, 10.666667, 11.333333, 2),
Angle=c(7.5, -3, NA, NA, NA, 8.5, -3, -3.5, NA, 9, 9, 0, -6.666667, 3.333333, NA, 3.5, 3, -8.5, 7.5, 8, -1, -6.666667, -4, 3.333333, 3, 4, -6, 2.666667, -5, NA, 2.333333, -1.333333, 1, -3, 1.666667, 0, -0.5, 1, 5, 5),
Score=c(9, 26, NA, NA, NA, 28, 9, 66, NA, 17, 30, 30, 49, 41, NA, 24, 31, 26, 27, 2, 40, 81, 36, 72, 41, 26, 45, 44, 36, NA, 37, 29, 44, 50, 34, 18, 58, 43, 20, 25),
Bounce=c(1, 1, 3, 3, 3, 1, 1, 1, 3, 2, 0, 0, 0, 0, 3, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 1, 0, 1, 1, 0, 0, 2)
)
samples.dat <- data.frame(
plot=as.factor(c(104, 104, 104, 201, 201, 201, 303, 303, 303, 401, 401, 401, 504, 504, 504, 603, 603, 603, 702, 702, 702, 803, 803, 803, 901, 901, 901, 1002, 1002, 1002, 103, 103, 103, 202, 202, 202, 304, 304, 304, 402, 402, 402, 503, 503, 503, 604, 604, 604, 703, 703, 703, 801, 801, 801, 903, 903, 903, 1001, 1001, 1001, 101, 101, 101, 203, 203, 203, 301, 301, 301, 404, 404, 404, 502, 502, 502, 601, 601, 601, 704, 704, 704, 802, 802, 802, 904, 904, 904, 1003, 1003, 1003, 102, 102, 102, 204, 204, 204, 302, 302, 302, 403, 403, 403, 501, 501, 501, 602, 602, 602, 701, 701, 701, 804, 804, 804, 902, 902, 902, 1004, 1004, 1004)),
treatment=as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4)),
replicate=as.factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10)),
subsample =as.factor(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)),
Bullseye=c(5, 12, NA, 13, 4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2, 9, NA, 5, NA, NA, 12, 0, NA, NA, NA, NA, 15, NA, NA, 10, 10, 6, 7, 11, NA, 8, 10, 15, 4, 10, 13, NA, NA, NA, 4, 6, NA, 6, 7, NA, 8, 8, NA, 7, 7, NA, 18, 8, NA, 3, 5, 6, 7, 10, 15, 9, 13, 14, 5, 5, 10, 6, 8, 8, 4, 10, NA, 3, 10, 10, 5, 6, 5, 6, 4, 8, NA, NA, NA, 6, 7, 2, 12, 5, 3, 5, 8, 16, 1, 2, NA, 5, 8, 6, 6, 7, NA, 4, 10, NA, 13, 9, 10, 6, 11, 17, 2, NA, NA),
Angle=c(5, 10, NA, -2, -4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 7, 10, NA, -3, NA, NA, -7, 0, NA, NA, NA, NA, 9, NA, NA, 8, 9, 10, 2, -2, NA, -9, -8, -3, -1, 6, 5, NA, NA, NA, 5, 2, NA, 4, 2, NA, -8, -9, NA, 9, 6, NA, 8, 8, NA,
7, -3, -7, -4, -7, -9, -9, -2, -1, 6, 9, -5, 7, 7, -5, 0, 8, NA, -2, -6, -10, 9, 2, -3, -7, -6, -2, NA, NA, NA, 6, 8, -7, -5, -7, 8, 5, 5, -7, -2, -4, NA, 0, 1, 4, 6,
-6, NA, 6, -7, NA, 9, -1, -5, 9, 10, -4, 5, NA, NA)
)
Currently, I’m working on code to add diagnostic plots to ARM analysis. This is a prototype. We calculate the \(3^{rd}\) and \(4^{th}\) moments (skewness and kurtosis) and report those over a histogram; we also compute a measure of normality and display that over a QQ-norm plot. Finally, we add a test for homogeneity of variances, along with a box-whisker plot. The plotting function allows us to compare two sets of random variables.
moments <- function(x) {
  # Mean, N, SD, and the bias-corrected sample skewness and excess
  # kurtosis; assumes x contains no missing values.
  grand.mean <- mean(x)
  grand.N <- length(x)
  grand.sd <- sqrt(sum((x - grand.mean)^2)/(grand.N-1))
  # adjusted Fisher-Pearson skewness
  skew <- (sum((x - grand.mean)^3))/(grand.sd^3)
  skew <- skew * grand.N/((grand.N-1)*(grand.N-2))
  # sample excess kurtosis (near 0 for a normal distribution)
  kurt <- ((sum((x - grand.mean)^4))/(grand.sd^4))
  kurt <- kurt * (grand.N*(grand.N+1))/((grand.N-1)*(grand.N-2)*(grand.N-3))
  kurt <- kurt - (3*(grand.N-1)*(grand.N-1))/((grand.N-2)*(grand.N-3))
  return(list(Mean=grand.mean, N=grand.N, SD=grand.sd, Skewness=skew, Kurtosis=kurt))
}
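A quick sanity check: for a large sample from a normal distribution, both shape statistics should be near zero.
moments(rnorm(1000))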
plot.comparisons <- function(vector1, vector2, group, main1="Data 1", main2="Data 2") {
  # Side-by-side diagnostics for two variables: histograms annotated with
  # skewness and kurtosis, QQ-norm plots annotated with Shapiro-Wilk
  # p-values, and box-whisker plots by group annotated with Bartlett
  # p-values for homogeneity of variance.
  moments1 <- moments(vector1)
  moments2 <- moments(vector2)
  lbl1.skew <- paste('Skewness = ', format(moments1$Skewness, digits=3))
  lbl2.skew <- paste('Skewness = ', format(moments2$Skewness, digits=3))
  lbl1.kur <- paste('Kurtosis = ', format(moments1$Kurtosis, digits=3))
  lbl2.kur <- paste('Kurtosis = ', format(moments2$Kurtosis, digits=3))
  lbl1.sw <- paste('Shapiro-Wilk p = ', format(shapiro.test(vector1)$p.value, digits=3))
  lbl1.ks <- '' # todo: flag for KS test?
  lbl2.sw <- paste('Shapiro-Wilk p = ', format(shapiro.test(vector2)$p.value, digits=3))
  lbl2.ks <- ''
  lbl1.lev <- '' # todo: write Levene's test (a sketch follows this function)
  lbl2.lev <- ''
  lbl1.bar <- paste('Bartlett p = ', format(bartlett.test(x=vector1, g=group)$p.value, digits=3))
  lbl2.bar <- paste('Bartlett p = ', format(bartlett.test(x=vector2, g=group)$p.value, digits=3))
  par(mfrow=c(3,2))
  hist(vector1, main = main1, col='grey')
  mtext(paste(lbl1.skew, lbl1.kur))
  hist(vector2, main = main2, col='grey')
  mtext(paste(lbl2.skew, lbl2.kur))
  qqnorm(vector1)
  qqline(vector1, col = 2)
  mtext(paste(lbl1.sw, lbl1.ks))
  qqnorm(vector2)
  qqline(vector2, col = 2)
  mtext(paste(lbl2.sw, lbl2.ks))
  plot(x=group, y=vector1) # box-whisker plot, since group is a factor
  mtext(lbl1.bar)
  plot(x=group, y=vector2)
  mtext(lbl2.bar)
}
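For the Levene’s test still marked as a todo, a minimal sketch (using the Brown-Forsythe variant, a one-way ANOVA on absolute deviations from group medians) might look like:
levene.p <- function(x, g) {
  # absolute deviations from the group medians
  d <- abs(x - ave(x, g, FUN = median))
  # p-value from a one-way ANOVA on the deviations
  anova(lm(d ~ g))[1, "Pr(>F)"]
}
# e.g., lbl1.lev <- paste('Levene p = ', format(levene.p(vector1, group), digits=3))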
The design of the experiment implies an analysis of variance, assuming errors are i.i.d. We can check these assumptions with diagnostic plots of the residuals. Historically, ARM has tested the assumptions of AOV on the observed plot means, so we include these in the diagnostics.
design.model <- lm(Bullseye ~ replicate+treatment, data = means.dat)
plot.comparisons(means.dat$Bullseye[!is.na(means.dat$Bullseye)], resid(design.model), means.dat$treatment[!is.na(means.dat$Bullseye)],
main1="Plot Means", main2="AOV Residuals")
There is some kurtosis in the residuals, so we might want to try a better analysis model. Since we have missing plots, we might have better results with a mixed-model analysis. Specifically, we have confounding among treatment effects and replicate effects (when plots are missing, blocks are no longer complete).
library(lme4)
## Loading required package: Matrix
mixed.model <- lmer(Bullseye ~ treatment + (1|replicate), data = means.dat)
plot.comparisons(resid(design.model), resid(mixed.model), means.dat$treatment[!is.na(means.dat$Bullseye)],
main1="AOV Residuals", main2="REML Residuals")
There is some improvement in the residuals, but this does not greatly affect inferences based on the two models:
summary(aov(design.model))
## Df Sum Sq Mean Sq F value Pr(>F)
## replicate 9 85.07 9.453 1.14 0.38
## treatment 3 18.42 6.140 0.74 0.54
## Residuals 21 174.20 8.295
## 6 observations deleted due to missingness
VarCorr(mixed.model)
## Groups Name Std.Dev.
## replicate (Intercept) 0.53454
## Residual 2.88922
anova(mixed.model)
## Analysis of Variance Table
## Df Sum Sq Mean Sq F value
## treatment 3 18.733 6.2442 0.748
It might help to consider sampling error. We refit the AOV and REML models as:
samples.dat <- subset(samples.dat,!is.na(samples.dat$Bullseye))
replicates <- length(unique(samples.dat$replicate))
treatments <- length(unique(samples.dat$treatment))
design.model <- lm(Bullseye ~ replicate+treatment+treatment:replicate, data = samples.dat)
mixed.model <- lmer(Bullseye ~ treatment + (1|replicate/treatment), data = samples.dat)
plot.comparisons(samples.dat$Bullseye,resid(design.model),samples.dat$treatment,
main1="Scores", main2="AOV Residuals")
plot.comparisons(resid(design.model), resid(mixed.model), samples.dat$treatment[!is.na(samples.dat$Bullseye)],
main1="AOV Residuals", main2="REML Residuals")
summary(aov(Bullseye ~ replicate+treatment+Error(treatment:replicate), data = samples.dat))
## Warning in aov(Bullseye ~ replicate + treatment +
## Error(treatment:replicate), : Error() model is singular
##
## Error: treatment:replicate
## Df Sum Sq Mean Sq F value Pr(>F)
## replicate 9 205.5 22.839 1.450 0.230
## treatment 3 21.8 7.269 0.461 0.712
## Residuals 21 330.8 15.751
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 48 647 13.48
VarCorr(mixed.model)
## Groups Name Std.Dev.
## treatment:replicate (Intercept) 0.66904
## replicate (Intercept) 0.94203
## Residual 3.72311
anova(mixed.model)
## Analysis of Variance Table
## Df Sum Sq Mean Sq F value
## treatment 3 24.955 8.3185 0.6001
What does this tell us about our experimental design? On first reading, it would suggest that three samples per experimental unit is too few - the variance between rounds is dominated by the variance among throws within rounds. If we want the central limit theorem to work for us, we might choose more samples.
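To see this, take the REML variance components printed above and compute the variance of a round mean as a function of the number of throws:
var.unit <- 0.66904^2      # treatment:replicate (experimental) component
var.sampling <- 3.72311^2  # within-round (sampling) component
n <- 1:10
round(var.unit + var.sampling/n, 2)  # variance of a round mean with n throws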
Next, we examine the linear model fit for Angle.
design.model <- lm(Angle ~ replicate+treatment, data = means.dat)
plot.comparisons(means.dat$Angle[!is.na(means.dat$Angle)], resid(design.model), means.dat$treatment[!is.na(means.dat$Angle)],
main1="Plot Means", main2="AOV Residuals")
summary(aov(design.model))
## Df Sum Sq Mean Sq F value Pr(>F)
## replicate 9 389.8 43.31 2.559 0.0366 *
## treatment 3 43.3 14.42 0.852 0.4811
## Residuals 21 355.4 16.93
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 6 observations deleted due to missingness
The most striking aspect of this analysis is that there does appear to be a bias for each participant - replicate effects are significant. In the raw data, this introduces some heterogeneity among raw treatment variances, but this is reduced in the residuals. We can see the bias in a plot of the round means of Angle by replicate:
plot(Angle ~ replicate,data = means.dat)
We can also see that sampling error is smaller for Angle, relative to experimental error. Thus, we might not need more throws if our aim is to measure bias. Perhaps this implies that participants are more precise, but less accurate (Why?).
summary(aov(Angle ~ replicate+treatment+Error(treatment:replicate), data = samples.dat))
## Warning in aov(Angle ~ replicate + treatment +
## Error(treatment:replicate), : Error() model is singular
##
## Error: treatment:replicate
## Df Sum Sq Mean Sq F value Pr(>F)
## replicate 9 814.6 90.51 2.158 0.0705 .
## treatment 3 118.3 39.45 0.940 0.4388
## Residuals 21 880.8 41.94
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 48 1379 28.73
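To make this concrete, compare the ratio of the sampling mean square to the experimental-error mean square in the two nested tables above:
28.73/41.94   # Angle
13.48/15.751  # Bullseye
Next, we fit the same design model for Score: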
design.model <- lm(Score ~ replicate+treatment, data = means.dat)
plot.comparisons(means.dat$Score[!is.na(means.dat$Score)], resid(design.model), means.dat$treatment[!is.na(means.dat$Score)],
main1="Plot Totals", main2="AOV Residuals")
Surprisingly, Score does appear to differ among the darts, with more expensive darts showing a higher scoring rate. There might be two explanations. The bull and double bull have higher scoring values than the segments just outside them, so darts that give throwers a higher probability of hitting a bullseye will result in higher scores. Less likely, darts that land within a smaller radius are more likely to hit the triple ring than those in the outer ring. How might we design a new trial to test this? Can we detect these differences with the current data?
A priori, we might expect Bounce to follow a Poisson distribution. However, if there are sufficient observations, we can simplify and model it as normal. First, we consider residuals from the linear model:
design.model <- lm(Bounce ~ replicate+treatment, data = means.dat)
plot.comparisons(means.dat$Bounce,resid(design.model),means.dat$treatment,
main1="Plot Totals", main2="AOV Residuals")
poisson.model <- glm(Bounce ~ replicate+treatment, data = means.dat,family=poisson)
It does not appear that the Poisson model improves the analysis. Perhaps the greater number of 3-bounce rounds relative to 2-bounce rounds creates a mismatch. Note - we don’t necessarily expect the QQ-norm plot to be linear, but two potential modes in the residual histogram are troubling.
plot.comparisons(resid(design.model),resid(poisson.model),means.dat$treatment,
main1="AOV Residuals", main2="GLM Residuals")