Introduction

We gathered a group for an exercise in experimental design. Our proposed experiment was to compare the effectiveness of different models of soft-tip darts. Our discussion included

  1. Experimental unit. One round of three throws was determined to be an experimental unit; a participant threw three darts of the same model at the bullseye. The model of dart would be independently randomized for each round.

  2. Assessment. We decided the primary measure of throwing accuracy would be distance from the bullseye, measured in cm. We also measured an angle, from -9 to 10, with the 20 point segment being 0, as measure of bias in throws. We also included the cumulative score of three darts, with the assumption that this variable would show a uniform distribution. We expect distance and angle to be approximately normally distributed.

  3. Blocking. We had 10 participants, so decided to let each participant represent a block. Each would throw each of the models of darts in a randomized complete block design.

  4. Treatments. We used a Fat Cat Model 727 electronic dartboard for this exercise. Two sets of soft tip ‘starter’ darts (three red and three blue) were provided with this board. We decided to use the starter darts as reference (check) treatments. Additional darts to be tested were

Uniformity Trial

Since we have no preceding trials, we conducted a simple uniformity trial to determine variance components. We labelled ‘red’ and ‘blue’ as treatments, throwing the two sets of starter darts provided. Each participant threw darts for two rounds and treatments were independently randomized for each participant. We have two data tables. The first includes all three throws as samples, while the second records only round means. In the trial, we found that the darts had a tendency to not stay in the board. These were scored as missing values.

samples.dat <- data.frame(
plot=as.factor(c(101, 101, 101, 202, 202, 202, 301, 301, 301, 402, 402, 402, 501, 501, 501, 602, 602, 602, 701, 701, 701, 802, 802, 802, 901, 901, 901, 1002, 1002, 1002, 102, 102, 102, 201, 201, 201, 302, 302, 302, 401, 401, 401, 502, 502, 502, 601, 601, 601, 702, 702, 702, 801, 801, 801, 902, 902, 902, 1001, 1001, 1001)), 
treatment=as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)), 
replicate=as.factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10)), 
subsample =as.factor(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)),
Bullseye=c(4, NA, NA, 7, NA, NA, 7, 15, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 19, 16, NA, 6, NA, NA, 13, 6, NA, 6, NA, NA, 7, 4, NA, 8, 9, NA, 5, NA, NA, 16, NA, NA, 7, NA, NA, 13, 13, NA, 11, 13, 8, 8, NA, NA, 2, NA, NA, 7, 7, NA),
Angle=c(8, NA, NA, -6, NA, NA, 1, -1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -5, -9, NA, -2, NA, NA, 2, -4, NA, 2, NA, NA, 7, 1, NA, -7, 0, NA, 3, NA, NA, 1, NA, NA, 5, NA, NA, 6, 3, NA, -3, -9, 1, 0, NA, NA, 1, NA, NA, 3, -8, NA)
)
means.dat <- data.frame(
plot=as.factor(c(101, 202, 301, 402, 501, 602, 701, 802, 901, 1002, 102, 201, 302, 401, 502, 601, 702, 801, 902, 1001)), 
treatment=as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)), 
replicate=as.factor(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)), 
Bullseye=c(4, 7, 11, NA, NA, NA, 17.5, 6, 9.5, 6, 5.5, 8.5, 5, 16, 7, 13, 10.67, 8, 2, 7),
Angle=c(8, -6, 0, NA, NA, NA, -7, -2, -1, 2, 4, -3.5, 3, 1, 5, 4.5, -3.67, 0, 1, -2.5),
Score=c(2, 8, 6, NA, NA, NA, 38, 12, 32, 18, 16, 76, 4, 2, 6, 14, 29, 20, 25, 11)
)

Power calculations

Start with analysis of variance of the distance from bull. We can choose to analyze either data set:

summary(aov(Bullseye ~ replicate+treatment, data = means.dat))
##             Df Sum Sq Mean Sq F value Pr(>F)
## replicate    9 198.47  22.052   2.223  0.172
## treatment    1  14.67  14.668   1.478  0.270
## Residuals    6  59.53   9.922               
## 3 observations deleted due to missingness
summary(aov(Bullseye ~ replicate+treatment+Error(treatment:replicate), data = samples.dat))
## Warning in aov(Bullseye ~ replicate + treatment +
## Error(treatment:replicate), : Error() model is singular
## 
## Error: treatment:replicate
##           Df Sum Sq Mean Sq F value Pr(>F)
## replicate  9 268.79   29.87   2.064  0.195
## treatment  1  36.39   36.39   2.515  0.164
## Residuals  6  86.81   14.47               
## 
## Error: Within
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  9  78.67   8.741

Note that in the second case, we need to test treatments and replicates against experimental error (treatment x replicate interaction), and not residual error. In this case, residual error represents sampling error. In another exercise, we might use this to determine an appropriate number of samples from each experimental unit, but for now, we’ll work the natural number of samples (dart games are played with three throws per round).

Now, consider a simple paired means t-test:

\[ t = \frac{\delta_{a-b}}{\sqrt{2\sigma^2/r}} \] where \(\delta_{a-b}\) is the difference between means \(a\) and \(b\), and \(\sigma^2\) is pooled standard deviation. To declare that means \(a\) and \(b\) are signficantly different, we require \[ t \ge t_{\alpha/2} \]

from some signficance \(\alpha\). In this exercise, we use \(\alpha = 0.1\), since the penalty for a Type I error with the darts is not great - none of the dart sets cost more than $20, so we would not lose a lot of money acting on the incorrect assumption that the most expensive model was ‘better’ in this trial.

We can rearrange the t-ratio to solve for the number of replicates \(r\), but first, we should note that we also want to control for Type II error - that is, rejecting a true difference. Denote this with a critical \(t_{\beta}\) and let \(\beta = 0.2\) for a power of 80%. This gives us the formula

\[ r \ge 2 \left[ t_{\alpha/2} + t_{\beta} \right]^2 \left( \frac{\sigma}{\delta} \right)^2 \] We could calculate \(\delta\) from mean differences, but the data are unbalanced, and we have a signficant replicate effect. One quick calculation is to use the treatment coefficient from the linear model,

print(delta <- tail(coef(lm(formula = Bullseye ~ replicate + treatment, data = means.dat)),1))
## treatment2 
##  -2.047143

Thus, taking values from the AOV table,

print(sigma <- summary(aov(Bullseye ~ replicate+treatment, data = means.dat))[[1]][3,3])
## [1] 9.921945
print(error.df <- summary(aov(Bullseye ~ replicate+treatment, data = means.dat))[[1]][3,1])
## [1] 6
print(t.alpha <- qt(1-0.1/2,df=error.df))
## [1] 1.94318
print(t.beta <- qt(0.20,df=error.df))
## [1] -0.9057033
print(r <- (2*(t.alpha + t.beta)^2) *((sigma/2)^2))
## [1] 52.98106

This suggests at least 50 replicates would be required to detect a difference of ~2 cm. However, this would generate a much larger error degrees of freedom and would change the critical t-values. If we recalculate:

error.df <- ceiling(r-1)
print(t.alpha <- qt(1-0.1/2,df=error.df))
## [1] 1.674689
print(t.beta <- qt(0.20,df=error.df))
## [1] -0.8485883
print(r <- (2*(t.alpha + t.beta)^2) *((sigma/2)^2))
## [1] 33.59153

We can further refine the required replicates by repeated iteration over replicates. ARM does this, as well as using a non-central distribution for \(t_{\beta}\). The ARM report is available here, along with trial data

Open the final trial protocol, select the Design tab under the Trial Settings dialog to calculate power for other combinations of variance (CV) and effect size (% Mean Difference). In ARM, we standardize these terms for simplicity.

Performance Trial

In the end, we decided to use only 10 replicates, given time constraints. We hoped that the differences among darts would be greater than measured in the uniformity trial. We also added an assessment (Bounce) to count the number of darts that did stay in the board. Below are the data, again with and without samples. See also the ARM report and trial.

means.dat <- data.frame(
plot=as.factor(c(104, 201, 303, 401, 504, 603, 702, 803, 901, 1002, 103, 202, 304, 402, 503, 604, 703, 801, 903, 1001, 101, 203, 301, 404, 502, 601, 704, 802, 904, 1003, 102, 204, 302, 403, 501, 602, 701, 804, 902, 1004)), 
treatment=as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4)), 
replicate=as.factor(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)), 
Bullseye=c(8.5, 8.5, NA, NA, NA, 5.5, 5, 6, NA, 15, 8.666667, 9, 11, 9, NA, 5, 6.5, 8, 7, 13, 4.666667, 10.666667, 12, 6.666667, 7.333333, 7, 7.666667, 5.333333, 6, NA, 5, 6.666667, 9.666667, 1.5, 6.333333, 6.5, 7, 10.666667, 11.333333, 2),
Angle=c(7.5, -3, NA, NA, NA, 8.5, -3, -3.5, NA, 9, 9, 0, -6.666667, 3.333333, NA, 3.5, 3, -8.5, 7.5, 8, -1, -6.666667, -4, 3.333333, 3, 4, -6, 2.666667, -5, NA, 2.333333, -1.333333, 1, -3, 1.666667, 0, -0.5, 1, 5, 5),
Score=c(9, 26, NA, NA, NA, 28, 9, 66, NA, 17, 30, 30, 49, 41, NA, 24, 31, 26, 27, 2, 40, 81, 36, 72, 41, 26, 45, 44, 36, NA, 37, 29, 44, 50, 34, 18, 58, 43, 20, 25),
Bounce=c(1, 1, 3, 3, 3, 1, 1, 1, 3, 2, 0, 0, 0, 0, 3, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 1, 0, 1, 1, 0, 0, 2)
)

samples.dat <- data.frame(
plot=as.factor(c(104, 104, 104, 201, 201, 201, 303, 303, 303, 401, 401, 401, 504, 504, 504, 603, 603, 603, 702, 702, 702, 803, 803, 803, 901, 901, 901, 1002, 1002, 1002, 103, 103, 103, 202, 202, 202, 304, 304, 304, 402, 402, 402, 503, 503, 503, 604, 604, 604, 703, 703, 703, 801, 801, 801, 903, 903, 903, 1001, 1001, 1001, 101, 101, 101, 203, 203, 203, 301, 301, 301, 404, 404, 404, 502, 502, 502, 601, 601, 601, 704, 704, 704, 802, 802, 802, 904, 904, 904, 1003, 1003, 1003, 102, 102, 102, 204, 204, 204, 302, 302, 302, 403, 403, 403, 501, 501, 501, 602, 602, 602, 701, 701, 701, 804, 804, 804, 902, 902, 902, 1004, 1004, 1004)), 
treatment=as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4)), 
replicate=as.factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10)), 
subsample =as.factor(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)), 
Bullseye=c(5, 12, NA, 13, 4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2, 9, NA, 5, NA, NA, 12, 0, NA, NA, NA, NA, 15, NA, NA, 10, 10, 6, 7, 11, NA, 8, 10, 15, 4, 10, 13, NA, NA, NA, 4, 6, NA, 6, 7, NA, 8, 8, NA, 7, 7, NA, 18, 8, NA, 3, 5, 6, 7, 10, 15, 9, 13, 14, 5, 5, 10, 6, 8, 8, 4, 10, NA, 3, 10, 10, 5, 6, 5, 6, 4, 8, NA, NA, NA, 6, 7, 2, 12, 5, 3, 5, 8, 16, 1, 2, NA, 5, 8, 6, 6, 7, NA, 4, 10, NA, 13, 9, 10, 6, 11, 17, 2, NA, NA),
Angle=c(5, 10, NA, -2, -4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 7, 10, NA, -3, NA, NA, -7, 0, NA, NA, NA, NA, 9, NA, NA, 8, 9, 10, 2, -2, NA, -9, -8, -3, -1, 6, 5, NA, NA, NA, 5, 2, NA, 4, 2, NA, -8, -9, NA, 9, 6, NA, 8, 8, NA,
7, -3, -7, -4, -7, -9, -9, -2, -1, 6, 9, -5, 7, 7, -5, 0, 8, NA, -2, -6, -10, 9, 2, -3, -7, -6, -2, NA, NA, NA, 6, 8, -7, -5, -7, 8, 5, 5, -7, -2, -4, NA, 0, 1, 4, 6,
-6, NA, 6, -7, NA, 9, -1, -5, 9, 10, -4, 5, NA, NA)
)

Currently, I’m working on code to add diagnostics plots to ARM analysis. This is a prototype. We calculate the \(3^{rd}\) and \(4^{th}\) moments - skewness and kurtosis, and report those over a histrogram; we also compute a measure of normality and display that over a QQ-norm plot. Finally, we add a test for homogeneity of means, along with a box-whisker plot. The plotting function allows us to compare two sets of random variables.

moments <- function(x) {
  grand.mean <- mean(x)
  grand.N <- length(x)
  grand.sd <- sqrt(sum((x- grand.mean)^2)/(grand.N-1))
  skew <- (sum((x - grand.mean)^3))/(grand.sd^3)
  skew <- skew * grand.N/((grand.N-1)*(grand.N-2))
  kurt <- ((sum((x - grand.mean)^4))/(grand.sd^4))
  kurt <- kurt * (grand.N*(grand.N+1))/((grand.N-1)*(grand.N-2)*(grand.N-3))
  kurt <- kurt - (3*(grand.N-1)*(grand.N-1))/((grand.N-2)*(grand.N-3))
  return(list(Mean=grand.mean,N=grand.N,SD=grand.sd,Skewness=skew,Kurtosis=kurt))
}

plot.comparisons <- function(vector1, vector2, group, main1="Data 1", main2="Data 2") {
  moments1 <- moments(vector1)
  moments2 <- moments(vector2) 
  lbl1.skew <- paste('Skewness = ', format(moments1$Skewness,digits=3))
  lbl2.skew <- paste('Skewness = ', format(moments2$Skewness,digits=3))
  lbl1.kur <- paste('Kurtosis = ', format(moments1$Kurtosis,digits=3))
  lbl2.kur <- paste('Kurtosis = ', format(moments2$Kurtosis,digits=3))
  lbl1.sw <- paste('Shapiro-Wilks p = ', format(shapiro.test(vector1)$p.value,digits=3))
  lbl1.ks <- ''
  lbl2.sw <- paste('Shapiro-Wilks p = ', format(shapiro.test(vector2)$p.value,digits=3))
  lbl2.ks <- '' #todo flag for KS test?
  lbl1.lev <- '' #todo - write levene's test
  lbl2.lev <- ''
  lbl1.bar <- paste('Bartlett p = ', format(bartlett.test(x=vector1,g=group)$p.value,digits=3))
  lbl2.bar <- paste('Bartlett p = ', format(bartlett.test(x=vector2,g=group)$p.value,digits=3))
  par(mfrow=c(3,2))
  hist(vector1, main = main1, col='grey')
  mtext(paste(lbl1.skew, lbl1.kur))
  hist(vector2, main = main2, col='grey')
  mtext(paste(lbl2.skew, lbl2.kur))
  qqnorm(vector1)
  qqline(vector1, col = 2)
  mtext(paste(lbl1.sw, lbl1.ks))
  qqnorm(vector2)
  qqline(vector2, col = 'grey')
  mtext(paste(lbl2.sw, lbl2.ks))
  plot(x=group, y=vector1)
  mtext(lbl1.bar)
  plot(x=group, y=vector2)
  mtext(lbl2.bar)
}

Bullseye

Linear model

The design of the experiment implies an analysis of variance, assuming errors are i.i.d. We can check these assumptions by diagnostic plots of the residuals. Historically, ARM has tested the assmuptions of AOV on the observed plot means, so we include these in the diagnostics.

design.model <- lm(Bullseye ~ replicate+treatment, data = means.dat)
plot.comparisons(means.dat$Bullseye[!is.na(means.dat$Bullseye)], resid(design.model), means.dat$treatment[!is.na(means.dat$Bullseye)],
                 main1="Plot Means", main2="AOV Residuals")

There is some kurtosis in the residuals, so we might want to try a better analysis model. Since we have missing plots, we might have better results with a mixed-model analysis. Specifically, we have confounding among treatment effects and replicate effects (when plots are missing, blocks are no longer complete).

Mixed model

library(lme4)
## Loading required package: Matrix
mixed.model <- lmer(Bullseye ~ treatment + (1|replicate), data = means.dat)
plot.comparisons(resid(design.model), resid(mixed.model), means.dat$treatment[!is.na(means.dat$Bullseye)],
                 main1="AOV Residuals", main2="REML Residuals")

There is some improvement in the residuals, but this does not greatly affect inferences based on the two models:

summary(aov(design.model))
##             Df Sum Sq Mean Sq F value Pr(>F)
## replicate    9  85.07   9.453    1.14   0.38
## treatment    3  18.42   6.140    0.74   0.54
## Residuals   21 174.20   8.295               
## 6 observations deleted due to missingness
VarCorr(mixed.model)
##  Groups    Name        Std.Dev.
##  replicate (Intercept) 0.53454 
##  Residual              2.88922
anova(mixed.model)
## Analysis of Variance Table
##           Df Sum Sq Mean Sq F value
## treatment  3 18.733  6.2442   0.748

Sampling error

It might help to considering sampling error. We refit the AOV and REML models as

samples.dat <- subset(samples.dat,!is.na(samples.dat$Bullseye))
replicates <- length(unique(samples.dat$replicate))
treatments <- length(unique(samples.dat$treatment))
design.model <- lm(Bullseye ~ replicate+treatment+treatment:replicate, data = samples.dat)
mixed.model <- lmer(Bullseye ~ treatment + (1|replicate/treatment), data = samples.dat)
plot.comparisons(samples.dat$Bullseye,resid(design.model),samples.dat$treatment,
                 main1="Scores", main2="AOV Residuals")

plot.comparisons(resid(design.model), resid(mixed.model), samples.dat$treatment[!is.na(samples.dat$Bullseye)],
                 main1="AOV Residuals", main2="REML Residuals")

summary(aov(Bullseye ~ replicate+treatment+Error(treatment:replicate), data = samples.dat))
## Warning in aov(Bullseye ~ replicate + treatment +
## Error(treatment:replicate), : Error() model is singular
## 
## Error: treatment:replicate
##           Df Sum Sq Mean Sq F value Pr(>F)
## replicate  9  205.5  22.839   1.450  0.230
## treatment  3   21.8   7.269   0.461  0.712
## Residuals 21  330.8  15.751               
## 
## Error: Within
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 48    647   13.48
VarCorr(mixed.model)
##  Groups              Name        Std.Dev.
##  treatment:replicate (Intercept) 0.66904 
##  replicate           (Intercept) 0.94203 
##  Residual                        3.72311
anova(mixed.model)
## Analysis of Variance Table
##           Df Sum Sq Mean Sq F value
## treatment  3 24.955  8.3185  0.6001

What does this tell us about our experimental design? On first reading, it would suggest that three samples per experimental unit is too few - the variance between rounds is dominated by the variance among throws within rounds. If we want the central limit theorem to work for us, we might choose more samples.

Angle

Linear model

Next, we examine the linear model fit for Angle.

design.model <- lm(Angle ~ replicate+treatment, data = means.dat)
plot.comparisons(means.dat$Angle[!is.na(means.dat$Angle)], resid(design.model), means.dat$treatment[!is.na(means.dat$Angle)],
                 main1="Plot Means", main2="AOV Residuals")

summary(aov(design.model))
##             Df Sum Sq Mean Sq F value Pr(>F)  
## replicate    9  389.8   43.31   2.559 0.0366 *
## treatment    3   43.3   14.42   0.852 0.4811  
## Residuals   21  355.4   16.93                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 6 observations deleted due to missingness

The most striking aspect of this analysis is that there does appear to be a bias for each participant - replicate effects are signficant. In the raw data, this introduces some heterogeneity among raw treatment variances, but this is reduced in the residuals. The residuals of the plot means.

plot(Angle ~ replicate,data = means.dat)

We can also see that sampling error is smaller for Angle, relative to experimental error. This, we might not need more throws, if our aim is to measure bias. Perhaps this implies that participants are more precise, but less accurate (Why?).

summary(aov(Angle ~ replicate+treatment+Error(treatment:replicate), data = samples.dat))
## Warning in aov(Angle ~ replicate + treatment +
## Error(treatment:replicate), : Error() model is singular
## 
## Error: treatment:replicate
##           Df Sum Sq Mean Sq F value Pr(>F)  
## replicate  9  814.6   90.51   2.158 0.0705 .
## treatment  3  118.3   39.45   0.940 0.4388  
## Residuals 21  880.8   41.94                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Within
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 48   1379   28.73

Score

Linear model

design.model <- lm(Score ~ replicate+treatment, data = means.dat)
plot.comparisons(means.dat$Score[!is.na(means.dat$Score)], resid(design.model), means.dat$treatment[!is.na(means.dat$Score)],
                 main1="Plot Totals", main2="AOV Residuals")

Surprisingly, Score appears to be different among the darts, and more expensive darts showing a higher scoring rate. There might be two explanations. The bull and double bull have higher scoring values than the segments just outside, so darts that give throwers a higher probablity of hitting a bullseye will result in higher scores. Less likely, darts that are within a smaller radius are more likely to hit a triple score than those in the outer ring. How might we design a new trial to test this? Can we detect these differences with the current data?

Bounce

Linear model

a priori, we might expect Bounce to follow a Poisson distribution. However, if there are sufficient observations, we can simplify and model as normal. First, we consider residuals from the linear model

design.model <- lm(Bounce ~ replicate+treatment, data = means.dat)
plot.comparisons(means.dat$Bounce,resid(design.model),means.dat$treatment,
                 main1="Plot Totals", main2="AOV Residuals")

poisson.model <- glm(Bounce ~ replicate+treatment, data = means.dat,family=poisson)

It does not appear that a Poisson model improves the analysis. Perhaps the greater number of 3 bounce vs 2 bounce creates a mismatch. Note - we don’t necessarily expect the QQ-norm plot to be linear, but two potential modes in the residual histogram is troubling.

plot.comparisons(resid(design.model),resid(poisson.model),means.dat$treatment,
                 main1="AOV Residuals", main2="GLM Residuals")