# Introduction

We gathered a group for an exercise in experimental design. Our proposed experiment was to compare the effectiveness of different models of soft-tip darts. Our discussion included

1. Experimental unit. One round of three throws was determined to be an experimental unit; a participant threw three darts of the same model at the bullseye. The model of dart would be independently randomized for each round.

2. Assessment. We decided the primary measure of throwing accuracy would be distance from the bullseye, measured in cm. We also measured an angle, from -9 to 10, with the 20 point segment being 0, as measure of bias in throws. We also included the cumulative score of three darts, with the assumption that this variable would show a uniform distribution. We expect distance and angle to be approximately normally distributed.

3. Blocking. We had 10 participants, so decided to let each participant represent a block. Each would throw each of the models of darts in a randomized complete block design.

4. Treatments. We used a Fat Cat Model 727 electronic dartboard for this exercise. Two sets of soft tip ‘starter’ darts (three red and three blue) were provided with this board. We decided to use the starter darts as reference (check) treatments. Additional darts to be tested were

# Uniformity Trial

Since we have no preceding trials, we conducted a simple uniformity trial to determine variance components. We labelled ‘red’ and ‘blue’ as treatments, throwing the two sets of starter darts provided. Each participant threw darts for two rounds and treatments were independently randomized for each participant. We have two data tables. The first includes all three throws as samples, while the second records only round means. In the trial, we found that the darts had a tendency to not stay in the board. These were scored as missing values.

samples.dat <- data.frame(
plot=as.factor(c(101, 101, 101, 202, 202, 202, 301, 301, 301, 402, 402, 402, 501, 501, 501, 602, 602, 602, 701, 701, 701, 802, 802, 802, 901, 901, 901, 1002, 1002, 1002, 102, 102, 102, 201, 201, 201, 302, 302, 302, 401, 401, 401, 502, 502, 502, 601, 601, 601, 702, 702, 702, 801, 801, 801, 902, 902, 902, 1001, 1001, 1001)),
treatment=as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)),
replicate=as.factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10)),
subsample =as.factor(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)),
Bullseye=c(4, NA, NA, 7, NA, NA, 7, 15, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 19, 16, NA, 6, NA, NA, 13, 6, NA, 6, NA, NA, 7, 4, NA, 8, 9, NA, 5, NA, NA, 16, NA, NA, 7, NA, NA, 13, 13, NA, 11, 13, 8, 8, NA, NA, 2, NA, NA, 7, 7, NA),
Angle=c(8, NA, NA, -6, NA, NA, 1, -1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -5, -9, NA, -2, NA, NA, 2, -4, NA, 2, NA, NA, 7, 1, NA, -7, 0, NA, 3, NA, NA, 1, NA, NA, 5, NA, NA, 6, 3, NA, -3, -9, 1, 0, NA, NA, 1, NA, NA, 3, -8, NA)
)
means.dat <- data.frame(
plot=as.factor(c(101, 202, 301, 402, 501, 602, 701, 802, 901, 1002, 102, 201, 302, 401, 502, 601, 702, 801, 902, 1001)),
treatment=as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)),
replicate=as.factor(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)),
Bullseye=c(4, 7, 11, NA, NA, NA, 17.5, 6, 9.5, 6, 5.5, 8.5, 5, 16, 7, 13, 10.67, 8, 2, 7),
Angle=c(8, -6, 0, NA, NA, NA, -7, -2, -1, 2, 4, -3.5, 3, 1, 5, 4.5, -3.67, 0, 1, -2.5),
Score=c(2, 8, 6, NA, NA, NA, 38, 12, 32, 18, 16, 76, 4, 2, 6, 14, 29, 20, 25, 11)
)

## Power calculations

Start with analysis of variance of the distance from bull. We can choose to analyze either data set:

summary(aov(Bullseye ~ replicate+treatment, data = means.dat))
##             Df Sum Sq Mean Sq F value Pr(>F)
## replicate    9 198.47  22.052   2.223  0.172
## treatment    1  14.67  14.668   1.478  0.270
## Residuals    6  59.53   9.922
## 3 observations deleted due to missingness
summary(aov(Bullseye ~ replicate+treatment+Error(treatment:replicate), data = samples.dat))
## Warning in aov(Bullseye ~ replicate + treatment +
## Error(treatment:replicate), : Error() model is singular
##
## Error: treatment:replicate
##           Df Sum Sq Mean Sq F value Pr(>F)
## replicate  9 268.79   29.87   2.064  0.195
## treatment  1  36.39   36.39   2.515  0.164
## Residuals  6  86.81   14.47
##
## Error: Within
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  9  78.67   8.741

Note that in the second case, we need to test treatments and replicates against experimental error (treatment x replicate interaction), and not residual error. In this case, residual error represents sampling error. In another exercise, we might use this to determine an appropriate number of samples from each experimental unit, but for now, we’ll work the natural number of samples (dart games are played with three throws per round).

Now, consider a simple paired means t-test:

$t = \frac{\delta_{a-b}}{\sqrt{2\sigma^2/r}}$ where $$\delta_{a-b}$$ is the difference between means $$a$$ and $$b$$, and $$\sigma^2$$ is pooled standard deviation. To declare that means $$a$$ and $$b$$ are signficantly different, we require $t \ge t_{\alpha/2}$

from some signficance $$\alpha$$. In this exercise, we use $$\alpha = 0.1$$, since the penalty for a Type I error with the darts is not great - none of the dart sets cost more than $20, so we would not lose a lot of money acting on the incorrect assumption that the most expensive model was ‘better’ in this trial. We can rearrange the t-ratio to solve for the number of replicates $$r$$, but first, we should note that we also want to control for Type II error - that is, rejecting a true difference. Denote this with a critical $$t_{\beta}$$ and let $$\beta = 0.2$$ for a power of 80%. This gives us the formula $r \ge 2 \left[ t_{\alpha/2} + t_{\beta} \right]^2 \left( \frac{\sigma}{\delta} \right)^2$ We could calculate $$\delta$$ from mean differences, but the data are unbalanced, and we have a signficant replicate effect. One quick calculation is to use the treatment coefficient from the linear model, print(delta <- tail(coef(lm(formula = Bullseye ~ replicate + treatment, data = means.dat)),1)) ## treatment2 ## -2.047143 Thus, taking values from the AOV table, print(sigma <- summary(aov(Bullseye ~ replicate+treatment, data = means.dat))[[1]][3,3]) ## [1] 9.921945 print(error.df <- summary(aov(Bullseye ~ replicate+treatment, data = means.dat))[[1]][3,1]) ## [1] 6 print(t.alpha <- qt(1-0.1/2,df=error.df)) ## [1] 1.94318 print(t.beta <- qt(0.20,df=error.df)) ## [1] -0.9057033 print(r <- (2*(t.alpha + t.beta)^2) *((sigma/2)^2)) ## [1] 52.98106 This suggests at least 50 replicates would be required to detect a difference of ~2 cm. However, this would generate a much larger error degrees of freedom and would change the critical t-values. If we recalculate: error.df <- ceiling(r-1) print(t.alpha <- qt(1-0.1/2,df=error.df)) ## [1] 1.674689 print(t.beta <- qt(0.20,df=error.df)) ## [1] -0.8485883 print(r <- (2*(t.alpha + t.beta)^2) *((sigma/2)^2)) ## [1] 33.59153 We can further refine the required replicates by repeated iteration over replicates. ARM does this, as well as using a non-central distribution for $$t_{\beta}$$. The ARM report is available here, along with trial data Open the final trial protocol, select the Design tab under the Trial Settings dialog to calculate power for other combinations of variance (CV) and effect size (% Mean Difference). In ARM, we standardize these terms for simplicity. # Performance Trial In the end, we decided to use only 10 replicates, given time constraints. We hoped that the differences among darts would be greater than measured in the uniformity trial. We also added an assessment (Bounce) to count the number of darts that did stay in the board. Below are the data, again with and without samples. See also the ARM report and trial. means.dat <- data.frame( plot=as.factor(c(104, 201, 303, 401, 504, 603, 702, 803, 901, 1002, 103, 202, 304, 402, 503, 604, 703, 801, 903, 1001, 101, 203, 301, 404, 502, 601, 704, 802, 904, 1003, 102, 204, 302, 403, 501, 602, 701, 804, 902, 1004)), treatment=as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4)), replicate=as.factor(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)), Bullseye=c(8.5, 8.5, NA, NA, NA, 5.5, 5, 6, NA, 15, 8.666667, 9, 11, 9, NA, 5, 6.5, 8, 7, 13, 4.666667, 10.666667, 12, 6.666667, 7.333333, 7, 7.666667, 5.333333, 6, NA, 5, 6.666667, 9.666667, 1.5, 6.333333, 6.5, 7, 10.666667, 11.333333, 2), Angle=c(7.5, -3, NA, NA, NA, 8.5, -3, -3.5, NA, 9, 9, 0, -6.666667, 3.333333, NA, 3.5, 3, -8.5, 7.5, 8, -1, -6.666667, -4, 3.333333, 3, 4, -6, 2.666667, -5, NA, 2.333333, -1.333333, 1, -3, 1.666667, 0, -0.5, 1, 5, 5), Score=c(9, 26, NA, NA, NA, 28, 9, 66, NA, 17, 30, 30, 49, 41, NA, 24, 31, 26, 27, 2, 40, 81, 36, 72, 41, 26, 45, 44, 36, NA, 37, 29, 44, 50, 34, 18, 58, 43, 20, 25), Bounce=c(1, 1, 3, 3, 3, 1, 1, 1, 3, 2, 0, 0, 0, 0, 3, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 1, 0, 1, 1, 0, 0, 2) ) samples.dat <- data.frame( plot=as.factor(c(104, 104, 104, 201, 201, 201, 303, 303, 303, 401, 401, 401, 504, 504, 504, 603, 603, 603, 702, 702, 702, 803, 803, 803, 901, 901, 901, 1002, 1002, 1002, 103, 103, 103, 202, 202, 202, 304, 304, 304, 402, 402, 402, 503, 503, 503, 604, 604, 604, 703, 703, 703, 801, 801, 801, 903, 903, 903, 1001, 1001, 1001, 101, 101, 101, 203, 203, 203, 301, 301, 301, 404, 404, 404, 502, 502, 502, 601, 601, 601, 704, 704, 704, 802, 802, 802, 904, 904, 904, 1003, 1003, 1003, 102, 102, 102, 204, 204, 204, 302, 302, 302, 403, 403, 403, 501, 501, 501, 602, 602, 602, 701, 701, 701, 804, 804, 804, 902, 902, 902, 1004, 1004, 1004)), treatment=as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4)), replicate=as.factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10)), subsample =as.factor(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)), Bullseye=c(5, 12, NA, 13, 4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2, 9, NA, 5, NA, NA, 12, 0, NA, NA, NA, NA, 15, NA, NA, 10, 10, 6, 7, 11, NA, 8, 10, 15, 4, 10, 13, NA, NA, NA, 4, 6, NA, 6, 7, NA, 8, 8, NA, 7, 7, NA, 18, 8, NA, 3, 5, 6, 7, 10, 15, 9, 13, 14, 5, 5, 10, 6, 8, 8, 4, 10, NA, 3, 10, 10, 5, 6, 5, 6, 4, 8, NA, NA, NA, 6, 7, 2, 12, 5, 3, 5, 8, 16, 1, 2, NA, 5, 8, 6, 6, 7, NA, 4, 10, NA, 13, 9, 10, 6, 11, 17, 2, NA, NA), Angle=c(5, 10, NA, -2, -4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 7, 10, NA, -3, NA, NA, -7, 0, NA, NA, NA, NA, 9, NA, NA, 8, 9, 10, 2, -2, NA, -9, -8, -3, -1, 6, 5, NA, NA, NA, 5, 2, NA, 4, 2, NA, -8, -9, NA, 9, 6, NA, 8, 8, NA, 7, -3, -7, -4, -7, -9, -9, -2, -1, 6, 9, -5, 7, 7, -5, 0, 8, NA, -2, -6, -10, 9, 2, -3, -7, -6, -2, NA, NA, NA, 6, 8, -7, -5, -7, 8, 5, 5, -7, -2, -4, NA, 0, 1, 4, 6, -6, NA, 6, -7, NA, 9, -1, -5, 9, 10, -4, 5, NA, NA) ) Currently, I’m working on code to add diagnostics plots to ARM analysis. This is a prototype. We calculate the $$3^{rd}$$ and $$4^{th}$$ moments - skewness and kurtosis, and report those over a histrogram; we also compute a measure of normality and display that over a QQ-norm plot. Finally, we add a test for homogeneity of means, along with a box-whisker plot. The plotting function allows us to compare two sets of random variables. moments <- function(x) { grand.mean <- mean(x) grand.N <- length(x) grand.sd <- sqrt(sum((x- grand.mean)^2)/(grand.N-1)) skew <- (sum((x - grand.mean)^3))/(grand.sd^3) skew <- skew * grand.N/((grand.N-1)*(grand.N-2)) kurt <- ((sum((x - grand.mean)^4))/(grand.sd^4)) kurt <- kurt * (grand.N*(grand.N+1))/((grand.N-1)*(grand.N-2)*(grand.N-3)) kurt <- kurt - (3*(grand.N-1)*(grand.N-1))/((grand.N-2)*(grand.N-3)) return(list(Mean=grand.mean,N=grand.N,SD=grand.sd,Skewness=skew,Kurtosis=kurt)) } plot.comparisons <- function(vector1, vector2, group, main1="Data 1", main2="Data 2") { moments1 <- moments(vector1) moments2 <- moments(vector2) lbl1.skew <- paste('Skewness = ', format(moments1$Skewness,digits=3))
lbl2.skew <- paste('Skewness = ', format(moments2$Skewness,digits=3)) lbl1.kur <- paste('Kurtosis = ', format(moments1$Kurtosis,digits=3))
lbl2.kur <- paste('Kurtosis = ', format(moments2$Kurtosis,digits=3)) lbl1.sw <- paste('Shapiro-Wilks p = ', format(shapiro.test(vector1)$p.value,digits=3))
lbl1.ks <- ''
lbl2.sw <- paste('Shapiro-Wilks p = ', format(shapiro.test(vector2)$p.value,digits=3)) lbl2.ks <- '' #todo flag for KS test? lbl1.lev <- '' #todo - write levene's test lbl2.lev <- '' lbl1.bar <- paste('Bartlett p = ', format(bartlett.test(x=vector1,g=group)$p.value,digits=3))
lbl2.bar <- paste('Bartlett p = ', format(bartlett.test(x=vector2,g=group)$p.value,digits=3)) par(mfrow=c(3,2)) hist(vector1, main = main1, col='grey') mtext(paste(lbl1.skew, lbl1.kur)) hist(vector2, main = main2, col='grey') mtext(paste(lbl2.skew, lbl2.kur)) qqnorm(vector1) qqline(vector1, col = 2) mtext(paste(lbl1.sw, lbl1.ks)) qqnorm(vector2) qqline(vector2, col = 'grey') mtext(paste(lbl2.sw, lbl2.ks)) plot(x=group, y=vector1) mtext(lbl1.bar) plot(x=group, y=vector2) mtext(lbl2.bar) } # Bullseye ## Linear model The design of the experiment implies an analysis of variance, assuming errors are i.i.d. We can check these assumptions by diagnostic plots of the residuals. Historically, ARM has tested the assmuptions of AOV on the observed plot means, so we include these in the diagnostics. design.model <- lm(Bullseye ~ replicate+treatment, data = means.dat) plot.comparisons(means.dat$Bullseye[!is.na(means.dat$Bullseye)], resid(design.model), means.dat$treatment[!is.na(means.dat$Bullseye)], main1="Plot Means", main2="AOV Residuals") There is some kurtosis in the residuals, so we might want to try a better analysis model. Since we have missing plots, we might have better results with a mixed-model analysis. Specifically, we have confounding among treatment effects and replicate effects (when plots are missing, blocks are no longer complete). ## Mixed model library(lme4) ## Loading required package: Matrix mixed.model <- lmer(Bullseye ~ treatment + (1|replicate), data = means.dat) plot.comparisons(resid(design.model), resid(mixed.model), means.dat$treatment[!is.na(means.dat\$Bullseye)],
main1="AOV Residuals", main2="REML Residuals")