When data fail to meet the assumptions of AOV - for example, when residuals are not normally distributed or treatment variances are not homogeneous, we may prefer to analysis the data using methods that do not depend on parametric assumptions. It is common in these cases to analyze the rank (ordered observations) instead of raw assessment values.
We will use winter wheat yield from the SDSU CPT trials, Brookings, SD, 2004. Frequently, yield dat is left-skewed (as shown in the histogram below), and common data transformations do not correct this type of skewness. Since assumptions of normality depend on symmetry in the distribution of the data, we might prefer a rank based analysis for this example.
hist(arm.dat$assessment1)
analysis.model <- lm(assessment1 ~ treatment, data = arm.dat)
For convenience, we can use the moments
library to test for skewness and kurtosis of the residuals.
hist(residuals(analysis.model))
library(moments)
agostino.test(residuals(analysis.model))
##
## D'Agostino skewness test
##
## data: residuals(analysis.model)
## skew = -0.72938, z = -3.14090, p-value = 0.001684
## alternative hypothesis: data have a skewness
anscombe.test(residuals(analysis.model))
##
## Anscombe-Glynn kurtosis test
##
## data: residuals(analysis.model)
## kurt = 3.6176, z = 1.4782, p-value = 0.1393
## alternative hypothesis: kurtosis is not equal to 3
As we see, the assumption of normality of the residuals is not supported for the linear model.
Analysis of ranks involves two steps. First, observations are assigned rank scores, usually from smallest to largest (the largest values given the largest ranks). Then, test statistics are calculated using rank scores. The most common are
Suppose we have a series of \(N\) observations \(y_{ij}\) that can be grouped into two or more groups (\(k=1 ... G\)). The Kruskal-Wallis analysis follows as
Assign a rank \(r_{ij} = \text{rank}(y_{ij})\) tp all assessments, without regard to group. Typically, the smallest value is given the smallest rank (1). If values are identical (tied), then assign each assessments the average of the ranks over those values.
Calculate the test statistic ((Conover 1980))1
\[ T =\frac{1}{S^2} \left(\sum_{i=1}^{k} \frac{R^2_{i}}{n_i} - \frac{N (N-1)^2}{4}\right) \]
where \(R_i = \sum_{j=1}^{n_i} r_{ij}\) and
\[ S^2 = \frac{1}{N-1} \left( \sum_{i} \sum_{k} r_{ij}^2 - N\frac{(N+1)^2}{4} \right) \]
The test statistic \(T\) can be compared to an exact \(T\), but it is more common to approximate \(T\) as \(\chi ^2\) with \(g-1\) d.f. When \(n_i\) are small (~5) then exact tests should be preferred, but for large \(N\) (~30) this becomes computationally intensive.
An alternative notation comes from (Sprent and Smeeton 2001)
\[ T = \frac{(N-1)(S_k-C)}{S_r - C} \] where \(S_r = \sum_{ij} r^2_{i,j}\). We let \(s_i = \sum_{j=1}^{n_i}r_{ij}\) and \(S_k = \sum_{i=1}^{k}\left(s_i^2 \ n_i \right)\). This notation suggests the similarity to uncorrected total and treatment sums of squares.
We can use treatment rank means (\(r_{i \cdot}\)) or treatment rank sums (\(R_{i \cdot}\)) to make comparisons among treatments. As with differences between means, we can write critical values as
\[ \left| r_{i \cdot} - r_{j \cdot} \right| = \left| \frac{R_{i \cdot}}{n_i} - \frac{R_{j \cdot}}{n_j} \right| > t_{\nu,1-\alpha/2} \text{s.e.}_{rank} \]
where \[ \text{s.e.}_{rank} = S \sqrt{ \frac{N-1-T}{N-k} }\sqrt{\frac{1}{n_i} + \frac{1}{n_j}} \]
When we use Kruskal-Wallis tests, then \(\nu = N-k\)
arm.dat$rank <- rank(arm.dat$assessment1,ties.method='average')
print(R <- tapply(arm.dat$rank, list(arm.dat$treatment), sum))
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## 146 129 190 189 398 280 335 324 118 233 311 288 192 225 355 307 220 311
## 19 20 21 22 23 24 25 26 27 28 29 30
## 233 245 218 357 231 204 208 175 189 238 307 104
print(n <- tapply(arm.dat$rank, list(arm.dat$treatment), length))
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## 26 27 28 29 30
## 4 4 4 4 4
print(k <- length(R))
## [1] 30
print(N <- sum(n))
## [1] 120
print(S2 <- (1/(N-1))*(sum(arm.dat$rank^2) - (N*(N+1)^2)/4))
## [1] 1210
print(T <- (1/S2)*(sum(R^2/n)-(N*(N+1)^2)/4))
## [1] 33.72066
print(crit.chi <- qchisq(1 - alpha, df=k-1))
## [1] 42.55697
print(1-pchisq(T,df=k-1))
## [1] 0.2496326
Compare with the R function
kruskal.test(assessment1 ~ treatment, data=arm.dat)
##
## Kruskal-Wallis rank sum test
##
## data: assessment1 by treatment
## Kruskal-Wallis chi-squared = 33.721, df = 29, p-value = 0.2496
means <- R/n
sd.rank <- sqrt(S2*((N-1-T)/(N-k)))
post.tbl <- matrix(0,nrow=k,ncol=k)
crit.table <- matrix(0,nrow=k,ncol=k)
crit.t <- qt(1-alpha/2,N-k)
for(i in 1:(k-1)) {
for(j in (i+1):k) {
post.tbl[i,j] <- abs(means[i]-means[j])
crit.table [i,j] <- crit.t*sd.rank*(sqrt(1/n[i]+1/n[j]))
post.tbl[j,i] <- post.tbl[i,j]>crit.table[i,j]
crit.table [j,i] <- 2*(1-pt(post.tbl[i,j]/(sd.rank*(sqrt(1/n[i]+1/n[j]))),N-k))
}
}
post.tbl[1:5,1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 4.25 11.00 10.75 63.00
## [2,] 0 0.00 15.25 15.00 67.25
## [3,] 0 0.00 0.00 0.25 52.00
## [4,] 0 0.00 0.00 0.00 52.25
## [5,] 1 1.00 1.00 1.00 0.00
crit.table[1:5,1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.00000000 47.566924368 47.56692437 47.56692437 47.56692
## [2,] 0.85951063 0.000000000 47.56692437 47.56692437 47.56692
## [3,] 0.64703701 0.525787456 0.00000000 47.56692437 47.56692
## [4,] 0.65452302 0.532580604 0.99169217 0.00000000 47.56692
## [5,] 0.01000859 0.006099384 0.03249823 0.03169403 0.00000
To analyze ranks in ARM, use the AR
(auto rank) code.
These data reproduce Table 2.1 from (1952). Column two is a duplicate of column one, but column two contains the AR
action code, while one does not.
Detransform means
is selected, ARM will report assessment means, but mean comparisons will still be based on ranks, as shown here. See report sets ARRanks and ARMeansFriedman’s analysis assigns ranks within whole blocks, and is only applicable when blocks are balanced (all treatments appear in each block). When ties are present, we write ((Sprent and Smeeton 2001), p 211). If the are missing assessments, or when there higher-order blocking (i.e. Latin squares) is present, we use the Aligned Rank Test.
\[ S_r = \sum _{i,j} r^2_i \]
where \(r_{i j}\) is the rank assigned treatment \(i\) in block \(j\). We let
\[ S_k = \frac{\sum _{i=1} ^k s^2_i}{b} \]
where \(s_i = \sum_{j=1} ^k r_{ij}\) We also calculate a correction factor
\[ C = \frac{b t (t+1)^2}{4} \] We then calculate the test statistic
\[ T = \frac{b (t-1) (S_t - C)}{S_r - C} \]
#assume we've excluded all missing values
UniqueTreatments <- unique(arm.dat$treatment)
UniqueReplicates <- unique(arm.dat$replicate)
for(rep in UniqueReplicates) {
mask <- arm.dat$replicate == rep
arm.dat$rank[mask] <- rank(arm.dat$assessment1[mask], ties.method='average')
}
s <- tapply(arm.dat$rank, list(arm.dat$treatment), sum)
b <- length(UniqueReplicates)
k <- length(UniqueTreatments)
S_r <- sum(arm.dat$rank^2)
S_k <- sum(s^2)/b
C <- (b*k*(k+1)^2)/4
print(T <- (b*(k-1)*(S_k - C))/(S_r-C))
## [1] 52.72258
When \(b\) and \(k\) are not too small, then \(T\) can be approximated as \(\chi ^2\) with \(k-1\) d.f.
print(1-pchisq(T,df=k-1))
## [1] 0.004523279
print(qchisq(1 - alpha, df=k-1))
## [1] 42.55697
We can compare this to the built-in R function,
friedman.test(assessment1 ~ treatment | replicate, data = arm.dat)
##
## Friedman rank sum test
##
## data: assessment1 and treatment and replicate
## Friedman chi-squared = 52.723, df = 29, p-value = 0.004523
This function will fail if all observations are missing for any single treatment (but this can be corrected by reordering treatments as factors).
In the cases where there are no ties, \(S_r = b k (k+1)(2k+1)\) and \(T\) simplifies to
\[ T = \frac{12 \sum _{i=1} ^t s^2_i}{bt(t-1)} - 3b(t+1) \]
print(T <- (12*sum(s^2))/(b*k*(k+1)) - 3*b*(k+1))
## [1] 52.72258
For treatment comparisons, we calculate a minimum difference of rank sums by
\[ \left| R_{i \cdot} - R_{j \cdot} \right| > t_{\nu,1-\alpha/2} \text{s.e.}_{R} \]
where \[ \text{s.e.}_{R} = \sqrt{ \frac{2 b}{(b-1)(k-1)} }\sqrt{S_r - S_k} \]
err.df <- (k-1)
crit.t <-qt(1 - 0.05 / 2, err.df)
A <- sum(arm.dat$rank^2)
A2 <- (b*k*(k+1)*(2*k+1))/6
B <- sum(s^2)/b
T <- ((b-1)*(B-(b*k*(k+1)^2)/4))/(A-B)
print(sqrt((2*b*(A-B))/((b-1)*(k-1)))/b)
## [1] 5.308852
These data reproduce Table 1 from (1937). There are three columns in the data, all identical. The first is marked IID
, while the third is marked AR
. The second column is AS
transformed; this is suggested by the nature of the assessment (standard deviations of expenditures at different income levels)
Detransform means
is selected, ARM will report assessment means, but mean comparisons will still be based on ranks, as shown here. See report sets ARRanks and ARMeansLehmann and D’Abrera (1975) proposed’aligning’ observations by subtracting some estimate of block effect from each observation that block, then performing analysis of ranks on the aligned values. Consider the block-only model
\[ y_{ij} = \mu + b_{j} + \epsilon_{ij} \]
We calculate \(\epsilon'_{ij} = y_{ij} - (\mu +\widehat{b}_{j})\) and then rank
\[ r'_{ij} = \text{rank}(\epsilon'_{ij}) \]
See - ARM Trial
analysis.model <- lm(assessment1 ~ treatment, data = arm.dat)
block.lm <- lm(assessment1 ~ replicate, data = arm.dat)
arm.dat$AlignedRank <- rank(resid(block.lm))[row.names(arm.dat)]
#In most cases, we want rank to be sch that the smallest values have the lowest rank. This preserves the
#relative order of treatment means.
arm.dat$AlignedRank <- max(arm.dat$AlignedRank,na.rm=TRUE) - arm.dat$AlignedRank + 1
rank.test <- kruskal.test(AlignedRank ~ treatment, arm.dat)
rank.test
##
## Kruskal-Wallis rank sum test
##
## data: AlignedRank by treatment
## Kruskal-Wallis chi-squared = 53.852, df = 29, p-value = 0.003365
Because Latin squares are blocked in two dimensions, both Kruskal-Wallis and Friedman’s test can be expected to be much less powerful, compared to the appropriate parametric analysis. In this case, we prefer to use the aligned rank method (Kruskal-Wallis on the residuals of the block model).
Best and Rayner (2011) compared four methods of the analysis of ranks for a Latin square design. We reproduce these in part in:
Best, John, and John Rayner. 2011. “Nonparametric Tests for Latin Sqares.” Working Paper 11-11. Centre for Statistical; Survey Methodology, University of Wollongong. http://ro.uow.edu.au/cssmwp/83.
Conover, W.J. 1980. Practical Nonparametric Statistics. Second. John Wiley & Sons. https://books.google.com/books?id=NV4YAAAAIAAJ.
Friedman, Milton. 1937. “The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance.” Journal of the American Statistical Association 32 (200): 675–701.
Higgins, James J, R Clifford Blair, and Suleiman Tashtoush. 1990. “The Aligned Rank Transform Procedure.” Conferences on Applied Statistics in Agriculture.
Kruskal, William H. 1952. “Use of Ranks in One-Criterion Variance Analysis.” Journal of the American Statistical Association 47 (260): 583–621.
Lehmann, Erich Leo, and H.J.M. D’Abrera. 1975. Nonparametrics: Statistical Methods Based on Ranks. Prentice Hall. https://books.google.com/books?id=zNNFAQAAIAAJ.
Sprent, P, and N C Smeeton. 2001. Applied Nonparametric Statistical Methods. Third. Chapman & Hall/CRC.
This is a simplification of the formula from (Kruskal 1952), used when ties are present↩