Multi Comparisons and One Way ANOVA
Multiple Comparisons
- post hoc procedure: further comparisons after significant result from overall One-way ANOVA
- If the result for One-way ANOVA is good enough, i.e.some pairs are evidently true, we may omit some pairs to remove the number of tests
- Max of pairwise comparisons
- Major issue: increased chance of making at least one Type I error when carrying out many tests,
- Two common solutions: based on controlling family Type I error rate, choose
- Bonferroni
- Tukey's
Example ?
- For n independent tests:
Bonferroni's Method
- Based on Bonferroni's inequality , hence the event that th test results in a Type I error.
- Method: conduct each of pairwise tests at level
- CI:
- Conservative: overall Type I error rate is usually much less than if tests are not mutually independent.
- Type II error inflation.
Tukey's Approach
- Usually less conservative than Bonferroni, particularly if group sample size are similar. Controls the overall Type I error rate of , simultaneous CI converage rate is .
Studentized Range distribution
- . Determine the distribution of the max and min of .
- , Range
- Based on observations from , the Studentized range statistic is sample std.
-
Based on group means, with observations per group:
estimator of the pooled std. d.f.
- If there are groups, then there is a max of pairwise differences
- Controlling overall simultaneous Type I error rate v.s. Individual Type I error rate
- Find a pairwise significant difference
- Compare method-wise significant difference, with OR
- determine whether CI contains 0 OR
- Compare with
- with d.f.
Tukey's Honestly Significant Difference (HSD)
- let the critical value from the Studentized Range distribution.
- Family rate
- Tukey's HSD
Case Study
Bonferroni
Pairwise comparisons using t tests with pooled SD
data: percent and judge
Spock's A B C D E
A 0.00022 - - - - -
B 0.00013 1.00000 - - - -
C 0.00150 1.00000 1.00000 - - -
D 0.57777 1.00000 1.00000 1.00000 - -
E 0.03408 1.00000 1.00000 1.00000 1.00000 -
F 0.01254 1.00000 1.00000 1.00000 1.00000 1.00000
P value adjustment method: bonferroni
0.357 % | 99.643 % | |
---|---|---|
(Intercept) | 8.078085 | 21.16636 |
judgeA | 8.547341 | 30.44821 |
judgeB | 8.647254 | 29.34163 |
judgeC | 5.222969 | 23.73259 |
judgeD | -2.969585 | 27.72514 |
judgeE | 1.997255 | 22.69163 |
judgeF | 2.922970 | 21.43259 |
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = percent ~ judge)
$judge
diff lwr upr p adj
A-Spock's 19.49777727 7.514685 31.480870 0.0001992
B-Spock's 18.99444405 7.671486 30.317402 0.0001224
C-Spock's 14.47777732 4.350216 24.605339 0.0012936
D-Spock's 12.37777758 -4.416883 29.172438 0.2744263
E-Spock's 12.34444443 1.021486 23.667402 0.0248789
F-Spock's 12.17777766 2.050216 22.305339 0.0098340
B-A -0.50333322 -13.512422 12.505755 0.9999997
C-A -5.01999995 -17.003092 6.963092 0.8470098
D-A -7.11999969 -25.094638 10.854639 0.8777485
E-A -7.15333284 -20.162421 5.855755 0.6146239
F-A -7.31999961 -19.303092 4.663093 0.4936380
C-B -4.51666673 -15.839625 6.806291 0.8742030
D-B -6.61666648 -24.158118 10.924785 0.9003280
E-B -6.64999962 -19.053679 5.753679 0.6418003
F-B -6.81666639 -18.139624 4.506292 0.5109582
D-C -2.09999975 -18.894661 14.694661 0.9996956
E-C -2.13333289 -13.456291 9.189625 0.9968973
F-C -2.29999966 -12.427561 7.827562 0.9914731
E-D -0.03333314 -17.574784 17.508118 1.0000000
F-D -0.19999992 -16.994661 16.594661 1.0000000
F-E -0.16666677 -11.489625 11.156291 1.0000000

Linear Regression Model
- , response continuous, explanatory categorical and/or continuous
- is linear in the 's i.e. no predictor is a linear function or combination of other predictors
- Least square Estimate, need columns of must be linear independent
- Null hypothesis . Assumptions
- Appropriate Linear Model
- Uncorrelated Errors
- .
-
Sum of Squares Decomposition
One-Way ANOVA
- Need one factor (categorical variable) with at least 2 levels
-
Aim: Compare group means
-
Predictors are indicator variables that classify the observations one way (into groups)
- special case of a general linear model
- equivalent to GLM with one-way classification (one factor)
- GLM uses dummy variables
- ANOVA: compare means by analyzing variability
One Way Expectations and Estimates
Then, the null hypothesis is the equal mean of th group and the compared group
In this case, for , mean of observations for group from which the th observation belongs, is the grand mean
judge
Spock's A B C D E F
9 5 6 9 2 6 9
Spock's 14.6222224235535
A 34.1199996948242
B 33.6166664759318
C 29.0999997456868
D 27
E 26.9666668574015
F 26.800000084771
Compare 6 judges (exclude Spock)

Df Sum Sq Mean Sq F value Pr(>F)
others$Judge 5 326.5 65.29 1.218 0.324
Residuals 31 1661.3 53.59
Rule of thumb
Not a formal way, not a quick check whether equal variance
Spock's <NA>
A 11.9418181005452
B 6.58222301297842
C 4.59292918339244
D 3.81837769736668
E 9.01014223675898
F 5.96887758208529
11.9418181005452
3.81837769736668
TRUE
Bartlett test of homogeneity of variances
data: others$Percent and others$Judge
Bartlett's K-squared = 6.3125, df = 5, p-value = 0.277
The small p-value may due to the uneven and small group sizes
Compare all 7 judges

Df Sum Sq Mean Sq F value Pr(>F)
judge 6 1927 321.2 6.718 6.1e-05 ***
Residuals 39 1864 47.8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = percent ~ judge)
Residuals:
Min 1Q Median 3Q Max
-17.320 -4.367 -0.250 3.319 14.780
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.622 2.305 6.344 1.72e-07 ***
judgeA 19.498 3.857 5.056 1.05e-05 ***
judgeB 18.994 3.644 5.212 6.39e-06 ***
judgeC 14.478 3.259 4.442 7.15e-05 ***
judgeD 12.378 5.405 2.290 0.027513 *
judgeE 12.344 3.644 3.388 0.001623 **
judgeF 12.178 3.259 3.736 0.000597 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.914 on 39 degrees of freedom
Multiple R-squared: 0.5083, Adjusted R-squared: 0.4326
F-statistic: 6.718 on 6 and 39 DF, p-value: 6.096e-05

Spock's 5.03879411343757
A 11.9418181005452
B 6.58222301297842
C 4.59292918339244
D 3.81837769736668
E 9.01014223675898
F 5.96887758208529
TRUE
Bartlett test of homogeneity of variances
data: percent by judge
Bartlett's K-squared = 7.7582, df = 6, p-value = 0.2564
- Residual vs Fitted: No obvious pattern, assume equal variance (also by rule of thumb and Bartlett)
- Normal Q-Q: overall OK
- Outliers: see Residual vs leverage, not influential point.
