Skip to content

Two Way ANOVA

Two Way ANOVA

  • Extension of One-way ANOVA, a special case of a GLM.
  • Two factors, each with \(\geq 2\) levels.
  • Uses a maximum of \((G_1-1) + (G_2 - 1) +(G_1-1)(G_2 - 1)\) (individual + interactions)indicator variables

Two types of factors

  • FIXED effect: data has been gathered from all the levels of the factor that are of interest
  • Random effect: interest is in all possible levels of factor, but only a random sample of levels is included in the data.

Case Study The Pygmalion Effect

library(Sleuth2)
data = case1302
head(data)
score = data$Score
company = as.factor(data$Company)
treat = as.factor(data$Treat)
A data.frame: 6 × 3
CompanyTreatScore
<fct><fct><dbl>
1C1Pygmalion80.0
2C1Control 63.2
3C1Control 69.2
4C2Pygmalion83.9
5C2Control 63.1
6C2Control 81.5
library(ggplot2)
ggplot(data, aes(x=company, y=score, fill=company)) + geom_boxplot()

png

ggplot(data, aes(x=treat, y=score, fill=treat))+geom_boxplot()

png

interaction.plot(company, treat, score, col=c('blue','red'))

png

interaction.plot(treat, company, score, col=c('blue','red','green','purple','black'))

png

  • Response: score on a test
  • Factors:
    1. Company: 10 levels (C1,...,C10)
    2. Treatment: 2 levels (Pygmalion, Control)
  • Aim: investigate the interaction between Company and Treatment
  • Method: Fit a Two-Way ANOVA

Variables

  • \(Y_i\) score for \(i\)th platoon
  • Explanatory: \(9(\mathbb{I}_{C_m, i})+1(\mathbb{I}_{P_n,i})+9(Interaction)\)
    For each indicter: \(\mathbb{I_{C_m,i}}=I\)(ith platoon is from Company m), \(\mathbb{I_{P_n,i}}=I\)(ith platoon is 'Pygmalion')

Overall test and Partial Test

Overall test \(H_0: \beta = \vec{0}\)
Partial test $H_0: $ subset of \(\beta\) are \(0\)
Test statistic:

\[\begin{align*} F &= \frac{(SSReg_{\text{full}} - SSReg_{\text{reduced}}) / d.f._{\text{reduced}}} {MSE_{\text{full}}}\\ &= \frac{(RSS_{\text{full}} - RSS_{\text{reduced}}) / d.f._{\text{reduced}}} {MSE_{\text{full}}} \\ &\sim F_{d.f._{\text{full}}, d.f._{\text{error}}} \end{align*}\]

We can use partial test to see if the interaction terms are needed

Interactive Model

model_i = lm(score~company*treat)
summary(model_i)
Call:
lm(formula = score ~ company * treat)

Residuals:
   Min     1Q Median     3Q    Max 
  -9.2   -2.3    0.0    2.3    9.2

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)               80.000      7.204  11.105 1.49e-06 ***
companyC2                  3.900     10.188   0.383    0.711    
companyC3                -11.800     10.188  -1.158    0.277    
companyC4                 -3.500     10.188  -0.344    0.739    
companyC5                  7.800     10.188   0.766    0.463    
companyC6                  9.800     10.188   0.962    0.361    
companyC7                 -3.900     10.188  -0.383    0.711    
companyC8                 -8.500     10.188  -0.834    0.426    
companyC9                -10.500     10.188  -1.031    0.330    
companyC10                 3.700     10.188   0.363    0.725    
treatControl             -13.800      8.823  -1.564    0.152    
companyC2:treatControl     2.200     12.477   0.176    0.864    
companyC3:treatControl    21.800     13.477   1.618    0.140    
companyC4:treatControl     3.800     12.477   0.305    0.768    
companyC5:treatControl     2.200     12.477   0.176    0.864    
companyC6:treatControl     5.800     12.477   0.465    0.653    
companyC7:treatControl     2.800     12.477   0.224    0.827    
companyC8:treatControl    12.800     12.477   1.026    0.332    
companyC9:treatControl    17.400     12.477   1.395    0.197    
companyC10:treatControl    0.800     12.477   0.064    0.950    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.204 on 9 degrees of freedom
Multiple R-squared:  0.7388,    Adjusted R-squared:  0.1875 
F-statistic:  1.34 on 19 and 9 DF,  p-value: 0.3358
anova(model_i)
A anova: 4 × 5
DfSum SqMean SqF valuePr(>F)
<int><dbl><dbl><dbl><dbl>
company9670.9755 74.552841.43665580.29901878
treat1338.8828338.882796.53037390.03091657
company:treat9311.4640 34.607110.66688950.72211538
Residuals9467.0399 51.89332 NA NA
plot(model_i, which=1:2)
Warning message:
"not plotting observations with leverage one:
  1, 4, 7, 8, 9, 12, 15, 18, 21, 24, 27"

png

png

Additive Model

model_a = lm(score~company+treat)
summary(model_a)
Call:
lm(formula = score ~ company + treat)

Residuals:
    Min      1Q  Median      3Q     Max 
-10.660  -4.147   1.853   3.853   7.740

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  75.61367    4.16822  18.141 5.16e-13 ***
companyC2     5.36667    5.36968   0.999   0.3308    
companyC3     0.19658    6.01886   0.033   0.9743    
companyC4    -0.96667    5.36968  -0.180   0.8591    
companyC5     9.26667    5.36968   1.726   0.1015    
companyC6    13.66667    5.36968   2.545   0.0203 *  
companyC7    -2.03333    5.36968  -0.379   0.7094    
companyC8     0.03333    5.36968   0.006   0.9951    
companyC9     1.10000    5.36968   0.205   0.8400    
companyC10    4.23333    5.36968   0.788   0.4407    
treatControl -7.22051    2.57951  -2.799   0.0119 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.576 on 18 degrees of freedom
Multiple R-squared:  0.5647,    Adjusted R-squared:  0.3228 
F-statistic: 2.335 on 10 and 18 DF,  p-value: 0.0564

Partial F test (ANOVA)

Test \(H_0:\beta_1 = 0\) (treat Control), \(H_a:\beta_1\neq 1\)
(T-test or partial F test) \(F\sim F_{1, 18}\)
\(p=0.0119\)

Test \(H_0: \beta_2=...=\beta_{10} = 0\), $H_a: $ at least one of \(\beta_2,...,\beta_{10} \neq 0\)
\(F\sim F_{9,18}\)

anova(model_a)
A anova: 3 × 5
DfSum SqMean SqF valuePr(>F)
<int><dbl><dbl><dbl><dbl>
company 9670.9755 74.552841.7237560.15556403
treat 1338.8828338.882797.8354010.01185757
Residuals18778.5039 43.25022 NA NA

Confidence Interval for two sample t test

\[(\bar{x}_1-\bar{x_2}\pm t\cdot s_p \sqrt{n_1^{-1}+n_2^{-1}} )\]

Least square CI

\[\hat{\beta}_1 \pm t\cdot se(\hat\beta_1)\]

Model Checking

  • Model is appropriate (include relevant or exclude irrelevant factors)
  • No outliers
  • Normality
par(mfrow=c(2,2))
plot(model_a)

png

  • Variance assumption may not be satisfied (variance is decreasing), consider weighted least square regression
  • Normality satisfied, no dramatic pattern
  • No outliers