The Logic of ANOVA
Alp Eren AKYUZ∗
April 2011
1 The Purpose of ANOVA
Let us start with the problem at hand. Assume we want to test the following hypothesis:

$$H_0 : \mu_1 = \mu_2 = \mu_3 = \dots$$

The easiest way is to test it against the alternative:

$$H_A : \text{At least one mean differs.}$$

The intuition is simple: if the differences between the sample means are large enough, then we can conclude that the populations they come from cannot all have equal means. How do we know whether the differences are large enough? One way is to check the variation between the sample means (Sum of Squares Among Groups):

$$SSA = \sum_{j=1}^{c} n_j \left( \bar{X}_j - \bar{X} \right)^2$$

where $c$ is the number of samples, $n_j$ and $\bar{X}_j$ are the size and mean of sample $j$, and $\bar{X}$ is the grand mean of all observations.
∗ Department of Management, Bogazici University, [email protected]

Notice that as the differences between the sample means get larger, this variation increases. However, to assess whether the variation is large enough, we need to compare it with another measure of variation. One candidate for this comparison is a combination of the variations within the samples (Sum of Squares Within Groups):
$$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} \left( X_{ij} - \bar{X}_j \right)^2$$
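The two sums of squares defined above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function names `ss_among` and `ss_within` and the two small datasets are hypothetical, chosen only to show that moving the sample means apart raises SSA while leaving SSW unchanged.

```python
from statistics import fmean


def ss_among(groups):
    """Sum of Squares Among Groups: sum over j of n_j * (mean_j - grand_mean)^2."""
    all_obs = [x for g in groups for x in g]
    grand_mean = fmean(all_obs)
    return sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)


def ss_within(groups):
    """Sum of Squares Within Groups: squared deviation of each observation from its own sample mean."""
    return sum((x - fmean(g)) ** 2 for g in groups for x in g)


# Hypothetical data (not from the text), used only to exercise the two functions.
close = [[1, 2, 3], [2, 3, 4]]     # sample means 2 and 3
far = [[1, 2, 3], [11, 12, 13]]    # sample means 2 and 12, same spread inside each sample

print("close:", ss_among(close), ss_within(close))
print("far:  ", ss_among(far), ss_within(far))
```

Both datasets have the same spread inside each sample, so only the among-groups figure should change between the two printed lines.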
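Before comparing these two values, we should adjust each of them for the number of observations used in its calculation. Adding more terms can only drive a sum of squares up, so to filter out this effect we divide each variation by its degrees of freedom (the number of values used in the calculation minus the number of means that are estimated from them and held fixed):

$$MSA = \frac{\sum_{j=1}^{c} n_j \left( \bar{X}_j - \bar{X} \right)^2}{c - 1}$$

$$MSW = \frac{\sum_{j=1}^{c} \sum_{i=1}^{n_j} \left( X_{ij} - \bar{X}_j \right)^2}{n - c}$$

where $n$ is the total number of observations. For SSA, $c$ sample means are measured around one grand mean, which gives $c - 1$ degrees of freedom; for SSW, $n$ observations are measured around $c$ sample means, which gives $n - c$.

Notice that MSA has the form of a sample variance, computed over the sample means and weighted by the sample sizes. MSW is a weighted version of the same idea: it pools the variances of the individual samples, weighting each by its degrees of freedom. Dividing the variation of the sample means by the pooled variation of the observations provides a plausible test statistic:

$$F_{STAT} = \frac{MSA}{MSW}$$

2 A Numerical Example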
To illustrate these ideas more concretely, assume we have the data given in Table 1. From the table, the first two samples have similar means (3 and 4), which suggests that their populations have approximately equal means. The same cannot be said of the other pairs. Let us start by calculating SSA and SSW. SSA requires the grand mean of all 10 observations:
$$\bar{X} = \frac{\sum_{j=1}^{3} \sum_{i=1}^{n_j} X_{ij}}{10} = \frac{X_{11} + X_{21} + X_{12} + X_{22} + \dots + X_{42} + X_{13} + \dots + X_{43}}{10}$$
Table 1: Example dataset

obs     Sample A    Sample B    Sample C
1       2           1           2
2       4           3           6
3       -           5           8
4       -           7           12
mean    3           4           7
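Because the samples in Table 1 have different sizes, the grand mean is the mean of all pooled observations rather than the plain average of the three sample means. The short sketch below checks this using only the Python standard library; the variable names are illustrative and not from the text.

```python
from statistics import fmean

# Observations from Table 1, one list per sample (the samples have unequal sizes).
samples = {
    "A": [2, 4],
    "B": [1, 3, 5, 7],
    "C": [2, 6, 8, 12],
}

# Mean of each sample, matching the last row of Table 1.
sample_means = {name: fmean(obs) for name, obs in samples.items()}

# Grand mean: the mean of all observations pooled together.
all_obs = [x for obs in samples.values() for x in obs]
grand_mean = fmean(all_obs)

# Unweighted average of the three sample means, computed only for contrast.
mean_of_means = fmean(list(sample_means.values()))

print(sample_means)
print(grand_mean, mean_of_means)
```

The pooled grand mean is 5, whereas the unweighted average of 3, 4 and 7 is about 4.67. The formula for SSA uses the pooled value.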
$$\bar{X} = \frac{(2 + 4) + (1 + 3 + 5 + 7) + (2 + 6 + 8 + 12)}{10} = 5$$

$$SSA = \sum_{j=1}^{c} n_j \left( \bar{X}_j - \bar{X} \right)^2 = 2(3 - 5)^2 + 4(4 - 5)^2 + 4(7 - 5)^2 = 28$$

$$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} \left( X_{ij} - \bar{X}_j \right)^2$$

$$SSW = (2 - 3)^2 + (4 - 3)^2 + (1 - 4)^2 + (3 - 4)^2 + (5 - 4)^2 + (7 - 4)^2 + (2 - 7)^2 + (6 - 7)^2 + (8 - 7)^2 + (12 - 7)^2 = 74$$

Calculate the MSA and MSW.

$$MSA = \frac{SSA}{c - 1} = \frac{28}{3 - 1} = 14$$

$$MSW = \frac{SSW}{n - c} = \frac{74}{10 - 3} = \frac{74}{7}$$

For the final step, calculate the $F_{STAT}$.

$$F_{STAT} = \frac{MSA}{MSW} = \frac{14}{74/7} = 1.3243$$

Compare with the critical value obtained from the F table for $dof_1 = 2$, $dof_2 = 7$ (at the 0.05 significance level).

$$F_{CRIT} = 4.7374 > 1.3243 = F_{STAT}$$
Despite our initial judgement, the conclusion is: "Do NOT reject the null hypothesis. The population means are not significantly different."
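The whole calculation can be cross-checked with a short script. The sketch below is an illustration under stated assumptions rather than part of the original derivation: it uses only the Python standard library, the function name `one_way_anova` is hypothetical, and the critical value is copied from the text instead of being computed.

```python
from statistics import fmean

# Table 1 data, one list per sample.
groups = [[2, 4], [1, 3, 5, 7], [2, 6, 8, 12]]

# Critical value quoted in the text for dof1 = 2, dof2 = 7 (read from an F table, not computed here).
F_CRIT = 4.7374


def one_way_anova(groups):
    """Return (SSA, SSW, MSA, MSW, F) following the formulas of Section 1."""
    c = len(groups)                       # number of samples
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = fmean([x for g in groups for x in g])

    ssa = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    msa = ssa / (c - 1)                   # among-groups mean square
    msw = ssw / (n - c)                   # within-groups mean square
    return ssa, ssw, msa, msw, msa / msw


ssa, ssw, msa, msw, f_stat = one_way_anova(groups)
print(f"SSA={ssa:g}  SSW={ssw:g}  MSA={msa:g}  MSW={msw:.4f}  F={f_stat:.4f}")
print("Reject H0" if f_stat > F_CRIT else "Do not reject H0")
```

Keeping the critical value as a constant avoids any dependency beyond the standard library. A statistics package could supply it instead, at the cost of an extra install.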
- END -