## Analysis of Variance

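Analysis of variance, often called ANOVA, is a technique for testing whether the means of several samples, or of several levels of a variable within a sample, differ. It is tempting to simply apply a t test to every pair of samples. With more than two samples this is invalid: each additional test adds another chance of a false positive, so the overall probability of wrongly declaring a difference rises above the nominal significance level.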
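To see how quickly repeated t tests go wrong, the sketch below estimates the chance of at least one false positive when every pair of k samples is tested at a significance level of 0.05. It assumes, purely for illustration, that the pairwise tests are independent (in practice they are not, so the figures are only approximate); the function name `familywise_error_rate` is hypothetical.

```python
from math import comb


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Approximate chance of at least one false positive when all pairs
    of k samples are compared with t tests at level alpha.

    Assumption (illustrative only): the pairwise tests are independent.
    """
    n_tests = comb(k, 2)  # number of distinct pairs of samples
    return 1 - (1 - alpha) ** n_tests


for k in (2, 3, 5, 10):
    print(f"{k} samples, {comb(k, 2)} tests: "
          f"error rate about {familywise_error_rate(k):.3f}")
```

With two samples there is a single test and the error rate stays at 0.05; with five samples there are ten tests and, under this assumption, the rate is roughly 0.40.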

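In analysis of variance we partition the total variation in the data into components attributable to different sources in the sample design: here, the variation among the groups and the remaining variation within them (error). Consider the following quantities.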

$$\hbox{Total } SS = \sum^{k}_{i=1} \sum^{n_i}_{j=1} X^2_{ij} - C$$

where C is the "correction term", k is the number of groups, n_i is the number of observations in group i, and N is the total number of observations (the sum of the n_i).

$$C = \frac{ \left( \sum^{k}_{i=1} \sum^{n_i}_{j=1} X_{ij} \right)^2}{N}$$

$$\hbox{Groups } SS = \sum^{k}_{i=1} \left( \frac{ \left( \sum^{n_i}_{j=1} X_{ij} \right)^2}{n_i} \right) -C$$
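The three formulas above can be checked by hand on a small data set. The following is a minimal sketch in plain Python; the three groups of observations are made up for illustration, and the function name `sums_of_squares` is hypothetical.

```python
# Hypothetical data: k = 3 groups with unequal sizes n_i.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0, 10.0],
    [1.0, 2.0, 3.0],
]


def sums_of_squares(groups):
    """Return (C, Total SS, Groups SS) using the formulas above."""
    n_total = sum(len(g) for g in groups)        # N
    grand_sum = sum(sum(g) for g in groups)      # sum of every X_ij
    c = grand_sum ** 2 / n_total                 # correction term C
    total_ss = sum(x * x for g in groups for x in g) - c
    groups_ss = sum(sum(g) ** 2 / len(g) for g in groups) - c
    return c, total_ss, groups_ss


c, total_ss, groups_ss = sums_of_squares(groups)
print(c, total_ss, groups_ss)  # 302.5 82.5 73.5
```

For these numbers the grand sum is 55 and N is 10, so C = 55² / 10 = 302.5; the sum of the squared observations is 385, giving a Total SS of 82.5.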

Table 1. ANOVA table calculation.

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F test |
|---|---|---|---|---|
| Groups | Groups SS | k - 1 | Groups SS / Groups df | Groups MS / Error MS |
| Error | Total SS - Groups SS | Total df - Groups df | Error SS / Error df | |
| Total | Total SS | N - 1 | | |
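As a rough illustration of how the rows of Table 1 fit together, the sketch below fills in the table for a small made-up data set. The data and the function name `anova_table` are hypothetical; only the arithmetic follows Table 1.

```python
def anova_table(groups):
    """Build the single-factor ANOVA table of Table 1 as a nested dict."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    c = sum(sum(g) for g in groups) ** 2 / n_total   # correction term C

    total_ss = sum(x * x for g in groups for x in g) - c
    groups_ss = sum(sum(g) ** 2 / len(g) for g in groups) - c
    error_ss = total_ss - groups_ss

    groups_df = k - 1
    total_df = n_total - 1
    error_df = total_df - groups_df

    groups_ms = groups_ss / groups_df
    error_ms = error_ss / error_df

    return {
        "Groups": {"SS": groups_ss, "df": groups_df, "MS": groups_ms,
                   "F": groups_ms / error_ms},
        "Error": {"SS": error_ss, "df": error_df, "MS": error_ms},
        "Total": {"SS": total_ss, "df": total_df},
    }


# Hypothetical data: three groups of 3, 4 and 3 observations.
table = anova_table([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 10.0], [1.0, 2.0, 3.0]])
for source, row in table.items():
    cells = ", ".join(f"{name} = {value:.4g}" for name, value in row.items())
    print(f"{source}: {cells}")
```

Here the Groups SS of 73.5 on 2 df and the Error SS of 9 on 7 df give an F of about 28.6.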

Using a spreadsheet such as Excel we can generate a standard ANOVA table like the one below. The summary section lists the sample statistics n, mean, and variance for each group. The second table lists the sums of squares (SS) between the groups (sample 1 or 2), within the groups, and in total. Next come the columns for the degrees of freedom (df), the mean square (MS), and the F statistic, which is the ratio of the between-group mean square to the within-group mean square. The next column is the p-value: the probability of obtaining an F at least as large as the one observed if the groups actually had the same mean, so that the differences among them were due to sampling error alone. The result is significant at the chosen level when F exceeds the critical value (F crit), or equivalently when the p-value falls below that level.

The larger the p-value, the weaker the evidence that the groups differ and the less sure you can be of the result.
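A spreadsheet computes the p-value for you, but it can also be obtained directly from F and its two degrees of freedom. The sketch below is one way to do this with only the Python standard library: it evaluates the upper tail of the F distribution through the regularized incomplete beta function, using a textbook continued-fraction expansion. The function names are hypothetical, and a statistics library routine (for example `scipy.stats.f.sf`) would normally be preferred over hand-rolled code like this.

```python
from math import exp, lgamma, log


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_p_value(f_value, groups_df, error_df):
    """Probability of an F at least as large as f_value when the group
    means are truly equal (upper tail of the F distribution)."""
    if f_value <= 0.0:
        return 1.0
    x = error_df / (error_df + groups_df * f_value)
    return regularized_incomplete_beta(error_df / 2.0, groups_df / 2.0, x)


# Hypothetical example: F = 5.0 with 2 and 12 degrees of freedom.
print(f_p_value(5.0, 2, 12))
```

A larger F for the same degrees of freedom gives a smaller p-value, which is the sense in which a small p-value indicates a more trustworthy difference.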

Table 2. ANOVA table example.

This table indicates that the two samples are significantly different and we are very sure of this result (p = 0.000446).

Also See:

Chapter 10 - More on the Testing of Hypotheses pages 125-149 in:

Phillips, J. L. 2000. How to think about statistics. W. H. Freeman and Co. New York. 202 pp. ISBN 0-7167-3654-3

Chapter 10 - Single factor Analysis of Variance pages 180-191 in:

Zar, J. H. 1999. Biostatistical Analysis. Prentice-Hall, Inc. Englewood Cliffs, New Jersey. 718 pp.