Exercise

Calculating within group variance

Now that we have the grand mean, the mean of each genre and the between group variance, we can calculate the within group variance. The formula for the within group variance is the following: $$\frac{\sum(y_{i1} - \bar{y}_1)^2 + \sum(y_{i2} - \bar{y}_2)^2 + \dots + \sum(y_{ig} - \bar{y}_g)^2}{n - g}$$

Again this formula looks complicated, so let's chop it up into parts. \(y_{i1}\) represents each observation in the first group and \(\bar{y}_1\) represents the mean of that group. We subtract the group mean from each observation in the group, square each difference and then sum the squared differences. Once we are done with the first group, we repeat this procedure for the second group and so on, up to group \(g\). The total of these sums, which is called the within sum of squares, is then divided by the total sample size \(n\) minus the number of groups \(g\). The result is the within group variance.
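The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `within_group_variance` and the three small lists of scores are made up for the example and are not the exercise's data or its expected solution.

```python
def within_group_variance(groups):
    """Within sum of squares divided by (n - g).

    `groups` is a list of lists, one list of observations per group.
    """
    within_ss = 0.0  # running within sum of squares
    n = 0            # total sample size across all groups
    for observations in groups:
        group_mean = sum(observations) / len(observations)
        # Squared deviations of each observation from its own group mean
        within_ss += sum((y - group_mean) ** 2 for y in observations)
        n += len(observations)
    g = len(groups)  # number of groups
    return within_ss / (n - g)


# Hypothetical scores, invented purely for illustration
classical = [4.0, 6.0, 8.0]  # mean 6, sum of squares 8
hiphop = [1.0, 3.0, 5.0]     # mean 3, sum of squares 8
pop = [2.0, 2.0, 5.0]        # mean 3, sum of squares 6

result = within_group_variance([classical, hiphop, pop])
print(result)  # (8 + 8 + 6) / (9 - 3)
```

With these made-up numbers the within sum of squares is 22, the total sample size is 9 and there are 3 groups, so the within group variance is 22 / 6.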

In the current exercise our overall average is stored in the variable grand_mean while our group averages are stored in the variables classical_average, hiphop_average and pop_average. The dataframes classical_data, hiphop_data and pop_data contain our samples per genre.

Instructions
  • The sample code already calculates the sum of squares for the first group, the classical genre. Use this code as a template to do the same for the hiphop genre and the pop genre, and store the results in the variables sum_squares_hiphop and sum_squares_pop.
  • Add the sums of squares of all three groups together to get the within sum of squares and divide this by \(n - g\). Round the result to zero decimal places and store it in the variable within_group_variance.