Checking the assumptions of analysis of variance

As you may recall from the lectures, there are three important assumptions that need to be checked before doing an analysis of variance:

  1. The population distribution for the dependent variable for each of the g groups needs to be approximately normal
  2. Those distributions for each of the groups have the same standard deviation (homogeneity of variances)
  3. The data resulted from randomization

In this exercise we are going to check the first two assumptions. To check for normality in each of our genres, we are going to use the R function shapiro.test(), which performs the Shapiro-Wilk test. You give this function a vector of numeric values from one group. For instance, we can provide it our duration variable, like so: classical_data$duration. To check whether the groups have approximately the same standard deviation, we are going to use the R function bartlett.test(). This function has a formula interface that works similarly to the t.test() interface. You can use the Bartlett test like so: bartlett.test(y_variable ~ x_variable). In our example, the y variable is duration and the x variable is genre. Each test has its own null hypothesis: shapiro.test() tests the null hypothesis that the data are normally distributed, and bartlett.test() tests the null hypothesis that the variances in the groups are the same. If the p-value is significant, that is, smaller than 0.05, the null hypothesis is rejected and we conclude that the corresponding assumption is not met.
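The two calls described above can be tried out on made-up data. The following is a minimal sketch, not the exercise solution: the data frame demo_data, its three genres, and the simulated durations are all assumptions introduced for illustration, not the course's actual data.

```r
# Simulated stand-in data (hypothetical): three genres, 30 songs each,
# with durations drawn from normal distributions with equal spread.
set.seed(42)
demo_data <- data.frame(
  genre = factor(rep(c("classical", "hiphop", "pop"), each = 30)),
  duration = c(rnorm(30, mean = 300, sd = 40),
               rnorm(30, mean = 220, sd = 40),
               rnorm(30, mean = 210, sd = 40))
)

# Shapiro-Wilk test: pass the numeric vector of one group
classical_duration <- demo_data$duration[demo_data$genre == "classical"]
normality_result <- shapiro.test(classical_duration)

# Bartlett test: formula interface, y_variable ~ x_variable
variance_result <- bartlett.test(demo_data$duration ~ demo_data$genre)

# Both functions return an object of class "htest";
# the p-value is stored in the p.value element.
normality_result$p.value
variance_result$p.value

# Compare each p-value with the 0.05 threshold used in this exercise
normality_result$p.value < 0.05
variance_result$p.value < 0.05
```

Printing normality_result or variance_result on its own shows the full test summary, including the test statistic and the p-value.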

This exercise is part of the course

Inferential Statistics


Exercise instructions

  • Look at the example code in your script. This code checks whether the data points of the classical genre are approximately normally distributed. Do the same test for the hip hop genre data. Note that this data is available in the hiphop_data data frame.
  • Check for homogeneity of variance using the Bartlett test. The combined data is available in the song_data data frame. Remember that you can extract variables from a data frame using the dollar notation, like so: song_data$duration.
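As a side note on the dollar notation mentioned above, here is a small sketch on a made-up data frame. The names toy_data, group, and value are hypothetical and not part of the exercise. It also shows the data argument of bartlett.test(), which is an alternative way of writing the same formula call.

```r
# Hypothetical toy data: two groups of 20 simulated values each
set.seed(1)
toy_data <- data.frame(
  group = factor(rep(c("a", "b"), each = 20)),
  value = rnorm(40)
)

# Dollar notation extracts a single column as a vector
value_vector <- toy_data$value
head(value_vector)

# The same Bartlett test written two ways:
# with dollar notation inside the formula ...
with_dollar <- bartlett.test(toy_data$value ~ toy_data$group)
# ... and with bare column names plus the data argument
with_data_arg <- bartlett.test(value ~ group, data = toy_data)

# Check that both calls give the same test statistic
all.equal(unname(with_dollar$statistic), unname(with_data_arg$statistic))
```

Either form should work; the dollar notation is the one this exercise asks for.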

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# check for normality in the classical genre
shapiro.test(classical_data$duration)

# check for normality in the hip hop genre


# check for homogeneity of variances using the bartlett test