Calculating the F statistic
Now that we have calculated the between-groups and the within-groups variability, we can calculate their ratio. The ratio of between-groups to within-groups variability produces an F statistic. See the following formula: $$F = \frac{\text{between-groups variability}}{\text{within-groups variability}}$$
The F statistic becomes larger if the between-groups variability rises while the within-groups variability stays the same. It becomes smaller if the within-groups variability rises while the between-groups variability stays the same. The F statistic follows an F sampling distribution. This distribution is approximately centered around F = 1 when the null hypothesis is true. The larger the F statistic, the stronger the evidence against the null hypothesis.
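To make the behaviour of the ratio concrete, here is a minimal sketch in R. The variability values and the names between_var and within_var are hypothetical and chosen only for illustration; they are not the values used in this exercise.

```r
# Hypothetical variability values, chosen only for illustration
between_var <- 12
within_var <- 4

# Baseline ratio
f_base <- between_var / within_var

# Doubling the between-groups variability doubles the F statistic
f_more_between <- (2 * between_var) / within_var

# Doubling the within-groups variability halves the F statistic
f_more_within <- between_var / (2 * within_var)

print(c(base = f_base, more_between = f_more_between, more_within = f_more_within))
```

With these made-up numbers, the ratio moves in the direction described above: up when the between-groups variability grows, down when the within-groups variability grows.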
The F distribution has two different degrees of freedom: df1 and df2. The formula for df1 is the following: \(df1 = g - 1\) where g is the number of groups. The formula for df2 is the following: \(df2 = N - g\) where N is the sample size of all groups combined and g is the number of groups. These degrees of freedom come in handy when we want to calculate a p value for our obtained F statistic. To calculate a p value for our F statistic, we can use the pf() function. This function works similarly to the pnorm() and pbinom() functions that you may have come across in the course on basic statistics. The pf() function takes our F statistic as its first argument, our df1 as its second argument and our df2 as its third argument.
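As a rough illustration of these steps, the sketch below computes the degrees of freedom and a p value for an assumed F statistic. The values of g, N and f_value are hypothetical and not taken from the exercise data. Note that pf() returns the lower-tail probability by default, so lower.tail = FALSE is used here to obtain the upper tail.

```r
# Hypothetical values for illustration: 3 groups, 30 observations in total
g <- 3
N <- 30
f_value <- 4.5   # assumed F statistic, not taken from the exercise data

df1 <- g - 1     # numerator degrees of freedom
df2 <- N - g     # denominator degrees of freedom

# pf() gives P(F <= f_value) by default (lower tail)
p_lower <- pf(f_value, df1, df2)

# lower.tail = FALSE gives the upper-tail p value P(F > f_value)
p_upper <- pf(f_value, df1, df2, lower.tail = FALSE)

print(c(df1 = df1, df2 = df2, p_upper = p_upper))
```

The two tail probabilities sum to 1, so 1 - pf(f_value, df1, df2) is an equivalent way to get the upper-tail p value.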
This exercise is part of the course
Inferential Statistics
Exercise instructions
- The variables between_group_variance and within_group_variance are available in your console. Use these variables to calculate the F statistic and store the result in a variable called f_stat. Round the result to two digits.
- Calculate the degrees of freedom df1 and df2 and store them in the variables df1 and df2.
- Using the pf() function, calculate the p value and store it in the variable p_value. Round the result to two digits. Make sure to calculate the p value associated with the upper tail.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# calculate the F statistic and store it in a variable called f_stat
# calculate the degrees of freedom and store them in the variables df1 and df2
# calculate the associated p value and store it in a variable called p_value