Calculating the p-values
In the video, you learned that a p-value measures the degree of disagreement between the data and the null hypothesis. Here, you will calculate the p-value for the original discrimination dataset as well as the small and big versions, disc_small and disc_big.
The original differences in proportions are available in your workspace as diff_orig, diff_orig_small, and diff_orig_big, as are the permuted datasets, disc_perm, disc_perm_small, and disc_perm_big.
Recall that you're only interested in the one-sided hypothesis test here. That is, you're trying to answer the question, "Are men more likely to be promoted than women?"
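To make the one-sided question concrete, here is a minimal base R sketch of the underlying calculation. It does not use the workspace objects or the infer package: the group sizes, promotion counts, seed, and the helper diff_in_props() are all made up for illustration, and counting ties with >= is an assumption about how "more extreme" is defined.

```r
# Toy data (hypothetical counts, NOT the course's discrimination dataset)
set.seed(42)
sex <- rep(c("male", "female"), each = 20)
promoted <- c(rep(c(TRUE, FALSE), times = c(15, 5)),   # 15 of 20 men promoted
              rep(c(TRUE, FALSE), times = c(9, 11)))   # 9 of 20 women promoted

# Hypothetical helper: difference in promotion proportions, male minus female
diff_in_props <- function(promoted, sex) {
  mean(promoted[sex == "male"]) - mean(promoted[sex == "female"])
}

# Observed difference in proportions
obs_diff <- diff_in_props(promoted, sex)

# Null distribution: shuffle the promotion outcomes so they are unrelated to sex
null_diffs <- replicate(1000, diff_in_props(sample(promoted), sex))

# One-sided p-value: share of null differences at least as large as the observed one
p_greater <- mean(null_diffs >= obs_diff)
p_greater
```

Because the question asks only whether men are more likely to be promoted, only the upper tail of the null distribution is counted.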
This exercise is part of the course Foundations of Inference in R.
Exercise instructions
- Visualize and calculate the p-value for the original dataset using the built-in infer functions visualize() and get_p_value(). Remember that most of the null statistics fall below the original difference, so the p-value (which represents how often a null value is more extreme) is calculated by counting the number of null values which are greater than the original difference.
- Repeat for the small dataset, disc_perm_small, which has observed difference diff_orig_small.
- Repeat for the big dataset, disc_perm_big, which has observed difference diff_orig_big.
- You can test your knowledge by trying out direction = "greater", direction = "two_sided", and direction = "less" before submitting your answer.
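The effect of the three direction values can be sketched in base R. The helper p_value_by_direction() below is hypothetical and is not part of infer. The two-sided rule it uses (twice the smaller one-sided p-value, capped at 1) is one common convention and is assumed here rather than taken from the exercise. The null statistics are made-up toy values.

```r
# Hypothetical helper, written for illustration only (not an infer function)
p_value_by_direction <- function(null_stats, obs_stat,
                                 direction = c("greater", "less", "two_sided")) {
  direction <- match.arg(direction)
  p_greater <- mean(null_stats >= obs_stat)  # share of null values in the upper tail
  p_less <- mean(null_stats <= obs_stat)     # share of null values in the lower tail
  switch(direction,
    greater = p_greater,
    less = p_less,
    # Assumed convention: double the smaller tail, never exceeding 1
    two_sided = min(1, 2 * min(p_greater, p_less))
  )
}

# Toy null statistics (made up) and an observed statistic in the upper tail
null_stats <- c(-0.3, -0.2, -0.1, -0.1, 0, 0, 0.1, 0.1, 0.2, 0.3)
obs_stat <- 0.2

p_up <- p_value_by_direction(null_stats, obs_stat, "greater")     # 2 of 10 values
p_down <- p_value_by_direction(null_stats, obs_stat, "less")      # 9 of 10 values
p_both <- p_value_by_direction(null_stats, obs_stat, "two_sided") # 2 * (2 / 10)
```

With an observed statistic in the upper tail, "greater" gives the smallest p-value and "less" the largest, which is why the choice of direction has to match the question being asked.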
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Visualize and calculate the p-value for the original dataset
disc_perm %>%
  ___(obs_stat = ___, direction = "___")

disc_perm %>%
  ___(___, ___)

# Visualize and calculate the p-value for the small dataset
___ %>%
  ___(___, ___)

___ %>%
  ___(___, ___)

# Visualize and calculate the p-value for the big dataset
___ %>%
  ___(___, ___)

___ %>%
  ___(___, ___)