Randomization density
Using 100 repetitions allows you to understand the mechanism of permuting. However, 100 is not enough to observe the full range of likely values for the null differences in proportions.
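You can check the claim about 100 versus 1000 repetitions directly. The sketch below makes two assumptions: the homes data frame is rebuilt from NHANES by keeping only owners and renters (this is how it is assumed to have been created earlier), and the seed is arbitrary. The replicates are independent shuffles, so the first 100 replicates of a 1000-replicate run behave like a separate 100-replicate run. Their range can therefore never be wider than the range of the full run.

```r
library(dplyr)
library(NHANES)
library(infer)

# Assumption: `homes` is built the way earlier exercises build it
homes <- NHANES %>%
  select(Gender, HomeOwn) %>%
  filter(HomeOwn %in% c("Own", "Rent")) %>%
  droplevels()

set.seed(47)  # arbitrary seed, only so the run is reproducible
perm_1000 <- homes %>%
  specify(HomeOwn ~ Gender, success = "Own") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props", order = c("male", "female"))

# Replicates are independent shuffles, so the first 100 act like a 100-rep run
perm_100 <- perm_1000 %>% filter(replicate <= 100)

# Compare the spread of null differences each run captures
range(perm_100$stat)
range(perm_1000$stat)
```

On a typical run, the 1000-replicate range reaches further into the tails. Those rare, extreme null differences are exactly the ones 100 replicates tend to miss.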
Recall the four steps of inference. These are the same four steps that will be used in all inference exercises in this course and future statistical inference courses. Use the names of the functions to help you recall the analysis process.
- specify will specify the response and explanatory variables.
- hypothesize will declare the null hypothesis.
- generate will generate resamples, permutations, or simulations.
- calculate will calculate summary statistics.
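To see what each of the four steps hands to the next, you can store every stage separately. This is a minimal sketch. It assumes the same homes construction as in this exercise (an assumption about the earlier setup) and uses only 5 repetitions, which is enough to show the structure of each object.

```r
library(dplyr)
library(NHANES)
library(infer)

# Assumption: `homes` is built the way earlier exercises build it
homes <- NHANES %>%
  select(Gender, HomeOwn) %>%
  filter(HomeOwn %in% c("Own", "Rent")) %>%
  droplevels()

# Step 1: specify() records the response, the explanatory variable, and the success level
spec <- homes %>% specify(HomeOwn ~ Gender, success = "Own")

# Step 2: hypothesize() attaches the null hypothesis to the specification
hyp <- spec %>% hypothesize(null = "independence")

# Step 3: generate() stacks shuffled copies of the data, indexed by `replicate`
gen <- hyp %>% generate(reps = 5, type = "permute")

# Step 4: calculate() reduces each replicate to a single summary statistic
calc <- gen %>% calculate(stat = "diff in props", order = c("male", "female"))

nrow(gen)  # 5 shuffled copies, so 5 times the rows of homes
calc       # one `stat` value per replicate
```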
In this exercise, you'll repeat the process 1000 times to get a sense of the complete distribution of null differences in proportions.
This exercise is part of the course Foundations of Inference in R.
Exercise instructions
The dplyr, ggplot2, NHANES, and infer packages have been loaded for you.
- Generate 1000 differences in proportions by shuffling the HomeOwn variable using the infer syntax. Recall the infer syntax:
  - specify that the relationship of interest is HomeOwn vs. Gender and that a success in this context is homeownership, success = "Own".
  - hypothesize that the null is true, where null = "independence" (meaning gender and homeownership are not related).
  - generate 1000 permutations; set reps to 1000.
  - calculate the statistic stat = "diff in props" with the order c("male", "female").
- Run the density plot code to create a smoothed visual representation of the distribution of differences. What shape does the curve have?
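Complete the sample code. One completed version is shown below. It uses the homes data frame from the exercise environment and rebuilds it from NHANES only if it is missing. The rebuild step is an assumption about how homes was created.

```r
# Assumption: rebuild `homes` (owners and renters only) if it is not already loaded
if (!exists("homes")) {
  homes <- NHANES %>%
    select(Gender, HomeOwn) %>%
    filter(HomeOwn %in% c("Own", "Rent")) %>%
    droplevels()
}

# Perform 1000 permutations
homeown_perm <- homes %>%
  # Specify HomeOwn vs. Gender, with "Own" as success
  specify(HomeOwn ~ Gender, success = "Own") %>%
  # Use a null hypothesis of independence
  hypothesize(null = "independence") %>%
  # Generate 1000 repetitions (by permutation)
  generate(reps = 1000, type = "permute") %>%
  # Calculate the difference in proportions (male then female)
  calculate(stat = "diff in props", order = c("male", "female"))

# Density plot of 1000 permuted differences in proportions
ggplot(homeown_perm, aes(x = stat)) +
  geom_density()
```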
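Under independence, the male and female labels are interchangeable, so the permuted differences should center near zero. One way to judge the shape is to draw a normal curve with the same mean and standard deviation on top of the density. This overlay is an added comparison and is not part of the exercise. The sketch assumes the same homes construction as above and uses an arbitrary seed.

```r
library(dplyr)
library(ggplot2)
library(NHANES)
library(infer)

# Assumption: `homes` is built the way earlier exercises build it
homes <- NHANES %>%
  select(Gender, HomeOwn) %>%
  filter(HomeOwn %in% c("Own", "Rent")) %>%
  droplevels()

set.seed(47)  # arbitrary seed, only so the run is reproducible
homeown_perm <- homes %>%
  specify(HomeOwn ~ Gender, success = "Own") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props", order = c("male", "female"))

# Mean and SD of the permuted differences, used for the reference normal curve
perm_mean <- mean(homeown_perm$stat)
perm_sd <- sd(homeown_perm$stat)

# Solid line: smoothed density of the permuted statistics
# Dashed line: normal curve with the same mean and SD
ggplot(homeown_perm, aes(x = stat)) +
  geom_density() +
  stat_function(
    fun = dnorm,
    args = list(mean = perm_mean, sd = perm_sd),
    linetype = "dashed"
  )
```

If the two curves nearly coincide, the null distribution is approximately bell-shaped and symmetric around zero.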