Randomization density
Using 100 repetitions allows you to understand the mechanism of permuting. However, 100 is not enough to observe the full range of likely values for the null differences in proportions.
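To make the difference between 100 and 1000 repetitions concrete, here is a minimal base R sketch of the shuffling mechanism. It uses made-up toy data rather than the course's homes data, and the names gender, homeown, and perm_diff are hypothetical, chosen only for this illustration.

```r
# Toy data, made up for illustration (not the course's `homes` data).
set.seed(42)
gender  <- rep(c("male", "female"), each = 100)
homeown <- sample(c("Own", "Rent"), size = 200, replace = TRUE,
                  prob = c(0.65, 0.35))

# One permuted difference in proportions (male minus female).
# Shuffling the homeownership labels breaks any link with gender.
perm_diff <- function() {
  shuffled <- sample(homeown)
  mean(shuffled[gender == "male"] == "Own") -
    mean(shuffled[gender == "female"] == "Own")
}

# Repeat the shuffle 100 times and 1000 times.
diffs_100  <- replicate(100, perm_diff())
diffs_1000 <- replicate(1000, perm_diff())

# Compare the spread of null differences seen in each run.
range(diffs_100)
range(diffs_1000)
```

With more repetitions, the more extreme null differences have more chances to appear, so the range from 1000 shuffles is usually (though not always) wider than the range from 100.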
Recall the four steps of inference. These are the same four steps that will be used in all inference exercises in this course and future statistical inference courses. Use the names of the functions to help you recall the analysis process.
- specify will specify the response and explanatory variables.
- hypothesize will declare the null hypothesis.
- generate will generate resamples, permutations, or simulations.
- calculate will calculate summary statistics.
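The four steps can be sketched as a single infer pipeline. This is an illustrative example on made-up toy data, not the course's homes data: the data frame toy_homes and the result name null_diffs are hypothetical, and only 100 repetitions are used to keep it quick. It assumes the infer package is installed.

```r
library(infer)

# Toy data, made up for illustration (not the course's `homes` data).
set.seed(1)
toy_homes <- data.frame(
  Gender  = factor(rep(c("male", "female"), each = 50)),
  HomeOwn = factor(sample(c("Own", "Rent"), size = 100, replace = TRUE))
)

null_diffs <- toy_homes %>%
  # Step 1: response ~ explanatory, and which level counts as a success
  specify(HomeOwn ~ Gender, success = "Own") %>%
  # Step 2: declare the null hypothesis
  hypothesize(null = "independence") %>%
  # Step 3: create the permuted resamples
  generate(reps = 100, type = "permute") %>%
  # Step 4: compute one statistic per resample
  calculate(stat = "diff in props", order = c("male", "female"))

# One row per permutation; the stat column holds the differences.
head(null_diffs)
```

Each function name matches its step, so reading the pipeline from top to bottom retraces the analysis process.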
In this exercise, you'll repeat the process 1000 times to get a sense for the complete distribution of null differences in proportions.
This exercise is part of the course
Foundations of Inference in R
Exercise instructions
The dplyr, ggplot2, NHANES, and infer packages have been loaded for you.
- Generate 1000 differences in proportions by shuffling the HomeOwn variable using the infer syntax. Recall the infer syntax:
  - specify that the relationship of interest is HomeOwn vs. Gender and that a success in this context is homeownership, success = "Own".
  - hypothesize that the null is true, where null = "independence" (meaning gender and homeownership are not related).
  - generate 1000 permutations; set reps to 1000.
  - calculate the statistic stat = "diff in props" with the order of c("male", "female").
- Run the density plot code to create a smoothed visual representation of the distribution of differences. What shape does the curve have?
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Perform 1000 permutations
homeown_perm <- homes %>%
  # Specify HomeOwn vs. Gender, with "Own" as success
  ___(___ ~ ___, success = "___") %>%
  # Use a null hypothesis of independence
  ___(___) %>%
  # Generate 1000 repetitions (by permutation)
  ___(reps = ___, type = "permute") %>%
  # Calculate the difference in proportions (male then female)
  ___(___, order = ___)
# Density plot of 1000 permuted differences in proportions
ggplot(homeown_perm, aes(x = stat)) +
  geom_density()