1. Intervals for differences
If you haven't guessed already, I created the claim that 3/4 of Americans believe in life after death primarily to demonstrate how you can formulate a hypothesis test on a single proportion.
2. A question in two variables
There are plenty of more pressing questions that just beg to be asked of this data, but they require that we look at more than just a single variable. I'm curious to know: do men and women believe at different rates?
Let p be the true proportion of people who believe in life after death, with one such proportion for men and one for women.
We can then phrase this question as the null hypothesis that the difference between the proportion of men who believe and the proportion of women who believe is zero.
The alternative hypothesis would then be that the difference is non-zero.
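Written out in symbols, the two hypotheses look like this. The subscripts are notation I'm assuming here to tell the two groups apart, and the order of the subtraction (female minus male) is chosen to match the order used later for the observed statistic.

```latex
\begin{aligned}
H_0 &: p_{\text{female}} - p_{\text{male}} = 0 \\
H_A &: p_{\text{female}} - p_{\text{male}} \neq 0
\end{aligned}
```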
3. Do women and men have different opinions on life after death?
Let's take a look at how these proportions compare in the gss2016 dataset. The data live in two columns, postlife and sex, so we can map sex to the x-axis and their belief to the color fill of the bars. If we add a geom_bar layer, we get a stacked bar chart that shows us that we have more females in our dataset than males and that opinions are split.
4. Do women and men have different opinions on life after death?
We can convert these to proportions by adding the position equals "fill" argument. It looks like the proportion for men is a bit lower than the proportion for women.
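Here is a minimal sketch of both charts. Since gss2016 isn't bundled with this transcript, the code builds a small made-up data frame called toy with the same two columns; the counts in it are invented for illustration and are not the survey results.

```r
library(ggplot2)

# Hypothetical stand-in for gss2016: 40 men and 60 women (made-up counts).
toy <- data.frame(
  sex = factor(rep(c("MALE", "FEMALE"), times = c(40, 60)),
               levels = c("MALE", "FEMALE")),
  postlife = factor(c(rep(c("YES", "NO"), times = c(28, 12)),
                      rep(c("YES", "NO"), times = c(51, 9))))
)

# Stacked bar chart of counts: sex on the x-axis, belief as the fill color.
p_counts <- ggplot(toy, aes(x = sex, fill = postlife)) +
  geom_bar()

# Same chart, but each bar is rescaled to height 1 so it shows proportions.
p_props <- ggplot(toy, aes(x = sex, fill = postlife)) +
  geom_bar(position = "fill")

p_counts
p_props
```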
5. Do women and men have different opinions on life after death?
We can calculate the difference in these proportions by using our usual summarize method of calculating a proportion, but adding a group_by line to indicate that we want to calculate that proportion for men and women separately. The result is a vector of two proportions. We take their difference with the diff function and save it as d-hat, which we learn is 0.147.
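That calculation might look something like the sketch below. It runs on a made-up data frame called toy rather than gss2016, so the numbers differ from the 0.147 above. The object names p_hats and d_hat are my own choices, and I'm assuming the levels of sex are ordered MALE then FEMALE, which is what makes diff return female minus male.

```r
library(dplyr)

# Hypothetical stand-in for gss2016 (made-up counts).
toy <- data.frame(
  sex = factor(rep(c("MALE", "FEMALE"), times = c(40, 60)),
               levels = c("MALE", "FEMALE")),
  postlife = factor(c(rep(c("YES", "NO"), times = c(28, 12)),
                      rep(c("YES", "NO"), times = c(51, 9))))
)

# Proportion answering YES within each sex, pulled out as a plain vector.
p_hats <- toy %>%
  group_by(sex) %>%
  summarize(prop = mean(postlife == "YES", na.rm = TRUE)) %>%
  pull(prop)

# diff() subtracts the first element from the second: FEMALE minus MALE here.
d_hat <- diff(p_hats)
d_hat
```

If the levels of sex were in the other order, diff would return male minus female, so it's worth checking the order of p_hats before interpreting the sign.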
6. Generating data from H0
Now to answer the question of whether this statistic is consistent with the null hypothesis that the difference in proportions is zero, we need some way to generate datasets as if this were true.
It's helpful to realize that we can restate the null hypothesis as "There is no association between belief in the afterlife and the sex of a subject."
We could also phrase it as "The variable postlife is independent of the variable sex." Well, if there truly is no association between our two variables, we should be able to generate more data simply by shuffling or permuting the data.
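To make the shuffling idea concrete, here is a small base R sketch that permutes one column by hand. The data frame toy and the helper diff_in_props are hypothetical names made up for this illustration, and the counts are invented.

```r
set.seed(1)  # arbitrary seed, only so the shuffle is reproducible

# Hypothetical stand-in for gss2016 (made-up counts).
toy <- data.frame(
  sex = factor(rep(c("MALE", "FEMALE"), times = c(40, 60)),
               levels = c("MALE", "FEMALE")),
  postlife = factor(c(rep(c("YES", "NO"), times = c(28, 12)),
                      rep(c("YES", "NO"), times = c(51, 9))))
)

# Hypothetical helper: proportion of YES among females minus among males.
diff_in_props <- function(df) {
  mean(df$postlife[df$sex == "FEMALE"] == "YES") -
    mean(df$postlife[df$sex == "MALE"] == "YES")
}

# Shuffle the answers while leaving sex untouched. This breaks any link
# between the two columns, which is exactly what the null hypothesis claims.
shuffled <- toy
shuffled$postlife <- sample(toy$postlife)

diff_in_props(toy)       # difference in the original toy data
diff_in_props(shuffled)  # one difference generated under independence
```

Notice that the shuffle keeps the total number of YES and NO answers the same; only their pairing with sex changes.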
7. Do women and men have different opinions on life after death?
To see how this generation works, let's start up our inference chain just like before, with specify feeding into hypothesize feeding into generate. We're looking to explain postlife based on the subject's sex, so we'll add that as an explanatory variable. The type of null that we're specifying is one of independence, which allows us to generate data via permutation.
This code will work, but before I run it I'll make one change
8. Do women and men have different opinions on life after death?
and that is to specify the variables via formula notation: postlife ~ sex. It can be read as, "I'd like to explain postlife as a function of sex".
When we run this code, we see it generates a dataset with these two columns but where the postlife column has been shuffled or permuted. Let's keep track: the first six rows show YES and the seventh shows NO.
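A sketch of that chain, run on a made-up data frame called toy instead of gss2016, could look like this. The success = "YES" argument is my assumption about which level counts as a success, and the rows you see will not match the ones described above.

```r
library(infer)
library(dplyr)

set.seed(2016)  # arbitrary seed, only so the shuffle is reproducible

# Hypothetical stand-in for gss2016 (made-up counts).
toy <- data.frame(
  sex = factor(rep(c("MALE", "FEMALE"), times = c(40, 60)),
               levels = c("MALE", "FEMALE")),
  postlife = factor(c(rep(c("YES", "NO"), times = c(28, 12)),
                      rep(c("YES", "NO"), times = c(51, 9))))
)

# The formula postlife ~ sex plays the same role as writing
# response = postlife, explanatory = sex.
perm <- toy %>%
  specify(postlife ~ sex, success = "YES") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1, type = "permute")

head(perm, 7)

# The shuffle moves answers around but never changes how many there are.
table(toy$postlife)
table(perm$postlife)
```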
9. Do women and men have different opinions on life after death?
So if we run this code again we'll get a different permutation, one with the second and third rows being NO and the rest being YES.
Both are examples of datasets similar to the one that we observed, but generated as if from a world where there was no association between these variables.
Let's use this method to build up a full null distribution.
10. Do women and men have different opinions on life after death?
The inference chain should look familiar, but now we're going to generate 500 datasets and calculate a difference in proportions for each one. We need to specify the order of that difference, so we'll look at the proportion of females minus the proportion of males so that it's in the same order as the d-hat that we calculated from our data.
You'll see a warning message pop up here. That's just to let us know that we have some missing data that have been excluded from our analysis.
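Here is a hedged sketch of that full chain on a made-up data frame called toy. The toy data have no missing values, so it doesn't reproduce the missing-data warning. The object name null and the success = "YES" argument are my assumptions.

```r
library(infer)
library(dplyr)

set.seed(2016)  # arbitrary seed, only so the permutations are reproducible

# Hypothetical stand-in for gss2016 (made-up counts).
toy <- data.frame(
  sex = factor(rep(c("MALE", "FEMALE"), times = c(40, 60)),
               levels = c("MALE", "FEMALE")),
  postlife = factor(c(rep(c("YES", "NO"), times = c(28, 12)),
                      rep(c("YES", "NO"), times = c(51, 9))))
)

# 500 permuted datasets, each reduced to one difference in proportions.
null <- toy %>%
  specify(postlife ~ sex, success = "YES") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 500, type = "permute") %>%
  calculate(stat = "diff in props", order = c("FEMALE", "MALE"))

null  # one row per replicate, with the statistic in the stat column
```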
11. Do women and men have different opinions on life after death?
When we plot our observed statistic against the null distribution, we learn that the observed difference of around 0.15 is quite a bit greater than the sort of differences that we'd see if there were no association. This corresponds to a p-value of 0.02. These data constitute evidence that there is a more general association between sex and belief in the afterlife in the population of all Americans.
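One way to sketch the plot and the p-value with infer is shown below, again on a made-up data frame called toy, so the p-value it prints is not the 0.02 reported above. Using a two-sided direction is my assumption, chosen to match the non-zero alternative hypothesis.

```r
library(infer)
library(dplyr)

set.seed(2016)  # arbitrary seed, only so the permutations are reproducible

# Hypothetical stand-in for gss2016 (made-up counts).
toy <- data.frame(
  sex = factor(rep(c("MALE", "FEMALE"), times = c(40, 60)),
               levels = c("MALE", "FEMALE")),
  postlife = factor(c(rep(c("YES", "NO"), times = c(28, 12)),
                      rep(c("YES", "NO"), times = c(51, 9))))
)

# Observed statistic, computed in the same order as the null statistics.
d_hat <- toy %>%
  specify(postlife ~ sex, success = "YES") %>%
  calculate(stat = "diff in props", order = c("FEMALE", "MALE"))

# Null distribution from 500 permutations.
null <- toy %>%
  specify(postlife ~ sex, success = "YES") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 500, type = "permute") %>%
  calculate(stat = "diff in props", order = c("FEMALE", "MALE"))

# Histogram of the null statistics with the observed value marked.
null_plot <- visualize(null) +
  shade_p_value(obs_stat = d_hat, direction = "two-sided")
null_plot

# Share of permuted statistics at least as extreme as the observed one.
p_val <- get_p_value(null, obs_stat = d_hat, direction = "two-sided")
p_val
```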
12. Let's practice!
OK, now it's your turn to practice.