Continuing the infer pipeline

1. Continuing the infer pipeline

Previously you saw part of the infer pipeline for simulation-based hypothesis tests.

2. Recap: hypotheses and dataset

We were testing the hypothesis that the proportion of hobbyists was the same for two age categories. However, this subset of the Stack Overflow survey was imbalanced, violating the assumptions of the traditional proportion test.

3. Recap: workflow

You saw the complete workflow for a custom hypothesis test. So far, we'd done the specify and hypothesize steps, resulting in a tibble with two columns and some extra attributes describing the response and explanatory variables, and the type of null hypothesis.

4. Motivating generate()

If we assume the null hypothesis is true, then it shouldn't matter which hobbyist value is matched up to each age category, since the proportion of hobbyists is the same in each age category. We can generate a simulated dataset by shuffling the response hobbyist values while keeping the explanatory age category values the same.

5. One permutation

Conceptually, this is how generate shuffles the dataset. It selects the response column, randomly samples the whole column without replacement, then binds it back to the explanatory column. Compare the original dataset on the left, and the shuffled dataset on the right. The hobbyist values have changed but the age category values have not.
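As a minimal sketch of one permutation, the base R snippet below shuffles a response column and leaves the explanatory column untouched. It is not infer's internal code. The six-row data frame and the column names hobbyist and age_cat are invented for illustration.

```r
# Toy data invented for illustration; it is not the survey dataset.
toy <- data.frame(
  hobbyist = c("Yes", "Yes", "No", "Yes", "No", "Yes"),
  age_cat  = c("At least 30", "At least 30", "At least 30",
               "At least 30", "Under 30", "Under 30")
)

set.seed(42)  # make the shuffle reproducible

# sample() with no size argument returns a random permutation of the
# whole vector, i.e. sampling without replacement.
shuffled <- toy
shuffled$hobbyist <- sample(toy$hobbyist)

# Side-by-side view: original response, shuffled response, unchanged groups.
print(cbind(original = toy$hobbyist,
            shuffled = shuffled$hobbyist,
            age_cat  = toy$age_cat))
```

The shuffled column contains the same values as the original, only in a different order, so the overall proportion of hobbyists is preserved.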

6. Generating many replicates

Since randomness is used in the shuffling step, we can't just create one simulated dataset, or the results would be unreliable. generate performs the simulation step many times. Each simulated dataset is called a replicate and represents an example of what we might expect the two columns to look like in a universe where the null hypothesis is true.

7. generate()

To call generate, tell it how many replicates you want. For independence tests, the generation type should always be "permute". For convenience, generate combines all the simulated datasets into a single tibble. This is big! It has as many rows as the original dataset, times the number of replicates.
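A sketch of the generate step, assuming the infer package is installed. The toy data frame, its counts, and the column names hobbyist and age_cat are assumptions made for illustration. They stand in for the survey subset, which is not reproduced here.

```r
library(infer)

# Toy data invented for illustration (48 rows, imbalanced groups).
toy <- data.frame(
  hobbyist = factor(c(rep("Yes", 30), rep("No", 10),
                      rep("Yes", 4),  rep("No", 4))),
  age_cat  = factor(c(rep("At least 30", 40), rep("Under 30", 8)))
)

set.seed(123)  # permutations are random, so fix the seed

simulated <- toy %>%
  specify(hobbyist ~ age_cat, success = "Yes") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute")

# One row per original row per replicate: 48 * 100 rows in total.
nrow(simulated)
```

The result carries a replicate column that identifies which simulated dataset each row belongs to.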

8. Calculating the test statistic

For each replicate, we calculate the test statistic, in this case the difference in proportions of hobbyists between the two age categories. Five thousand replicates gives five thousand differences in proportions. We have a distribution of test statistics. This is known as the null distribution.
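To make the idea concrete, here is a base R sketch that builds a null distribution by hand: shuffle the response, compute the difference in proportions, and repeat. The data values and the helper diff_in_props are hypothetical and only illustrate the idea. They are not the course's data or infer's implementation.

```r
# Toy data invented for illustration.
hobbyist <- c(rep("Yes", 30), rep("No", 10), rep("Yes", 4), rep("No", 4))
age_cat  <- c(rep("At least 30", 40), rep("Under 30", 8))

# Hypothetical helper: proportion of "Yes" among "At least 30"
# minus the proportion of "Yes" among "Under 30".
diff_in_props <- function(response, group) {
  mean(response[group == "At least 30"] == "Yes") -
    mean(response[group == "Under 30"] == "Yes")
}

set.seed(1)

# Each replicate shuffles the response and recomputes the statistic.
null_stats <- replicate(5000, diff_in_props(sample(hobbyist), age_cat))

length(null_stats)                     # one statistic per replicate
length(unique(round(null_stats, 10)))  # number of distinct values observed
```

In this toy data the smaller group has only eight rows, so the statistic is determined by how many of those eight are "Yes" and can take at most nine distinct values.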

9. calculate()

To use the difference in proportions as the test statistic, set the stat argument to "diff in props". We need to tell calculate which proportion to subtract from which by setting the order.
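A sketch of the calculate step appended to the pipeline, assuming the infer package is installed. The toy data, the column names, and the level labels "At least 30" and "Under 30" are assumptions made for illustration.

```r
library(infer)

# Toy data invented for illustration.
toy <- data.frame(
  hobbyist = factor(c(rep("Yes", 30), rep("No", 10),
                      rep("Yes", 4),  rep("No", 4))),
  age_cat  = factor(c(rep("At least 30", 40), rep("Under 30", 8)))
)

set.seed(123)

null_distn <- toy %>%
  specify(hobbyist ~ age_cat, success = "Yes") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  # order = c(a, b) computes proportion(a) minus proportion(b).
  calculate(stat = "diff in props",
            order = c("At least 30", "Under 30"))

head(null_distn)  # columns: replicate, stat
```

Swapping the two entries of order flips the sign of every statistic, so the choice only needs to be consistent between the null distribution and the observed statistic.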

10. Visualizing the null distribution

To visualize this null distribution as a histogram, call visualize. The histogram isn't shaped like a normal bell curve, which indicates that the assumptions of the proportion test didn't hold. Also notice that the test statistic takes only nine distinct values.
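A short sketch of the visualize call, assuming the infer package is installed. The toy data and column names are invented for illustration, so the resulting histogram will not match the one shown for the survey data.

```r
library(infer)

# Toy data invented for illustration.
toy <- data.frame(
  hobbyist = factor(c(rep("Yes", 30), rep("No", 10),
                      rep("Yes", 4),  rep("No", 4))),
  age_cat  = factor(c(rep("At least 30", 40), rep("Under 30", 8)))
)

set.seed(123)

null_distn <- toy %>%
  specify(hobbyist ~ age_cat, success = "Yes") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props",
            order = c("At least 30", "Under 30"))

# visualize() returns a ggplot object showing a histogram of the stat column.
null_plot <- visualize(null_distn)
print(null_plot)
```

Because the result is a ggplot object, further ggplot2 layers can be added to it with +.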

11. Calculating the test statistic on the original dataset

To calculate a p-value, we need to compare the null distribution to the test statistic from the original dataset.

12. Observed statistic: specify() %>% calculate()

To calculate the test statistic on the original sample dataset, reuse the specify and calculate steps: copy and paste the null distribution code, then remove the hypothesize and generate steps. Here, the observed statistic is 0.16.
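A sketch of the observed-statistic calculation, assuming the infer package is installed. The toy data and column names are invented for illustration, so the value differs from the 0.16 quoted for the survey data.

```r
library(infer)

# Toy data invented for illustration:
# 30 of 40 "At least 30" are hobbyists, 4 of 8 "Under 30" are hobbyists.
toy <- data.frame(
  hobbyist = factor(c(rep("Yes", 30), rep("No", 10),
                      rep("Yes", 4),  rep("No", 4))),
  age_cat  = factor(c(rep("At least 30", 40), rep("Under 30", 8)))
)

# Same specify() and calculate() calls as for the null distribution,
# with the hypothesize() and generate() steps removed.
obs_stat <- toy %>%
  specify(hobbyist ~ age_cat, success = "Yes") %>%
  calculate(stat = "diff in props",
            order = c("At least 30", "Under 30"))

obs_stat  # a one-row tibble; here 30/40 - 4/8 = 0.25
```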

13. Visualizing the null distribution vs the observed stat

Here's the null distribution histogram with a vertical line added at the observed statistic. The observed statistic is at one edge of the distribution. Does this make it different enough from the null distribution that we should reject the null hypothesis? We'll need to calculate the p-value to find out.

14. Get the p-value

To get the p-value, call get_p_value, passing the null distribution and the observed statistic. You also need to set the type of alternative hypothesis. The argument is called direction rather than alternative, and "two sided" is written with a space rather than a dot. This time the p-value is 0.15, which is greater than the significance level of 0.1. That means we fail to reject the null hypothesis, sticking with the assumption that there is no difference in the proportions of hobbyists between age categories. Recall the results from the dubious proportion test we used before: it reached the opposite conclusion. Hopefully this illustrates the danger of using traditional hypothesis tests when their assumptions aren't met. Although they take more code, simulation-based hypothesis tests are more robust against small samples, and will help prevent you from reaching poor conclusions.
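Putting the steps together, here is a sketch of the full pipeline through get_p_value, assuming the infer package is installed. The toy data and column names are invented for illustration, so the p-value will differ from the 0.15 reported for the survey data.

```r
library(infer)

# Toy data invented for illustration.
toy <- data.frame(
  hobbyist = factor(c(rep("Yes", 30), rep("No", 10),
                      rep("Yes", 4),  rep("No", 4))),
  age_cat  = factor(c(rep("At least 30", 40), rep("Under 30", 8)))
)

set.seed(123)

# Null distribution: permute the response, then compute the statistic.
null_distn <- toy %>%
  specify(hobbyist ~ age_cat, success = "Yes") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props",
            order = c("At least 30", "Under 30"))

# Observed statistic: same specify() and calculate(), no simulation.
obs_stat <- toy %>%
  specify(hobbyist ~ age_cat, success = "Yes") %>%
  calculate(stat = "diff in props",
            order = c("At least 30", "Under 30"))

# direction sets the alternative hypothesis; note the space in "two sided".
p_val <- get_p_value(null_distn, obs_stat, direction = "two sided")
p_val
```

The result is a one-row tibble with a p_value column, which can then be compared against the chosen significance level.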

15. Let's practice!

Let's run some simulated tests.
