Working with many tidy models

1. Working with many tidy models

In the last exercises

2. We have a model for each country

you created a combined dataset called country_coefficients that holds the details of each per-country model, with one row for the slope and one for the intercept of each country. Because the data is tidy, you can manipulate these coefficients with dplyr operations, just as you did the original voting data. For example, in this analysis we're interested in how countries change over time (the slope), not where they started (the intercept). So

3. Filter for the year term (slope)

we can use dplyr's filter to keep only the rows where term equals year, the ones describing how year affected percent_yes: filter for term == "year". Not all of these slopes can be trusted; some may be due to random noise. We may want to keep only the slopes that are statistically significant. Recall that a p-value is a common metric for whether an estimated trend could be due to noise: we often require the p-value to be less than 0.05 to call a trend significant. Here we run into a common issue you may be familiar with: when we run many statistical tests and evaluate their p-values, we need to do a multiple hypothesis correction. This is a complicated problem that is outside the scope of this course, but the basic issue is that if you run many tests, some p-values will be less than 0.05 by chance, so we need to be more restrictive. R provides a useful built-in function for p-value correction, called p.adjust.
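As a minimal sketch of the two steps just described, the snippet below filters for the year term and then adds an adjusted p-value column. The small data frame is an invented stand-in for country_coefficients (the country names and numbers are made up for illustration), and the column name p.adjusted is an assumption, not something stated in the narration. It assumes the p-values live in a column called p.value.

```r
library(dplyr)

# Hypothetical stand-in for country_coefficients; all values are invented
country_coefficients <- data.frame(
  country = rep(c("A", "B", "C"), each = 2),
  term = rep(c("(Intercept)", "year"), times = 3),
  estimate = c(-10.0, 0.005, 4.0, -0.002, 1.0, 0.0001),
  p.value = c(0.001, 0.0004, 0.02, 0.03, 0.5, 0.8),
  stringsAsFactors = FALSE
)

slope_terms <- country_coefficients %>%
  # Keep only the slope rows (how year affects percent_yes)
  filter(term == "year") %>%
  # Correct the slope p-values for multiple testing
  # (p.adjust uses the Holm method by default)
  mutate(p.adjusted = p.adjust(p.value))

print(slope_terms)
```

In this toy data, country B has a raw p-value of 0.03, below 0.05, but its adjusted p-value is 0.06, which shows how the correction makes the threshold more restrictive.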

4. Filtered by adjusted p-value

By filtering for cases where the adjusted p-value is less than 0.05, we can be more confident in our conclusions and get a set of country trends that we believe are real. Using dplyr operations to work with many model outputs is a powerful way to draw conclusions from a large dataset. In your exercises you'll also use arrange to find the countries with the strongest upward and downward trends over time.
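The filtering and sorting described here might look like the following sketch. The slope_terms data frame, its values, and the p.adjusted column name are hypothetical and used only for illustration.

```r
library(dplyr)

# Hypothetical slope terms, one row per country; all values are invented
slope_terms <- data.frame(
  country = c("A", "B", "C", "D"),
  term = "year",
  estimate = c(0.005, -0.002, 0.0001, -0.006),
  p.value = c(0.0004, 0.03, 0.8, 0.0001),
  stringsAsFactors = FALSE
)

significant_slopes <- slope_terms %>%
  mutate(p.adjusted = p.adjust(p.value)) %>%
  # Keep only trends that survive the multiple-testing correction
  filter(p.adjusted < 0.05)

# Strongest downward trends first (most negative slope at the top)
downward <- significant_slopes %>% arrange(estimate)

# Strongest upward trends first (most positive slope at the top)
upward <- significant_slopes %>% arrange(desc(estimate))

print(downward)
print(upward)
```

Sorting on estimate works here because every remaining row is a slope, so the sign and size of estimate directly describe the direction and strength of the trend.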

5. Let's practice!