
Fitting multiple models

1. Fitting multiple models

In the last exercises you nested a data frame

2. nest() turns data into one row per country

to create many smaller data frames, one for each country. Recall, for example, that the first item in the data column was a table of Afghanistan's per-year data. Now you want to fit a model on each of these one-country datasets: one linear model for Afghanistan's data, one for Argentina's, and so on. To fit a model for each item in a list column, you'll use the purrr package, which offers tools for working with functions and lists. In particular, you'll use the map function.
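As a reminder of what the nested structure looks like, here is a minimal sketch. The toy data frame, its values, and the column name percent_yes are hypothetical stand-ins rather than the course data, and the nest(data = -country) spelling assumes tidyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: three yearly observations for each of two countries
by_year_country <- tibble(
  country = rep(c("Afghanistan", "Argentina"), each = 3),
  year = rep(c(1947, 1949, 1951), times = 2),
  percent_yes = c(0.38, 0.44, 0.58, 0.58, 0.47, 0.64)
)

# Collapse everything except country into a list column named "data"
nested <- by_year_country %>%
  nest(data = -country)

# The first item of the list column is Afghanistan's per-year table
nested$data[[1]]
```

Each element of the data column is an ordinary data frame, which is what makes it possible to fit one model per element.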

3. map() applies an operation to each item in a list

map lets you apply an operation to each item in a list. For example, if you had a list v with values 1, 2, and 3, you could use map with the expression "tilde dot times 10". The tilde and dot combination is a way of defining an operation, where the dot represents each item in the list: first 1, then 2, then 3. Thus the expression means "multiply each item by 10", turning 1, 2, 3 into 10, 20, 30. map is therefore useful any time you want to do something to each item of a list.
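Written out as code, the example with the list v might look like the sketch below. The name times_ten is introduced here only for illustration.

```r
library(purrr)

# A list with the values 1, 2, and 3
v <- list(1, 2, 3)

# "tilde dot times 10": the dot stands for each item in turn
times_ten <- map(v, ~ . * 10)

# map() always returns a list, here one holding 10, 20, and 30
times_ten
```

Because map always returns a list, its result can be stored directly as a list column in a data frame.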

4. map() fits a model to each dataset

Here we want to fit a linear model to each sub-data frame and store the results in a new column. We use mutate to define the new column "model", and use map to apply a linear regression to each item of "data". We describe the operation with a tilde followed by our linear regression, the same kind we'd run on one data frame, with dot as the data. This creates a new column of linear models, one for each sub-data frame. So the first item would contain the model, and therefore the slope, just for Afghanistan. It's nice that we've fit these models, but in this form we can't combine them, manipulate them, or visualize them. That's why we return to the broom package,

5. tidy() turns each model into a data frame

which takes each model and turns it into a tidy data frame of coefficients. We use map one more time to create another list column, calling this one "tidied". So now for each country, we have three columns: one with the original data, one with a linear model, and one with the tidied model. Tidied versions of statistical models are easy to combine, so

6. unnest() combines the tidied models

just like in the last lesson we can use unnest to bring them all into the top level. Now we have a table of coefficients, where the first two rows represent the intercept and slope for Afghanistan, the next two rows for Argentina, the next two for Australia, and so on: all of the details of each model in one place. This was four steps: nest by country, map to fit a model to each dataset, map to tidy each model, and unnest to a table of coefficients. It's a complicated process, but it lets us get information about each country, namely how it was changing over time, in a much more sophisticated way than our earlier group_by and summarize allowed.
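The four steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the course's exact code: the toy data, the column name percent_yes, and the formula percent_yes ~ year are hypothetical, and the nest() and unnest() calls assume tidyr 1.0 or later.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical toy data: three yearly observations for each of two countries
by_year_country <- tibble(
  country = rep(c("Afghanistan", "Argentina"), each = 3),
  year = rep(c(1947, 1949, 1951), times = 2),
  percent_yes = c(0.38, 0.44, 0.58, 0.58, 0.47, 0.64)
)

country_coefficients <- by_year_country %>%
  # Step 1: nest by country, giving one row and one data frame per country
  nest(data = -country) %>%
  # Step 2: map to fit a linear model to each one-country data frame
  mutate(model = map(data, ~ lm(percent_yes ~ year, data = .))) %>%
  # Step 3: map to tidy each model into a data frame of coefficients
  mutate(tidied = map(model, tidy)) %>%
  # Step 4: unnest the tidied models into one table of coefficients
  unnest(tidied)

# Two rows per country: one for "(Intercept)" and one for "year" (the slope)
country_coefficients %>%
  select(country, term, estimate)
```

Once the coefficients are in an ordinary table, the usual dplyr verbs apply; for example, filter(term == "year") would keep only the slope for each country.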

7. Let's practice!