1. The map family of functions
2. List Column Workflow
In the last video and exercise series, you learned how to use the nest() and unnest() functions for steps one and three of the list column workflow.
3. List Column Workflow
In this lesson, I will introduce you to the map_*() family of functions. These functions fulfill the roles of steps two and three of this workflow.
4. The map Function
The map() function applies a desired function to every element in a vector or a list and always returns a list as its result.
This function requires two parameters, .x and .f.
5. The map Function
The .x parameter is the vector or list that you want to iterate over, while .f is the function to apply to each element.
The function can either be a predefined function or it can be an anonymous function using the formula syntax.
6. The map Function
For example, if you wanted to use the mean() function, you can refer to it directly, or you can build an anonymous function using the tilde (~) to indicate that you are using a formula and .x as the placeholder for each value.
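As a minimal sketch of these two forms, assuming the purrr package is installed, the snippet below applies mean() to a small made-up list. The list `values` and its contents are hypothetical and only serve to illustrate the syntax.

```r
library(purrr)

# Hypothetical input: a named list of numeric vectors
values <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# Refer to the predefined function directly
means_direct <- map(.x = values, .f = mean)

# Equivalent anonymous function using the formula syntax;
# .x stands for the current element of the list
means_formula <- map(.x = values, .f = ~ mean(.x))

# Both calls return a list with one mean per element
str(means_direct)
```

Referring to the function directly is shorter; the formula form becomes useful once you need to reach inside each element or pass extra arguments.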
7. Population Mean by Country
In the previous exercise, you calculated the mean population of the country of Algeria by extracting the first element of the nested data frame then calculating the mean of the population column.
The structure of this is very similar when using map(). You will use map() to calculate the population mean for each country using the corresponding nested data frame of that country.
8. Population Mean by Country
Here the .x parameter is the data column in the nested data frame. Remember that this column is a list of data frames, one for each country.
Since these are data frames, you need to use an anonymous function to explicitly calculate the mean of the population column of each data frame. Remember that .x here acts as the placeholder for each element of the list.
Since you know that each element is a data frame and you want the mean of its population column, you can write the expression for the placeholder exactly as you would for a single data frame.
The result of this function is a list of population means for the 77 countries.
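The following is a rough sketch of this step, not the course's exact code. It assumes dplyr, tidyr, and purrr are installed, and it uses a tiny made-up data frame `gap` with two countries in place of the real data; the column names `country`, `year`, and `population` and all values are assumptions for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for the course data: two countries, three years each
gap <- tibble(
  country    = rep(c("Algeria", "Brazil"), each = 3),
  year       = rep(c(1960, 1961, 1962), times = 2),
  population = c(11, 12, 13, 72, 74, 76)
)

# Step one: one row per country, with a list column named `data`
gap_nested <- gap %>%
  group_by(country) %>%
  nest()

# Step two: .x is one country's data frame at a time
pop_list <- map(.x = gap_nested$data, .f = ~ mean(.x$population))

# The result is a list with one mean per country
str(pop_list)
```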
9. 2: Work with List Columns - map() and mutate()
Remember that tibbles are special data frames that allow us to store arbitrarily complex list columns.
Because of this you can append the resulting list of population means using the mutate() function.
Of course, storing a list of doubles isn't very practical for exploration
10. 3: Simplify List Columns - unnest()
so you need to simplify these columns using unnest().
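Under the same assumptions as before (a small hypothetical data frame `gap` standing in for the real data, with made-up column names and values), appending the means with mutate() and then simplifying with unnest() might look like this:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for the course data
gap <- tibble(
  country    = rep(c("Algeria", "Brazil"), each = 3),
  year       = rep(c(1960, 1961, 1962), times = 2),
  population = c(11, 12, 13, 72, 74, 76)
)

gap_nested <- gap %>%
  group_by(country) %>%
  nest()

# Work with the list column: pop_mean is itself a list column here
pop_nested <- gap_nested %>%
  mutate(pop_mean = map(data, ~ mean(.x$population)))

# Simplify: unnest() turns the list of doubles into a plain double column
pop_simple <- pop_nested %>%
  unnest(pop_mean)

pop_simple
```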
Let's revisit these steps in the context of the list column workflow.
11. List Column Workflow
First we made a list column of data frames for each country using nest(). Then we worked with the list columns by calculating the population mean of each data frame using map(). Finally, we simplified the resulting nested column with the unnest() function.
In certain situations, you can combine the last two steps using another function from the map_*() family.
12. Work With + Simplify List Columns With map_*()
If you know that the output of the mapped function is a vector of a specific type, you can use a map function corresponding to that type to calculate the result and explicitly return a vector of the expected type.
13. Work With + Simplify List Columns With map_dbl()
For example, the mean() function returns a vector of type double, so you can use map_dbl() to return a vector of doubles instead of a list of doubles.
This can be done like so, and as a result, mutate() appends a vector of type double to the data frame instead of a list.
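A minimal sketch of this combined step, again using a small hypothetical data frame `gap` (its column names and values are assumptions, not the course data):

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for the course data
gap <- tibble(
  country    = rep(c("Algeria", "Brazil"), each = 3),
  year       = rep(c(1960, 1961, 1962), times = 2),
  population = c(11, 12, 13, 72, 74, 76)
)

gap_nested <- gap %>%
  group_by(country) %>%
  nest()

# map_dbl() works with and simplifies the list column in one call:
# pop_mean is a double vector, so no unnest() is needed afterwards
pop_mean <- gap_nested %>%
  mutate(pop_mean = map_dbl(data, ~ mean(.x$population)))

pop_mean
```

Note that map_dbl() raises an error if the mapped function returns anything other than a single number for some element, which makes a wrong assumption about the output type visible early.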
14. Build Models with map()
You can also use map() to build models for each country. Here the lm() function is used to build linear models to predict the population using the fertility feature.
You can define the model using the formula parameter and provide the data for each model by using .x to refer to each country's data frame when mapping.
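As an illustrative sketch only, the snippet below fits one linear model per country on a small invented data frame. The column names `population` and `fertility` follow the narration, while the data frame `gap`, its values, and the name `model_nested` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for the course data: four observations per country
gap <- tibble(
  country    = rep(c("Algeria", "Brazil"), each = 4),
  fertility  = c(7.6, 7.5, 7.3, 7.0, 6.2, 6.0, 5.7, 5.3),
  population = c(11, 12, 13, 14, 72, 74, 77, 80)
)

gap_nested <- gap %>%
  group_by(country) %>%
  nest()

# Fit one lm() per country; .x is that country's data frame
model_nested <- gap_nested %>%
  mutate(model = map(data, ~ lm(formula = population ~ fertility, data = .x)))

# Each element of the model column is a fitted lm object
class(model_nested$model[[1]])
```

Because lm() returns a model object rather than a single value, the model column stays a list column, so plain map() is the appropriate choice here rather than a typed variant such as map_dbl().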
15. Let's map something!
So let's map some data.