
Do more with future_map()

1. Do more with future_map()

future_map() has variants that let us control the type of the output and how the results are combined, which also makes it straightforward to apply a function to each group in a data set. Let's see this in practice.

2. The data and the story

We will use the US births dataset. The last column, plurality, is the number of babies born in a given birth event (one for a single birth, two for twins, and so on). We would like to create a character column that labels each birth: "Twins" when plurality equals two and "Not twins" otherwise. Setting aside vectorization and boolean variables, we have a function, twins(), that returns the label for one value of plurality. Applied one value at a time, the task takes too long, so we'd like to run it in parallel.

3. Labeling the twins

To start, we set a multisession plan. We use the mutate() function from dplyr to create the column. The _chr suffix in future_map_chr() specifies the type of the expected output, a character vector. We supply the input column, plurality, and the function, twins(), as arguments to future_map_chr(). We revert to a sequential plan after execution. And there we have our column.
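The steps just described can be sketched as follows. This is a minimal sketch, not the lesson's exact code: the small births data frame and the body of twins() are assumptions made up for illustration, and the worker count of two is arbitrary.

```r
library(dplyr)
library(future)
library(furrr)

# Toy stand-in for the births data (hypothetical values)
births <- data.frame(
  state = c("CA", "CA", "NY", "NY", "TX"),
  plurality = c(1L, 2L, 2L, 3L, 1L)
)

# Assumed implementation: label a single plurality value
twins <- function(x) {
  if (x == 2) "Twins" else "Not twins"
}

# Run the map on two background R sessions
plan(multisession, workers = 2)

births_labeled <- births %>%
  mutate(label = future_map_chr(plurality, twins))

# Revert to sequential processing once the work is done
plan(sequential)

births_labeled
```

The _chr variant fails if twins() returns anything other than a single string per element, which acts as a useful check on the function's output type.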

4. Labeling the twins

If we benchmark map_chr() from purrr against the future-enabled future_map_chr(), we can see we have cut the execution time by 50%!
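One way to set up such a comparison is sketched below with base R's system.time(). The slow_twins() function and its artificial Sys.sleep() delay are assumptions standing in for an expensive task, and the timings you see depend on your machine and the number of workers, so no particular speed-up is implied by this sketch.

```r
library(purrr)
library(future)
library(furrr)

# Hypothetical slow version of twins(): the sleep stands in for expensive work
slow_twins <- function(x) {
  Sys.sleep(0.05)
  if (x == 2) "Twins" else "Not twins"
}

plurality <- rep(c(1L, 2L), times = 10)

# Sequential baseline with purrr
t_seq <- system.time(res_seq <- map_chr(plurality, slow_twins))

# Parallel version with furrr
plan(multisession, workers = 2)
t_par <- system.time(res_par <- future_map_chr(plurality, slow_twins))
plan(sequential)

# The labels should match; only the elapsed time should differ
identical(res_seq, res_par)
t_seq["elapsed"]
t_par["elapsed"]
```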

5. Operations by groups

How about operations by groups? We would like to calculate the proportion of twins out of all births for each state. Let's write a function that will do this for a whole data frame. This function counts the rows where plurality equals two and divides that count by the number of rows in the data frame. We name the resulting single value "proportion".
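A function along these lines might look like the sketch below. The argument name births_df and the toy data are assumptions for illustration.

```r
# births_df: any data frame with a numeric plurality column
birth_prop <- function(births_df) {
  # Count rows where plurality is two, divide by the total number of rows,
  # and name the single resulting value "proportion"
  c(proportion = sum(births_df$plurality == 2) / nrow(births_df))
}

# Hypothetical data: two of the four births are twins
toy <- data.frame(plurality = c(1, 2, 2, 1))
birth_prop(toy)
```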

6. Operations by groups

We want to split the data by a group, state in our example, apply a function to each group, and combine the results.

7. Operations by groups

With our function ready, we set a multisession plan. Now all we need is to apply this function to the data split by our grouping variable, the state. Here we will introduce two important functions. The split() function splits the data frame into a list of data frames, one for each unique value of a grouping vector. This list is piped to future_map_dfr(), which returns a data frame by row-binding the results. Note that the input list is already being piped to future_map_dfr(), so we only supply the function birth_prop(). We also specify the .id argument so that future_map_dfr() returns the results with a column identifying the state. We then revert to a sequential plan.
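The split, apply, and combine steps can be sketched as below. The births data is made up, and in this sketch birth_prop() returns a one-row data frame rather than a bare named value. That return shape is an assumption chosen so that each group unambiguously contributes one row to the row-bind.

```r
library(dplyr)   # supplies %>% and the row-binding used by future_map_dfr()
library(future)
library(furrr)

# Toy stand-in for the births data (hypothetical values)
births <- data.frame(
  state = c("CA", "CA", "NY", "NY", "TX", "TX"),
  plurality = c(1, 2, 2, 2, 1, 3)
)

# Assumed variant that returns a one-row data frame
birth_prop <- function(births_df) {
  data.frame(proportion = sum(births_df$plurality == 2) / nrow(births_df))
}

plan(multisession, workers = 2)

# split() yields a named list with one data frame per state;
# .id turns the list names into a "state" column
props <- split(births, births$state) %>%
  future_map_dfr(birth_prop, .id = "state")

plan(sequential)

props
```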

8. Using global variables

What if we wanted to vary the plurality value? We will modify our birth_prop() function so that it takes an additional argument for the plurality value. Notice that new_plur is a global variable needed by birth_prop(), but it is not one of the inputs we loop over.

9. Using global variables

Here we recall the furrr_options() function, which lets us create a configuration for the behavior of the furrr functions. We supply the name of our global variable to the globals argument. We set a multisession plan. As before, we split the data by state and pipe it to future_map_dfr(). We specify that the exported global new_plur is to be supplied to the plur_value argument of birth_prop(). To the .options argument, we supply the configuration we created above. And when we run this, we get our new proportions!
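Put together, the call could look like the following sketch. The data, the value of new_plur, and the object name config are assumptions, and birth_prop() is again written to return a one-row data frame for the sake of the row-bind.

```r
library(dplyr)
library(future)
library(furrr)

# Toy stand-in for the births data (hypothetical values)
births <- data.frame(
  state = c("CA", "CA", "NY", "NY", "TX", "TX"),
  plurality = c(1, 2, 2, 2, 1, 3)
)

# Assumed variant of birth_prop() with an extra plurality argument
birth_prop <- function(births_df, plur_value) {
  data.frame(
    proportion = sum(births_df$plurality == plur_value) / nrow(births_df)
  )
}

# Global variable: used by every call, but not looped over
new_plur <- 3

# Configuration naming the global to export to the workers
config <- furrr_options(globals = "new_plur")

plan(multisession, workers = 2)

props <- split(births, births$state) %>%
  future_map_dfr(
    birth_prop,
    plur_value = new_plur,  # forwarded to birth_prop() on every call
    .id = "state",
    .options = config
  )

plan(sequential)

props
```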

10. Column-bind to a data frame

future_map_dfr() row-binds the results into a data frame. What if we want to collect the results by column-binding instead? future_map_dfc() is the answer. From our previous example, we only need to change the _dfr suffix to _dfc to achieve this. Also, note that we do not use the .id argument, because the results are combined by column and no ID column is needed. When we run this, we get the results as columns.
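A sketch of the column-bind variant follows. As a simplifying assumption, birth_prop() returns a bare number here, so that each state in the split list becomes one column named after that state. The data and the value of new_plur are made up.

```r
library(dplyr)
library(future)
library(furrr)

# Toy stand-in for the births data (hypothetical values)
births <- data.frame(
  state = c("CA", "CA", "NY", "NY", "TX", "TX"),
  plurality = c(1, 2, 2, 2, 1, 3)
)

# Assumed variant that returns a single unnamed number
birth_prop <- function(births_df, plur_value) {
  sum(births_df$plurality == plur_value) / nrow(births_df)
}

new_plur <- 3
config <- furrr_options(globals = "new_plur")

plan(multisession, workers = 2)

# Same call shape as with _dfr, minus the .id argument
props_wide <- split(births, births$state) %>%
  future_map_dfc(birth_prop, plur_value = new_plur, .options = config)

plan(sequential)

props_wide
```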

11. Let's practice!

Now let's practice these concepts!