
Advanced operations with furrr

1. Advanced operations with furrr

The furrr package has a lot more to offer. Let's have a look at some advanced operations with furrr.

2. A bootstrap example

Imagine we have a function, boot_dist(), that bootstraps a distribution of averages. This function takes an argument, B, for the number of random samples. We create a variable, n_samples, and set its value to ten thousand. We also have a list called age_list. Each element contains mothers' ages from one US state. We'd like to apply boot_dist() to every element of age_list.
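To make the setup concrete, here is a minimal sketch of what boot_dist() might look like. The transcript does not show the function body, so this implementation, the two-state age_list, and the ages in it are assumptions made for illustration only.

```r
# Hypothetical body for boot_dist(); the transcript does not show the real one.
boot_dist <- function(x, B) {
  # Draw B resamples of x with replacement and keep the mean of each resample
  replicate(B, mean(sample(x, size = length(x), replace = TRUE)))
}

# Number of random samples, as in the text
n_samples <- 10000

# Made-up stand-in for age_list: one numeric vector of mothers' ages per state
set.seed(1)
age_list <- list(
  AL = sample(15:45, 200, replace = TRUE),
  AK = sample(15:45, 150, replace = TRUE)
)

# Sequential baseline for a single state
al_means <- boot_dist(age_list$AL, B = n_samples)
length(al_means)
```

Each call returns B bootstrapped means for one state, which is the unit of work we later hand to each worker.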

3. The global problem

We plan a multisession with four workers. With future_map() we apply boot_dist() to every element of age_list. We get an error: no value for B reaches the workers. Recall that multisession, like PSOCK clusters, spawns multiple R sessions that do not share the workspace, so the value we want for B is not available in those sessions.

4. furrr_options

furrr_options() allows us to generate a configuration for the behavior of furrr functions. In the furrr_options() call we can specify the global variables to export. Here we use the globals argument to export "n_samples" and store the result as config. We then plan our multisession as usual. In the future_map() call, we provide the input, age_list, and the function to apply, boot_dist. We set B = n_samples. Lastly, we supply the configuration we created, config, to the .options argument of future_map(). We get the bootstrapped distributions without any errors.
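The steps just described can be sketched as follows. The boot_dist() body and the age_list data are the same illustrative assumptions as before, and n_samples is reduced to keep the sketch quick. Two things here are our additions rather than part of the walkthrough: seed = TRUE, which asks furrr for parallel-safe random numbers, and the use of two workers instead of four to keep the example light.

```r
library(furrr)  # attaches future as well, which provides plan()

# Hypothetical boot_dist(); the transcript does not show the real body
boot_dist <- function(x, B) {
  replicate(B, mean(sample(x, size = length(x), replace = TRUE)))
}

# Smaller than the ten thousand in the text, only to keep this sketch fast
n_samples <- 1000

# Made-up stand-in for age_list
age_list <- list(
  AL = c(19, 24, 31, 27, 35, 22, 29),
  AK = c(21, 26, 33, 30, 18, 25)
)

# Export n_samples by name; seed = TRUE is our addition for reproducible RNG
config <- furrr_options(globals = "n_samples", seed = TRUE)

plan(multisession, workers = 2)

# Input first, function second, extra arguments next, configuration last
boot_list <- future_map(age_list, boot_dist, B = n_samples, .options = config)

plan(sequential)  # shut the workers down again

lengths(boot_list)
```

The result is a list with one vector of bootstrapped means per state.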

5. Filtering the births dataset

Suppose we want to filter our data now for further analysis. We have a list of data frames, one data frame for each state. We'd like to filter for a minimum value of 20 for mother_age.

6. future_map with packages

We write a function, filter_df(), which takes a data frame and a minimum value for mother's age. It uses dplyr to return a filtered data frame. We store the minimum value in the cutoff variable. We supply this variable to the globals argument of furrr_options(). To apply filter_df() in parallel, we'll need to load dplyr in all our worker processes. We supply the package we need to the packages argument. We could also supply multiple packages as a vector of package names, but they must all be installed. And then we proceed as before.
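A minimal sketch of this pattern is shown below. The body of filter_df(), the list name ls_births, and the small data frames are assumptions for illustration; only the mother_age column, the cutoff variable, and the globals and packages arguments come from the walkthrough.

```r
library(furrr)

# Hypothetical filter_df(): calls filter() unqualified, so dplyr must be
# attached in each worker session for this to resolve to dplyr::filter()
filter_df <- function(df, min_age) {
  filter(df, mother_age >= min_age)
}

# Minimum mother's age, as in the text
cutoff <- 20

# Made-up list of data frames, one per state
ls_births <- list(
  AL = data.frame(mother_age = c(17, 22, 31, 19, 28)),
  AK = data.frame(mother_age = c(25, 18, 20, 34))
)

# Export cutoff by name and attach dplyr on every worker
config <- furrr_options(globals = "cutoff", packages = "dplyr")

plan(multisession, workers = 2)

filtered <- future_map(ls_births, filter_df, min_age = cutoff, .options = config)

plan(sequential)

filtered
```

For several packages, the packages argument takes a character vector such as c("dplyr", "tidyr"), provided each one is installed.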

7. future_map with packages

Here we see our filtered datasets, showing only births where the mother's age is at least 20.

8. Multiple arguments to loop over

Suppose we now have two inputs to loop over. ls_weights contains the weight gained for each birth event in a given state; ls_plur contains the corresponding plurality, or number of babies born. We'd like to calculate the weight gained per baby born.

9. Multiple arguments to loop over

First, we write a function, calculate(), to do one iteration. This function takes two arguments to loop over. For such a situation furrr offers future_pmap(). This function takes multiple inputs, combined into one list, as the first argument. We combine ls_weights and ls_plur into a single list by wrapping them in a list() call. The second argument is the function to apply. Notice that the order of the input lists is the same as the argument order of the calculate() function. This is to avoid passing an input to the wrong argument of calculate().
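Here is a minimal sketch of the future_pmap() call described above. The body of calculate(), its argument names, and the numbers in ls_weights and ls_plur are assumptions made up for illustration.

```r
library(furrr)

# Hypothetical calculate(): weight gained divided by number of babies born
calculate <- function(weight, plur) {
  weight / plur
}

# Made-up inputs: one numeric vector per state in each list
ls_weights <- list(AL = c(30, 50, 22), AK = c(40, 28))
ls_plur    <- list(AL = c(1, 2, 1),    AK = c(1, 1))

plan(multisession, workers = 2)

# The inner lists are unnamed, so they are matched to calculate() by position:
# ls_weights goes to weight, ls_plur goes to plur
per_baby <- future_pmap(list(ls_weights, ls_plur), calculate)

plan(sequential)

per_baby[[1]]
```

If we prefer not to rely on position, naming the elements of the input list after the function's arguments, for example list(weight = ls_weights, plur = ls_plur), matches them by name instead.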

10. Multiple arguments to loop over

When we run this we get the weight gained per baby born for each state's data frame. So for the 14th birth entry from Alabama, the mother gave birth to twins, gaining 25 pounds per baby.

11. A note about argument order

Unlike the parallel package, furrr functions maintain the order of arguments no matter how many inputs we are looping over. The first is always the input or inputs. The second is the function to be applied, followed by any other arguments.
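The difference in argument order can be seen side by side in the sketch below. The helper functions scale_by() and add_scaled() and the inputs xs and ys are hypothetical; parLapply() and clusterMap() are shown as the parallel package's one-input and multi-input counterparts.

```r
library(parallel)
library(furrr)

# Hypothetical helpers and inputs, for illustration only
scale_by   <- function(x, k) x * k
add_scaled <- function(x, y, k) (x + y) * k
xs <- list(1, 2, 3)
ys <- list(10, 20, 30)

# parallel: the position of the function depends on the number of inputs
cl <- makeCluster(2)
par_one <- parLapply(cl, xs, scale_by, k = 2)                          # input, then function
par_two <- clusterMap(cl, add_scaled, xs, ys, MoreArgs = list(k = 2))  # function, then inputs
stopCluster(cl)

# furrr: input(s) first, function second, other arguments after
plan(multisession, workers = 2)
fur_one <- future_map(xs, scale_by, k = 2)
fur_two <- future_pmap(list(xs, ys), add_scaled, k = 2)
plan(sequential)

unlist(fur_two)
```

Both approaches give the same values here; only the calling convention differs.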

12. Let's practice!

Let's go to the exercises and practice these functions.