1. Mapping the future
In this lesson we will discuss more practical uses for futures in R.
2. We know our futures
In the last lesson we learned how to create a future, as shown. We can now use futures to plan and run our computations. But we can go even further by using future-enabled tools.
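As a quick refresher, here is a minimal sketch of creating a future and querying its value. It assumes the future package is installed; the expression inside the future is only an illustration, not the one from the slide.

```r
library(future)

# Use the default sequential plan for this refresher
plan(sequential)

# Create a future for an expression (illustrative expression)
f <- future({
  sqrt(16)
})

# Query the result of the future
result <- value(f)
print(result)
```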
3. Mapping with purrr
The purrr package provides a robust toolkit for functional programming.
This means that we can apply a function to a series of inputs using purrr. Like the apply() family in base R, purrr has a key verb: map().
The map() function takes two main arguments, a series of inputs and a function to apply to them. Here we use map() to calculate the square roots of the numbers one to a million. We run this to get the results, where 1 is the square root of 1, about 1.414 is the square root of 2, and so on. Notice that the output is a list.
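The call described here can be sketched as follows, assuming the purrr package is installed. The variable name roots is chosen for illustration.

```r
library(purrr)

# Apply sqrt() to each number from one to a million
roots <- map(1:1e6, sqrt)

# Look at the first two results
print(roots[1:2])

# map() always returns a list
print(class(roots))
```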
4. Incarnations of map
In fact, the map() function has many variants, depending on the type of output we want.
For our square-root example, we could use map_dbl() or map-double. That is, map with the "_dbl" suffix. This is because we expect the output to be decimal numbers, or doubles. Notice that the output is a numeric vector.
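A minimal sketch of the same calculation with map_dbl(), assuming purrr is installed. The variable name roots_dbl is illustrative.

```r
library(purrr)

# Same calculation, but the result is a numeric (double) vector
roots_dbl <- map_dbl(1:1e6, sqrt)

print(head(roots_dbl))

# A plain numeric vector rather than a list
print(is.list(roots_dbl))
print(is.numeric(roots_dbl))
```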
5. Incarnations of map
map_chr() or map-character returns strings. This function will convert its output into a character vector.
Here, all the square roots are coerced to strings.
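A small sketch of map_chr(), assuming purrr is installed. One assumption to flag: recent purrr versions (1.0.0 and later) warn when map_chr() has to coerce numbers to strings automatically, so this sketch converts explicitly with as.character() inside the mapped function. The exact number of digits in the strings may differ between R versions.

```r
library(purrr)

# Convert each square root to a string explicitly,
# so map_chr() receives character output from every iteration
roots_chr <- map_chr(1:3, function(x) as.character(sqrt(x)))

print(roots_chr)
print(class(roots_chr))
```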
6. Incarnations of map
No matter what situation we are in, there is probably a map() variant for us. Please remember that the output type describes the output of one iteration, and that the typed variants expect each iteration to return a single value. If each iteration outputs a vector of several values, we are better off using the plain map() variant to get a list of these vectors.
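To illustrate the last point, here is a minimal sketch where each iteration returns a vector of two values, so plain map() is the natural choice. It assumes purrr is installed; the mapped function is hypothetical.

```r
library(purrr)

# Each iteration returns a vector of length two: the number and its square
pairs <- map(1:3, function(x) c(x, x^2))

# The result is a list of three vectors
print(pairs)
print(lengths(pairs))
```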
7. Type specification
Specifying the type by using the correct map() variant can give us a speed boost.
Here, we benchmark the performance of different map() variants for our million square-root example.
Using map_dbl() reduces execution time by about 20%.
Specifying the wrong type, map_chr() for character, almost doubles the execution time! This is because map_chr() has to convert all these numbers to characters in the background.
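A rough way to try such a comparison ourselves is sketched below. It uses base R's system.time() rather than a dedicated benchmarking package, which is an assumption of this sketch and not necessarily the tool used on the slide. The map_chr() call converts explicitly with as.character(), also an assumption, to avoid coercion warnings in recent purrr versions. Timings vary between machines and runs, so the figures quoted above may not be reproduced exactly.

```r
library(purrr)

x <- 1:1e6

# Elapsed time in seconds for each variant (a single run, so only indicative)
t_list <- system.time(map(x, sqrt))[["elapsed"]]
t_dbl  <- system.time(map_dbl(x, sqrt))[["elapsed"]]
t_chr  <- system.time(
  map_chr(x, function(v) as.character(sqrt(v)))
)[["elapsed"]]

timings <- c(map = t_list, map_dbl = t_dbl, map_chr = t_chr)
print(timings)
```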
8. future + purrr = furrr
Excellent, but why are we discussing purrr functions in the first place?
As it happens, the furrr package provides future-powered versions of purrr's mapping functions.
For example, we calculated the million square-roots with purrr as shown.
To do this with futures, all we have to do is add "future_" as a prefix to the purrr function. We are one step closer to parallelization using futures.
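A minimal sketch of the prefix change, assuming both purrr and furrr are installed. With no plan set, futures run sequentially, so this only shows that the two calls give the same result. A smaller input is used here to keep the sketch quick.

```r
library(purrr)
library(furrr)

x <- 1:1000

# purrr version
roots_purrr <- map(x, sqrt)

# furrr version: same arguments, "future_" prefix added
roots_furrr <- future_map(x, sqrt)

# Both calls produce the same list of square roots
print(isTRUE(all.equal(roots_purrr, roots_furrr)))
```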
9. furrr in parallel
Let's recall that detectCores() from the parallel package can tell us how many cores are available. We use all but two cores.
We plan a multisession, specifying the number of cores as workers.
We use future_map_dbl() to apply sqrt() to all numbers.
Once done, we revert to a sequential plan.
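The four steps above can be sketched as follows, assuming the parallel, future, and furrr packages are available. The guard around the core count is an addition of this sketch: detectCores() can return NA, and on a machine with few cores we still want at least one worker.

```r
library(parallel)
library(future)
library(furrr)

# Detect the available cores and keep two free
n_detected <- detectCores()
if (is.na(n_detected)) n_detected <- 1  # guard added for this sketch
n_cores <- max(1, n_detected - 2)

# Plan a multisession with that many workers
plan(multisession, workers = n_cores)

# Apply sqrt() to all numbers using futures
roots <- future_map_dbl(1:1e6, sqrt)

# Revert to a sequential plan
plan(sequential)

print(head(roots))
```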
10. future_map and family
Just like purrr, furrr has functions that specify the output type. All we need is to add the "future_" prefix to a purrr function to get a future-enabled mapping.
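A few of the typed variants are sketched below, assuming furrr is installed. The mapped functions are hypothetical and chosen only so that each returns a single value of the matching type.

```r
library(furrr)

x <- 1:10

# Doubles
roots <- future_map_dbl(x, sqrt)

# Integers
doubled <- future_map_int(x, function(v) v * 2L)

# Logicals
is_even <- future_map_lgl(x, function(v) v %% 2 == 0)

# Strings
labels <- future_map_chr(x, function(v) paste0("n", v))

print(roots)
print(doubled)
print(is_even)
print(labels)
```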
11. The advantage of furrr
The furrr package gives us a great toolkit for the use of futures.
Imagine we have a series of inputs, input_list. The calculate() function maps onto each of them to return a numeric value. Using only futures, we'd need to create a future for each input, and then query its value. This returns a list, which we then need to combine into a numeric vector.
Using the future_map_dbl() function from the furrr package, this is all done in one line.
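The comparison can be sketched as follows, assuming future and furrr are installed. The contents of input_list and the body of calculate() are hypothetical stand-ins, since the lesson does not define them.

```r
library(future)
library(furrr)

# Hypothetical inputs and function, for illustration only
input_list <- list(1, 4, 9)
calculate <- function(x) sqrt(x) + 1

# Using only futures: create one future per input ...
futures <- lapply(input_list, function(x) future(calculate(x)))

# ... query each value, which gives a list ...
values <- lapply(futures, value)

# ... and combine the list into a numeric vector
manual <- unlist(values)

# Using furrr: one line
one_line <- future_map_dbl(input_list, calculate)

print(manual)
print(one_line)
```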
12. Let's practice!
Now let's put these new tools to work!