1. map2() and pmap()
Let's crank up the complexity!
2. More complex iterations
Oftentimes we need to answer questions using multiple datasets. If those two datasets are both stored in lists, we can use the function map2() to pull out information from each list and bring it together.
Here we are using map2() to simulate more data, where we have the mean values stored in one list, and the standard deviations stored in another.
Each list has its own argument: .x is the first list and .y is the second list. Here we have two lists, so list_of_means is .x, lists_of_sd is .y, and data.frame() is the .f argument. map2() will iterate over both inputs in parallel and create a new dataframe for each pair of elements.
Each element of our new list is a new dataframe, filled with newly simulated data. Here we look at the first 6 rows of the first element. It has two columns, a and b, with our newly simulated values.
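To make the call described above concrete, here is a minimal sketch of what it might look like. The values inside the two lists, the sample size of 200, and the way column b is generated are all assumptions made for illustration; only the list names, the .x and .y roles, and the columns a and b come from the narration.

```r
library(purrr)

set.seed(42)  # make the simulated values reproducible

# Assumed example inputs: one mean and one standard deviation per dataset
list_of_means <- list(5, 2, 300, 15)
lists_of_sd   <- list(0.5, 0.01, 20, 1)

# .x receives each mean, .y receives the matching standard deviation
simdata <- map2(
  list_of_means, lists_of_sd,
  ~ data.frame(
    a = rnorm(n = 200, mean = .x, sd = .y),
    b = rnorm(n = 200, mean = 200, sd = 15)  # assumed: b uses fixed parameters
  )
)

# First 6 rows of the first simulated dataframe
head(simdata[[1]])
```

Because map2() walks both lists in step, the first dataframe uses the first mean with the first standard deviation, the second uses the second pair, and so on.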
3. What if we didn't use purrr?
The reason we are using purrr, even for these more complex tasks, is that getting the same result without purrr is much more complicated.
Here, we are trying to simulate some data where the means, standard deviations, and sample sizes are stored in three separate lists. To do this without purrr, we need three nested for loops. This is much harder to read, and it leaves much more room for misplacing a bracket or a comma, so you end up focusing on debugging rather than getting insights from your data.
In the next few slides, we'll look at how the purrr solution to this problem allows you to focus on the data instead of the syntax, and write code that is more human-readable.
4. pmap() inputs
If you have three or more lists, you can use pmap(). Be careful, though, as there are some differences here.
The first difference is how we input lists. In map() or map2(), we would use the first argument, or the first and second arguments, to input our lists. For pmap(), we need to create a single list of all the lists we want to use as our input. Here that list of lists is called input_list.
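As a rough sketch of that list of lists, assuming three short parameter lists with made-up values (the name list_of_samplesize and the element name samplesize are illustrative spellings, not confirmed by the narration):

```r
# Assumed example inputs, three elements each
list_of_means      <- list(5, 2, 300)
list_of_sd         <- list(0.5, 0.01, 20)
list_of_samplesize <- list(10, 20, 30)

# pmap() takes one list whose elements are the lists to iterate over in parallel
input_list <- list(
  means      = list_of_means,
  sd         = list_of_sd,
  samplesize = list_of_samplesize
)

names(input_list)    # "means" "sd" "samplesize"
lengths(input_list)  # each inner list has 3 elements
```

Naming the elements matters here, because those names are what the custom function's arguments will be matched against.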
5. pmap()
To simulate our dataset from the three parameter lists, we will use our new input_list as the first argument in pmap(). The second argument of pmap() is a custom function.
Inside the custom function, we will use the names of the elements of input_list as the arguments. This allows us to refer to those lists by name instead of using .x or .y. So we can fill in the arguments for rnorm() so that mean equals means, n equals sample size, and sd equals sd.
Our new list is our simulated datasets. Each element of the list is a different dataset, in a dataframe.
Here we are looking at the first 6 rows of the first element.
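Put together, the pmap() call described in this slide might look like the sketch below. The parameter values and the single column a are assumptions for illustration; the argument names means, sd, and samplesize follow the narration, with samplesize being an assumed spelling of "sample size".

```r
library(purrr)

set.seed(42)  # make the simulated values reproducible

# Assumed example inputs: a named list of three parallel lists
input_list <- list(
  means      = list(5, 2, 300),
  sd         = list(0.5, 0.01, 20),
  samplesize = list(10, 20, 30)
)

# pmap() matches the names in input_list to the function's arguments
simdata <- pmap(
  input_list,
  function(means, sd, samplesize) {
    data.frame(a = rnorm(n = samplesize, mean = means, sd = sd))
  }
)

# First 6 rows of the first simulated dataframe
head(simdata[[1]])
```

Since the arguments are matched by name, the order of the elements inside the input list does not have to match the order of the function's arguments.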
6. Let's purrr-actice!
Time to put this into practice.