1. Introduction to Programming with purrr
Hello and welcome to this course on programming with purrr.
2. $whoami
I'm Colin Fay; I'm a data scientist and R hacker. As you can only hear me, here is a picture of me drinking coffee.
A big part of my day job is writing R code, and I use purrr to help me optimize my workflow.
Notably, I use purrr for writing functions, which is what you'll do in this course.
3. Discovering purrr
Before starting this course, make sure you've got some basic knowledge of iteration using purrr.
If you're not familiar with it, here are some great resources that can help you learn about iteration with purrr.
If you're already familiar with iterating using purrr, you can start this course right now!
4. What will this course cover?
Almost every iteration process has two sides. The first is the set of elements we iterate over. The second is the function we apply to each element. purrr follows this format.
The first purrr function everyone learns is the map() function. map() has two parts. The first is .x, the object we are iterating over; this object can be a vector, a list, or a data frame.
The second part is made up of .f and the dot dot dot argument, which together represent the functional part of the iteration. It's the description of what happens to each element of the object.
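To make the two halves concrete, here is a minimal sketch of a map() call with its arguments spelled out. The visits list is a made-up example, not part of the course data.

```r
library(purrr)

# Hypothetical input: a small list of numeric vectors, one containing an NA
visits <- list(c(10, 20, NA), c(5, 15, 25))

# .x is the object iterated over, .f is the function applied to each element,
# and anything after .f (here na.rm = TRUE) is passed on to .f through `...`
totals <- map(.x = visits, .f = sum, na.rm = TRUE)
totals
```

Here na.rm = TRUE is not an argument of map() itself; it travels through the dot dot dot to sum() on every iteration.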
In this course, you'll learn how to deal with the second half of the basic purrr skeleton: the .f and dot dot dot arguments.
5. The data
For this course, I have extracted three lists from a dataset taken from the open data portal of the French city of St. Malo.
This dataset records the number of visits to the website saint-malo.fr.
Each of these newly created lists covers one year: 2014, 2015, and 2016. Each contains 12 sublists corresponding to months. Each month is a vector with the number of website visits per day.
These list objects are an extraction of the full dataset. This list format is one you regularly encounter when querying data on the web, notably when you have to deal with the JSON format, which is parsed as a nested list in R.
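As a rough picture of that shape, the sketch below builds a stand-in list with 12 monthly vectors of daily counts. The name visit_2015_mock and every number in it are invented for illustration; they are not the real saint-malo.fr figures, and the real lists may not cover every day of each month.

```r
# Made-up placeholder counts, NOT the real website statistics
set.seed(1)
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# One numeric vector per month, one value per day
visit_2015_mock <- lapply(
  days_in_month,
  function(n) sample(100:500, n, replace = TRUE)
)

length(visit_2015_mock)       # number of months in the list
length(visit_2015_mock[[2]])  # number of daily counts in the second month
str(visit_2015_mock[1:2])     # compact view of the first two months
```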
6. purrr basics - a refresher (Part 1)
Let's start with a refresher of purrr's basic functions using these three lists.
First, let's look at the map() function, which runs the .f function on each element of .x and always returns a list. Here, we are mapping the sum() function over the list visit_2015.
One strength of purrr is that it is type stable, which means that you always know the class of the output up front. Here, for example, we are using the map_dbl() function, which performs the same operation as map() but returns a different type of output. As you can see, the result is a numeric vector rather than a list.
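The difference between the two return types can be sketched as follows. Since the slide output is not reproduced in this transcript, the example uses visit_2015_mock, a hypothetical three-month stand-in with invented counts.

```r
library(purrr)

# Hypothetical stand-in for visit_2015, cut down to three short "months"
visit_2015_mock <- list(c(100, 120, 90), c(80, 95), c(130, 110, 105))

# map() always returns a list, one element per month
monthly_list <- map(visit_2015_mock, sum)

# map_dbl() applies the same function but returns a double vector
monthly_dbl <- map_dbl(visit_2015_mock, sum)

is.list(monthly_list)    # TRUE
is.numeric(monthly_dbl)  # TRUE
monthly_dbl
```

If .f returned something other than a single number for some element, map_dbl() would raise an error instead of silently changing the output type, which is the practical meaning of type stability here.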
7. purrr basics - a refresher (Part 2)
Let's now imagine we want to add the visits from 2015 and 2016. To do this, I can use the map2() function to get a list, and its counterpart map2_dbl() to get a numeric vector.
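A minimal sketch of this two-list iteration, assuming that "adding" means one combined total per month. Both mock lists below are invented two-month stand-ins, not the course data.

```r
library(purrr)

# Hypothetical two-month stand-ins for visit_2015 and visit_2016
visit_2015_mock <- list(c(100, 120), c(80, 95))
visit_2016_mock <- list(c(110, 130), c(70, 85))

# map2() walks both lists in parallel and calls sum(.x, .y) for each pair,
# giving one combined total per month
both_list <- map2(visit_2015_mock, visit_2016_mock, sum)

# Same computation, returned as a double vector
both_dbl <- map2_dbl(visit_2015_mock, visit_2016_mock, sum)
both_dbl
```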
8. purrr basics - a refresher (Part 3)
What if we want to do the same computation as on the previous slide, but with three lists? In purrr, there is no map3(), map4(), et cetera. If you want to map over more than two elements, you'll need to pass a list of elements to the pmap() function.
To iterate over elements in three or more lists, you'll need to put all these lists into another list and pass this "master list" as the first argument of pmap().
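The pattern can be sketched like this, again with invented two-month stand-ins for the three yearly lists. The name all_years for the wrapping list is an arbitrary choice for this example.

```r
library(purrr)

# Hypothetical two-month stand-ins for the three yearly lists
visit_2014_mock <- list(c(90, 100), c(60, 75))
visit_2015_mock <- list(c(100, 120), c(80, 95))
visit_2016_mock <- list(c(110, 130), c(70, 85))

# Wrap the three lists in one list and pass it as the first argument
all_years <- list(visit_2014_mock, visit_2015_mock, visit_2016_mock)

# pmap() takes the i-th element of each inner list and calls
# sum(month_2014, month_2015, month_2016) for every month
totals_list <- pmap(all_years, sum)

# Typed variant that returns a double vector
totals_dbl <- pmap_dbl(all_years, sum)
totals_dbl
```

Because the wrapping list is unnamed, its elements are passed to the function by position; if it were named, the names would have to match the function's argument names.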
9. Let's practice!
Now it's your turn to refresh your purrr memory!