Even more complex problems

1. Even more complex problems

Depending on the lists you are using, you may need to solve more complex problems with purrr. In this lesson, we will cover some of those more complex cases.

2. What if the values you want are buried?

For this example, we'll be using a large list called gh_repos. gh_repos is an unnamed list with six elements. Each element is another list with 30 elements, and each of those contains data from a GitHub repository. Each repository's data is a list of length >60, containing information such as the name, the owner (itself a list), how many times the repository has been forked, or copied, and the creation date. Sometimes the elements of a list are themselves lists, which means that the variable we are interested in may be buried, or nested. To extract data from a list that is inside another list, we will use nested map() functions. We will take one map() function and put dot x as the first argument and "forks" as the second argument to get the number of forks for each repository. Then we will wrap that map() function inside another map() function, putting a tilde just before the nested map(). All of this will be piped onto the end of gh_repos.
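As a rough sketch of the nested map() pattern just described, the example below uses a small hand-built list in place of gh_repos so that it runs on its own. The object `repos` and its values are hypothetical stand-ins, not the real data. With the real list you would pipe gh_repos in instead.

```r
library(purrr)
library(magrittr)  # provides the %>% pipe

# Hypothetical stand-in for gh_repos: two "users", each holding a list of
# repositories. The names and fork counts are invented for illustration.
repos <- list(
  list(
    list(name = "repo_a", forks = 3L),
    list(name = "repo_b", forks = 0L)
  ),
  list(
    list(name = "repo_c", forks = 12L)
  )
)

# The outer map() walks over users; the inner map() pulls "forks"
# out of every repository belonging to that user.
forks <- repos %>%
  map(~ map(.x, "forks"))

str(forks)
```

The result keeps the nesting of the input: one element per user, each holding one fork count per repository.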

3. Summary stats in purrr

Often we need to generate summary statistics about a dataset. Here we are going to practice pulling data from lists into dataframes as we summarize our bird measurements dataset, now with data about four species. If we did this outside of purrr, we would have to create an empty dataframe, extract the names, use a for loop to fill the dataframe in, and then pass that output into summary(). This works, but it's long and a bit hard to follow.
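For comparison, here is one way the non-purrr version might look. This is only a sketch: the list `bird_measurements` below is a hypothetical stand-in with invented values, and the name `df_measures` is an assumption made for illustration.

```r
# Hypothetical stand-in for the bird measurements list: four species,
# each with a weight and a wing length. Values are invented for illustration.
bird_measurements <- list(
  robin   = list(weight = 77,  wing_length = 13),
  sparrow = list(weight = 24,  wing_length = 7),
  wren    = list(weight = 10,  wing_length = 5),
  crow    = list(weight = 450, wing_length = 33)
)

# Create an empty data frame with one row per species
n_species <- length(bird_measurements)
df_measures <- data.frame(
  weight      = numeric(n_species),
  wing_length = numeric(n_species)
)

# Extract the names, then fill the data frame one species at a time
species <- names(bird_measurements)
for (i in seq_along(species)) {
  df_measures$weight[i]      <- bird_measurements[[species[i]]]$weight
  df_measures$wing_length[i] <- bird_measurements[[species[i]]]$wing_length
}

# Summary statistics for both columns
summary(df_measures)
```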

4. Summary stats

If we use purrr, we first use the map_df() function. Inside map_df(), after the tilde, we will use the data_frame() function and create three columns. The first column is weight, and we will set that equal to dot x double bracket weight, close double brackets. Our second column will be wing_length, which we will set equal to dot x double bracket wing_length, close double brackets. Our third column will be called taxa, and we will set that equal to the word bird. We will then pipe the results of map_df() into a new function, select_if(). select_if() keeps only the columns that meet the criterion we put inside the function. Right now our dataframe has three columns: two contain numbers and one contains characters. We can't do summary statistics on the word 'bird', so we are going to put is.numeric inside select_if(); that way it will only keep the columns that are numeric. Then we will pipe the output into the summary() function, which gives us our minimum, 1st and 3rd quartiles, median, mean, and maximum for weight and wing_length.
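The pipeline described above can be sketched as follows. The list `bird_measurements` is a hypothetical stand-in with invented values, and the intermediate names `bird_df` and `numeric_df` are only there to make each step visible. The sketch uses tibble(), the current replacement for the deprecated data_frame(), and passes the predicate is.numeric to select_if() so that only numeric columns are kept.

```r
library(purrr)
library(dplyr)  # provides select_if(), tibble(), and the %>% pipe

# Hypothetical stand-in for the bird measurements list.
# Values are invented for illustration.
bird_measurements <- list(
  robin   = list(weight = 77,  wing_length = 13),
  sparrow = list(weight = 24,  wing_length = 7),
  wren    = list(weight = 10,  wing_length = 5),
  crow    = list(weight = 450, wing_length = 33)
)

# Build one row per species and bind the rows into a single data frame
bird_df <- bird_measurements %>%
  map_df(~ tibble(
    weight      = .x[["weight"]],
    wing_length = .x[["wing_length"]],
    taxa        = "bird"
  ))

# Keep only the numeric columns, dropping the character column taxa
numeric_df <- bird_df %>%
  select_if(is.numeric)

# Minimum, quartiles, median, mean, and maximum for each remaining column
numeric_df %>%
  summary()
```

In practice the three steps would usually be chained into a single pipe; they are split here only for readability.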

5. Let's purrr-actice!

Now let's try some examples.