
Discovering the dataset

1. Discovering the dataset

Welcome to our last chapter! Congratulations on making it this far. We'll now put into practice everything we've seen in this course.

2. The dataset

For this case study, we'll be using a dataset of tweets collected during the RStudio conference in January 2018. This dataset is a rather long list, with 5055 elements. Each element of this list is itself a list of 31 elements. Inside these 31 elements, you can find scalar elements and other sublists. It might seem a little bit complicated, but as you can see, the maximum depth of this object is 4, and I can assure you that a list with only four levels of depth is lightly nested. We'll use this dataset to review what we have seen in this course, and we'll learn some new purrr functions along the way.
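To make the shape of such an object concrete, here is a minimal sketch on a tiny stand-in list. The `tweets` object and its field names below are hypothetical, not the real dataset, and the sketch assumes purrr 1.0.0 or later, where `pluck_depth()` replaces the older `vec_depth()`.

```r
library(purrr)

# Hypothetical stand-in for the tweets list: two "tweets", each a list
# mixing scalar fields and a sublist (the real dataset is far larger).
tweets <- list(
  list(
    text = "Hello #rstudioconf",
    retweet_count = 3L,
    user = list(screen_name = "alice", followers_count = 120L)
  ),
  list(
    text = "purrr is neat",
    retweet_count = 0L,
    user = list(screen_name = "bob", followers_count = 45L)
  )
)

n_tweets <- length(tweets)        # number of elements in the outer list
n_fields <- length(tweets[[1]])   # number of fields in the first element
max_depth <- pluck_depth(tweets)  # maximum nesting depth of the whole object

str(tweets[[1]])                  # compact view of one element's structure
```

Checking the length, the length of one element, and the depth is a quick way to get your bearings before iterating over a nested list.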

3. JSON - A typical API output

Nested lists might seem strange to you if you have always worked with data frames. But actually, it's a pretty standard data format when you are retrieving data from the web: most APIs return JSON (short for JavaScript Object Notation), which is read as a nested list by R. Why this format? Because not everything can be put into rows, columns, and cells. JSON is a format that allows having a variety of elements at several levels of depth. It's also lightweight, and quick to transfer over the web.
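As an illustration of how JSON becomes a nested list, here is a small sketch. It assumes the jsonlite package is installed (the course does not say which parser it relies on), and the JSON text is made up for the example.

```r
library(jsonlite)

# Hypothetical JSON payload, shaped loosely like a single tweet
json_text <- '{
  "text": "Hello #rstudioconf",
  "retweet_count": 3,
  "user": {"screen_name": "alice", "followers_count": 120},
  "hashtags": ["rstudioconf", "rstats"]
}'

# simplifyVector = FALSE keeps JSON arrays and objects as plain R lists
# instead of collapsing them into vectors or data frames
tweet <- fromJSON(json_text, simplifyVector = FALSE)

str(tweet)                 # objects become named lists, arrays become lists
tweet$user$screen_name     # nested values are reached level by level
```

The JSON object maps to a named list, and the nested `user` object maps to a sublist, which is why API output tends to arrive in R with several levels of depth.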

4. Predicate refresher

In this course, we have reviewed the basic iteration functions that start with "map." We've seen what we have called "predicates," which are functions that take an input and return TRUE or FALSE. For example, is.numeric() is a predicate, as it returns TRUE or FALSE depending on the input you provide. We have also seen several predicate functionals, which are functions that take a list and a predicate, and apply this predicate to each element of the list. For example, keep() and discard() both take a list and a predicate, and use the predicate's result to decide which elements to return.
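A short sketch of a predicate in action may help here. The `fields` list is hypothetical; `map_lgl()` is used only to show the TRUE or FALSE answer the predicate gives for each element.

```r
library(purrr)

# A predicate answers a yes/no question about its input
is.numeric(42)        # is a number numeric?
is.numeric("forty")   # is a string numeric?

# Hypothetical list mixing numbers and text
fields <- list(retweet_count = 3, text = "Hello", favorite_count = 10)

# Apply the predicate to every element and collect one logical per element
numeric_flags <- map_lgl(fields, is.numeric)
numeric_flags
```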

5. keep() & discard()

In the first chapter of this course, we saw keep() and discard(), which are two predicate functionals you can use to clean a list. As their names suggest, keep() keeps the elements that meet the condition defined by the predicate, and discard() does the opposite, as it discards the elements that meet the condition.
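Here is a minimal sketch of both functions on a made-up list. The `tweet` object and its field names are hypothetical; the last call uses purrr's formula shorthand to write a predicate inline.

```r
library(purrr)

# Hypothetical single tweet with a mix of text and numeric fields
tweet <- list(
  text = "Hello #rstudioconf",
  retweet_count = 3,
  favorite_count = 10,
  lang = "en"
)

# keep() returns the elements for which the predicate is TRUE
numeric_fields <- keep(tweet, is.numeric)

# discard() returns the elements for which the predicate is FALSE
other_fields <- discard(tweet, is.numeric)

names(numeric_fields)
names(other_fields)

# A predicate can also be written inline with the formula shorthand
popular <- keep(numeric_fields, ~ .x > 5)
names(popular)
```

Between them, keep() and discard() split the list into two complementary parts, so every element of the input ends up in exactly one of the two results.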

6. Let's practice!

Let's try these functions on our RStudio Conference dataset.