Advanced completions

1. Advanced completions

In this lesson, we'll further explore the complete() function and see how we can combine it with other tidyr and dplyr functions.

2. Nesting connected variables

Let's start with this sample of the nuclear explosions database. We have the total number of bombs detonated by three countries over two decades. Since the UK did not detonate any nuclear bombs in the forties, we want to complete this dataset so that an observation with a zero value for n_bombs is added.

3. Nesting connected variables

However, when we pass all variables to the complete() function, we get this result. There are now observations for the USA on the European continent and for the UK in North America. These nonsensical observations should not have been added, since a country is always connected to the same continent.

4. The nesting() function

We can establish such a connection with the nesting() function. By nesting variables, we tell the complete() function to treat them as a single variable. It will then no longer add observations for new value combinations of these variables. The result is better now, with a single observation added for the UK in the forties.
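As a minimal sketch of the difference, the snippet below completes a small data frame with and without nesting(). The counts and the third country are made up for illustration, since the transcript does not show the actual values; only the column name n_bombs comes from the lesson, and the other column names are assumptions.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Illustrative values only: counts and the choice of countries are assumptions.
bombs <- tribble(
  ~country, ~continent,      ~decade, ~n_bombs,
  "USA",    "North America", 1940,    8,
  "USA",    "North America", 1950,    188,
  "USSR",   "Europe",        1940,    1,
  "USSR",   "Europe",        1950,    82,
  "UK",     "Europe",        1950,    21
)

# Without nesting(): every continent x country x decade combination is created,
# including nonsensical ones such as the UK in North America.
naive <- bombs %>%
  complete(continent, country, decade, fill = list(n_bombs = 0))

# With nesting(): only country/continent pairs that occur in the data are used,
# so the only new row is the UK in the 1940s.
nested <- bombs %>%
  complete(nesting(country, continent), decade, fill = list(n_bombs = 0))
```

With these five made-up rows, the first call returns 12 rows and the second returns 6.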

5. Counting tropical storms

Let's move on to a second example. You're looking at data collected by the US National Hurricane Center. It details the start and end dates of tropical storms in the Atlantic. What if we wanted to use this data to find out how many storms were active simultaneously over time?

6. Counting tropical storms: pivot to long format

To answer this question, we need to take multiple steps. First, we reshape the data with the pivot_longer() function to a format with one observation for the start date and one observation for the end date. Next, we want to use the complete() function to generate all dates between the start and end dates. Once this is done, we can simply count the dates and we'll have our answer.
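The reshaping step could look roughly like this. The storm names, dates, and the column names name, start, end, status, and date are all assumptions made for this sketch, not the lesson's actual data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up storms; column names are hypothetical.
storm_df <- tibble(
  name  = c("Ana", "Bill"),
  start = as.Date(c("2015-05-08", "2015-05-10")),
  end   = as.Date(c("2015-05-11", "2015-05-12"))
)

# One row per storm and date type: the start and end columns are stacked
# into a single `date` column, with `status` recording which one it was.
storm_long <- storm_df %>%
  pivot_longer(c(start, end), names_to = "status", values_to = "date")
```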

7. Counting tropical storms: grouped completion

However, to achieve this result we first need to group the data by the name of each storm. Only then will the full_seq() function use the start and end dates of each storm individually instead of the first and last date in the full dataset. Note that after completing, we ungroup the data so that the grouping does not affect later operations. Now that we have all the dates on which each storm was raging,

8. Counting tropical storms: the actual count

we can count the occurrences of dates to see how many storms were active simultaneously. We do so with dplyr's count() function.
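Put together, the grouped completion and the count might be sketched as follows. The data and column names are again hypothetical; full_seq() is called with a period of 1, which for Date values means one day.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up long-format storm data (one row per start or end date).
storm_long <- tibble(
  name   = c("Ana", "Ana", "Bill", "Bill"),
  status = c("start", "end", "start", "end"),
  date   = as.Date(c("2015-05-08", "2015-05-11", "2015-05-10", "2015-05-12"))
)

# Group by storm so full_seq() spans each storm's own start and end date,
# then ungroup so later steps are not computed per storm.
storm_days <- storm_long %>%
  group_by(name) %>%
  complete(date = full_seq(date, period = 1)) %>%
  ungroup()

# Number of storms active on each date.
daily_counts <- storm_days %>%
  count(date)
```

In this toy data the two storms overlap on 10 and 11 May, so those dates get a count of 2.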

9. Counting tropical storms: adding zero counts

Finally, we'll use the complete() function once more to add zero values for dates on which no storms were active.
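A sketch of this last completion, assuming a count table with the hypothetical columns date and n: the fill argument of complete() replaces the NA counts of the newly added dates with zero.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up daily counts with a two-day gap in which no storm was active.
daily_counts <- tibble(
  date = as.Date(c("2015-05-10", "2015-05-11", "2015-05-14")),
  n    = c(1L, 2L, 1L)
)

# Add the missing dates and set their count to zero instead of NA.
all_days <- daily_counts %>%
  complete(date = full_seq(date, period = 1), fill = list(n = 0L))
```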

10. Counting tropical storms: visualizing the result

When we visualize the result, we can clearly see the Atlantic hurricane season starting around July and ending in October each year.

11. Timestamp completions

The final advanced completion of this lesson is on time series data. So far, we've used the full_seq() function to create a sequence of integers or dates but we have not yet worked with actual timestamps. The data you're looking at was produced by a temperature sensor that sends updates whenever the temperature changes by at least one degree Celsius. But what if you want to have the most recent reading for every 10 or 20 minutes?

12. Timestamp completions

You can achieve this with the seq() function. We pass it the first and last timestamp in the data using the min() and max() functions, and then specify that we want a value every 20 minutes by passing the string "20 min" to the by argument.
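To see what seq() produces on its own, here is a small sketch with made-up sensor readings. The column names time and temperature and the UTC time zone are assumptions.

```r
# Made-up readings; the sensor only reports when the temperature changes.
readings <- data.frame(
  time = as.POSIXct(
    c("2019-11-08 06:00:00", "2019-11-08 06:40:00", "2019-11-08 07:40:00"),
    tz = "UTC"
  ),
  temperature = c(17, 18, 19)
)

# A regular grid of timestamps from the first to the last reading,
# spaced 20 minutes apart.
grid <- seq(min(readings$time), max(readings$time), by = "20 min")
```

Passed to complete() as the values for the time column, a grid like this adds a row for every 20-minute step that has no reading yet.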

13. Timestamp completions

To overwrite NA values with the last sensor reading, we can use the fill() function on the temperature variable.
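The whole timestamp completion could then be sketched like this, again with made-up readings and assumed column names. The arrange() call is an extra precaution added for this sketch, not a step from the lesson: fill() carries values down in row order, so the rows need to be sorted by time, which matters if some readings do not fall exactly on the 20-minute grid.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up readings; column names and time zone are assumptions.
readings <- tibble(
  time = as.POSIXct(
    c("2019-11-08 06:00:00", "2019-11-08 06:40:00", "2019-11-08 07:40:00"),
    tz = "UTC"
  ),
  temperature = c(17, 18, 19)
)

filled <- readings %>%
  # Add a row for every 20-minute step between the first and last reading.
  complete(time = seq(min(time), max(time), by = "20 min")) %>%
  # Make sure rows are in chronological order before filling.
  arrange(time) %>%
  # Carry the last known temperature forward into the NA rows.
  fill(temperature)
```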

14. Let's practice!

Now it's your turn. Let's practice!
