Window functions

1. Window functions

Throughout this chapter, we've looked at trends of baby names over time in the United States.

2. babynames graph

We've made visualizations and discovered a few names that have gone through major changes over time. But what if you want to look at the biggest changes within each name? For that, we'd have to find the difference between each pair of consecutive years.

3. Window function

To find those differences, we'll introduce the last new concept of this course: the window function. A window function takes a vector and returns another vector of the same length. You'll be learning to use the lag() function. For example, suppose we had a vector of 1, 3, 6, and 14. We can lag this vector, which means shifting each item one position to the right. Now the first item is NA, meaning it's missing, and it's followed by 1, 3, and 6, so each position holds the item just prior to it in the original vector. The last value, 14, drops off the end.
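As a minimal sketch of the lag described above, assuming the dplyr package is loaded (so that lag() refers to dplyr's version rather than the one in base R's stats package):

```r
library(dplyr)

v <- c(1, 3, 6, 14)

# Shift every item one position to the right; the first slot becomes NA
lagged <- lag(v)
print(lagged)
# NA  1  3  6
```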

4. Compare consecutive steps

Now, why is this useful? Because by lining up each item in the vector with the item directly before it, we can compare consecutive steps and calculate the changes. With v minus lag(v), we're asking "What is each value once we've subtracted the previous one?"
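Continuing with the same example vector, the subtraction might look like the sketch below. The variable name changes is just for illustration.

```r
library(dplyr)

v <- c(1, 3, 6, 14)

# Each value minus the one before it; the first has no predecessor, so it is NA
changes <- v - lag(v)
print(changes)
# NA  2  3  8
```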

5. Changes in popularity of a name

Now that we know how to calculate the difference between consecutive values in a vector, we can use that in a grouped mutate to find the changes in the popularity of one name in consecutive years. Consider the babynames dataset. We'll use the same code as we did in the last lesson to create a table with each name's fraction of the babies born in each year.
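The transcript doesn't show the code itself, so here is one way the fraction table could be built. This is a sketch on a tiny made-up table: the counts are invented for illustration, and the column names (year, name, number), the year_total helper column, and the babynames_fraction name are assumptions.

```r
library(dplyr)

# Made-up toy counts for illustration; not the real babynames values
babynames <- data.frame(
  year   = c(1990, 1990, 1995, 1995),
  name   = c("Matthew", "Anna", "Matthew", "Anna"),
  number = c(30, 70, 50, 50)
)

babynames_fraction <- babynames %>%
  group_by(year) %>%
  mutate(year_total = sum(number)) %>%    # total babies in each year
  ungroup() %>%
  mutate(fraction = number / year_total)  # each name's share of that year

print(babynames_fraction)
```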

6. Matthew

Let's start by picking a name, like Matthew, and filtering for it. Then we arrange in ascending order of year. We can see the fraction of babies born each year that are named Matthew. We can also compare between years. Notice that the fraction of babies named Matthew started around 0.0005, then went down to 0.0004.
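The filter-then-arrange step could be sketched as follows. The table here is a small made-up stand-in for the fraction table, so the numbers won't match the real data, and the names babynames_fraction and matthew are assumptions.

```r
library(dplyr)

# Made-up toy fractions, deliberately out of order; not the real babynames values
babynames_fraction <- data.frame(
  year     = c(1995, 1990, 1990, 2000, 1995, 2000),
  name     = c("Matthew", "Matthew", "Anna", "Matthew", "Anna", "Anna"),
  fraction = c(0.5, 0.3, 0.7, 0.2, 0.5, 0.8)
)

matthew <- babynames_fraction %>%
  filter(name == "Matthew") %>%  # keep only one name
  arrange(year)                  # ascending order of year

print(matthew)
```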

7. Matthew over time

To quantify that, we could use a mutate with the lag window function. We want to take each fraction, and subtract the "lagged" fraction, with fraction minus lag(fraction). Notice that the first observation is missing a difference, because there is no previous year. After that, we can see whether Matthew went up or down each year.
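A sketch of that mutate, again on made-up toy fractions for a single name. The column name difference is an assumption. Because lag() works on row order, the rows are arranged by year before the mutate.

```r
library(dplyr)

# Made-up toy fractions for one name; not the real babynames values
matthew <- data.frame(
  year     = c(1990, 1995, 2000),
  name     = "Matthew",
  fraction = c(0.3, 0.5, 0.2)
)

matthew_diff <- matthew %>%
  arrange(year) %>%
  mutate(difference = fraction - lag(fraction))  # change since the previous row

print(matthew_diff)
```

The first row's difference is NA, since there is no earlier year to subtract.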

8. Biggest jump in popularity

What if we wanted to know the biggest jump that the name Matthew took in popularity? We could sort in descending order of the difference column, and see that the biggest jumps, when Matthew got much more popular, were in 1975 and 1970.
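Sorting by the difference might look like the sketch below. The fractions are made up, so the years that come out on top here are not the 1975 and 1970 from the real data.

```r
library(dplyr)

# Made-up toy fractions for one name; not the real babynames values
matthew <- data.frame(
  year     = c(1990, 1995, 2000, 2005),
  name     = "Matthew",
  fraction = c(0.1, 0.4, 0.3, 0.5)
)

biggest_jumps <- matthew %>%
  arrange(year) %>%
  mutate(difference = fraction - lag(fraction)) %>%
  arrange(desc(difference))  # largest increase first; NA rows sort last

print(biggest_jumps)
```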

9. Changes within every name

Now what if, instead of looking at one name, we wanted to look at the changes within every name? We'll need to do this as a grouped mutate, as you learned in the last lesson, first grouping by name before calculating the difference between consecutive years. This ensures we won't include differences between two different names.
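One way the grouped version could be written, sketched on a made-up two-name table. The table and column names are assumptions. Arranging by name and year first keeps each name's rows in year order, and group_by(name) makes lag() restart at every name.

```r
library(dplyr)

# Made-up toy fractions; not the real babynames values
babynames_fraction <- data.frame(
  year     = c(1990, 1990, 1995, 1995, 2000, 2000),
  name     = c("Matthew", "Anna", "Matthew", "Anna", "Matthew", "Anna"),
  fraction = c(0.3, 0.7, 0.5, 0.5, 0.2, 0.8)
)

all_changes <- babynames_fraction %>%
  arrange(name, year) %>%
  group_by(name) %>%                                 # lag() now works within each name
  mutate(difference = fraction - lag(fraction)) %>%
  ungroup() %>%
  arrange(desc(difference))

print(all_changes)
```

Each name's first year has an NA difference, so no value is ever subtracted across two different names.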

10. Let's practice!

Let's practice!