Lending a helper hand

1. Lending a helper hand

We are nicely warmed up with dplyr now! In this lesson and the next, we'll see how helper functions can help with selecting columns in our data.

2. starts_with()

The starts_with function from the tidyselect package matches columns whose names start with a particular string. Combined with the select function, starts_with lets us choose columns with simpler code than typing out the name of each column, as we did in the previous lesson. Note that this function and the other tidyselect functions we'll discuss in this course are available automatically when dplyr is loaded.

3. starts_with() example

Suppose we are interested in each of the columns in world_bank_data whose names start with "perc". We include the starts_with function inside of the select function. The argument to starts_with is the string "perc". For world_bank_data, this will give us the columns that contain percentages. These four columns (covered from left to right) correspond to the percentages of the population that have electricity access, that have completed a college degree, that die from cardiovascular disease (CVD), cancer, diabetes, or chronic respiratory disease (CRD), and that dwell in a rural setting.
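As a rough sketch of this call, the code below builds a small stand-in for world_bank_data. The tibble toy_data, its "perc" column names, and all of its values are made up for illustration; only the select and starts_with calls reflect what the lesson describes.

```r
library(dplyr)

# Hypothetical stand-in for world_bank_data; names and values are invented.
toy_data <- tibble(
  country = c("Country A", "Country B"),
  year = c(2015, 2015),
  perc_electric_access = c(100, 85.5),
  perc_rural_pop = c(18.2, 54.1),
  fertility_rate = c(1.8, 3.1)
)

# Keep only the columns whose names begin with "perc"
perc_only <- toy_data %>%
  select(starts_with("perc"))

names(perc_only)
```

Only the two "perc" columns survive the selection; country, year, and fertility_rate are dropped.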

4. Where and when are these results from!?

But this previous result is not very satisfying. The country and year corresponding to these values were lost in the selection. Thus, it is often helpful to include identifying variables alongside the starts_with call in our select function. Since select accepts multiple column specifications separated by commas, we list country and year here as well. This tells us which country each percentage belongs to and when it was recorded.
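A minimal sketch of this pattern follows, again using an invented tibble (toy_data) in place of world_bank_data. The "perc" column names and all values are assumptions; country and year are the identifying columns named in the lesson.

```r
library(dplyr)

# Hypothetical stand-in for world_bank_data; names and values are invented.
toy_data <- tibble(
  country = c("Country A", "Country B"),
  year = c(2015, 2015),
  perc_electric_access = c(100, 85.5),
  perc_rural_pop = c(18.2, 54.1),
  fertility_rate = c(1.8, 3.1)
)

# Identifying columns first, then every column starting with "perc"
perc_with_ids <- toy_data %>%
  select(country, year, starts_with("perc"))

names(perc_with_ids)
```

Bare column names and helper calls can be mixed freely inside select, and the output keeps them in the order they were written.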

5. ends_with()

The tidyselect package also has a function named ends_with that matches columns whose names end with a particular string. In world_bank_data, columns corresponding to rates end with "rate". This means we can use the ends_with function to select rate columns, similarly to how we selected percentage columns with starts_with. To look for all columns that end in "rate", we pass that as a string argument to ends_with. The country and year columns are also included here as before. Since they are specified first in select, they also appear first in the result. These rates are measured for each country and year. In this data, infant_mortality_rate is per 1000 live births, fertility_rate is the average number of births a woman has in her lifetime, and unemployment_rate is measured out of the total labor force.
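The ends_with selection might look like the sketch below. The three rate column names come from the lesson, but the tibble toy_data, the perc_rural_pop column, and every value are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for world_bank_data; values are invented.
toy_data <- tibble(
  country = c("Country A", "Country B"),
  year = c(2015, 2015),
  perc_rural_pop = c(18.2, 54.1),
  infant_mortality_rate = c(3.5, 30.2),
  fertility_rate = c(1.8, 3.1),
  unemployment_rate = c(5.3, 7.9)
)

# Identifying columns first, then every column ending with "rate"
rates_with_ids <- toy_data %>%
  select(country, year, ends_with("rate"))

names(rates_with_ids)
```

The rate columns keep the order they have in the original data, and perc_rural_pop is dropped because its name does not end with "rate".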

6. Let's practice!

Time to try out some more exercises with the IMF data.