
What is your major mal-function?

1. What is your major mal-function?

In this last chapter, we'll tie together many of the ideas from the first three chapters by writing functions. We'll see how to use special tidy evaluation operators from the rlang package to create functions that work with dplyr verbs and ggplot2.

2. Recall Uruguay's CPI

Previously, we focused on IMF data for Uruguay. We selected just a few columns, filtered to years after 2010, and focused on the consumer price index (CPI) column. What if we'd like to look at a different country's data quickly?

3. Another country's CPI

We could replace "Uruguay" with another country name and run the code again. To look at recent consumer price index results for Belize, for example, we replace "Uruguay" with "Belize" in filter.
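To make the copy-and-paste pattern concrete, here is a minimal sketch. The real imf_data isn't reproduced in the transcript, so this builds a tiny stand-in: the column names (country, iso, year, consumer_price_index) and every value are assumptions made up for illustration, not real IMF figures.

```r
library(dplyr)

# Toy stand-in for imf_data: column names and values are made up for illustration
imf_data <- tibble(
  country = c("Uruguay", "Uruguay", "Belize", "Belize"),
  iso = c("URY", "URY", "BLZ", "BLZ"),
  year = c(2010, 2015, 2010, 2015),
  consumer_price_index = c(100, 150, 100, 105)
)

# Recent CPI for Uruguay: a few columns, years after 2010
uruguay_cpi <- imf_data %>%
  select(country, year, consumer_price_index) %>%
  filter(country == "Uruguay", year > 2010)

# The same pipeline copied for Belize; only the country string changes
belize_cpi <- imf_data %>%
  select(country, year, consumer_price_index) %>%
  filter(country == "Belize", year > 2010)

belize_cpi
```

The two pipelines differ in a single string, which is exactly the kind of duplication a function can remove.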

4. Creating a function instead

Copying and pasting these lines becomes tedious if we need to do this more than a few times, and it is also error-prone. One way around this is to write a function that performs the subsetting. We name the function cpi_by_country. Its country_name argument is a string, and it takes the place of "Belize" in the filter call of the copied code.
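A sketch of what cpi_by_country might look like, assuming the same hypothetical column names as a small made-up stand-in for imf_data (the transcript doesn't show the data itself). Because country_name is an ordinary string, it can be used inside filter without any special operators.

```r
library(dplyr)

# Toy stand-in for imf_data (hypothetical columns, made-up values)
imf_data <- tibble(
  country = c("Uruguay", "Belize", "Belize"),
  year = c(2015, 2010, 2015),
  consumer_price_index = c(150, 100, 105)
)

# Recent CPI for any one country; country_name is a string such as "Belize"
cpi_by_country <- function(country_name) {
  imf_data %>%
    select(country, year, consumer_price_index) %>%
    filter(country == country_name, year > 2010)
}

cpi_by_country(country_name = "Belize")
```

If the data could ever contain a column literally named country_name, filter would use that column instead of the argument; writing `.env$country_name` inside filter makes it explicit that the value comes from the function's environment.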

5. Use the function

Let's try another country: the Polynesian island nation of Samoa. Calling our function with "Samoa" as the country_name argument gives us working results.

6. Joining IMF and World Bank data

The imf_data tibble we have been working with does not contain continent or region variables, but world_bank_data does. Next, we'll explore how to compute summaries after joining these two tibbles. Remember that the country column doesn't have matching values for every country in the two datasets; we saw this with Iran, for example. So we join only on iso and year. We then move the continent and region columns to appear right after the year column so the join results are easier to inspect.
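The join described above can be sketched as follows. Everything in the two toy tibbles is an assumption for illustration, as is the choice of inner_join (the transcript doesn't name the join type). The sketch also drops world_bank_data's own country column before joining, a choice made here so the result keeps a single country column instead of country.x and country.y.

```r
library(dplyr)

# Toy stand-ins (hypothetical columns, made-up values)
imf_data <- tibble(
  country = c("Iran", "Belize", "Samoa"),
  iso = c("IRN", "BLZ", "WSM"),
  year = c(2015, 2015, 2015),
  gov_revenue_percent_gdp = c(15, 28, 33)
)
world_bank_data <- tibble(
  country = c("Iran, Islamic Rep.", "Belize"),  # names differ from imf_data
  iso = c("IRN", "BLZ"),
  year = c(2015, 2015),
  continent = c("Asia", "Americas"),
  region = c("Middle East & North Africa", "Latin America & Caribbean")
)

# Join on iso and year only, since country names don't always match
joined <- imf_data %>%
  inner_join(world_bank_data %>% select(-country), by = c("iso", "year")) %>%
  # Put continent and region right after year for easier inspection
  relocate(continent, region, .after = year)

joined
```

Iran still matches because the join ignores the differing country names, while Samoa drops out because it has no counterpart in the toy world_bank_data.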

7. Results of the join

The result is a much smaller dataset since world_bank_data has far fewer entries than imf_data. Let's now summarize some of this data by continent.

8. Mean Government Revenue as GDP % by Continent

Now that we have our joined data, suppose we want to analyze each continent's average government revenue as a percentage of GDP over recent years. We can do this by grouping by continent and then summarizing to calculate the mean. Europe and Oceania have the largest mean percentages, and Africa has the lowest.
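A minimal sketch of the grouped summary, using a made-up stand-in for joined. The column name gov_revenue_percent_gdp and all values are assumptions chosen only to make the example run; they are not the course's results.

```r
library(dplyr)

# Toy stand-in for the joined data (hypothetical columns, made-up values)
joined <- tibble(
  continent = c("Europe", "Europe", "Africa", "Africa", "Oceania"),
  year = c(2015, 2016, 2015, 2016, 2015),
  gov_revenue_percent_gdp = c(40, 42, 20, 22, 35)
)

# One row per continent with its mean revenue share, largest first
continent_means <- joined %>%
  group_by(continent) %>%
  summarize(mean_gov_rev = mean(gov_revenue_percent_gdp, na.rm = TRUE)) %>%
  arrange(desc(mean_gov_rev))

continent_means
```

`na.rm = TRUE` guards against missing revenue values, which are common in country-level data.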

9. A function for grouping?

Based on what we saw with our cpi_by_country function, we could try creating a function that groups by a different variable instead of continent. We replace continent with an argument named group_col and then call the new function to group by year instead. Alas, we get an error. group_by takes column names as bare, unquoted expressions, so inside our function it looks for a column literally named group_col in joined rather than the year column we passed in. This style of argument handling is called non-standard evaluation (the tidyverse's version is known as tidy evaluation). It's great for the user, since we don't have to quote every variable name when we use functions like group_by, select, or summarize, but it makes programming with dplyr harder.
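The failure can be reproduced with a small sketch. The function name grouped_mean_naive and the toy joined tibble are hypothetical. tryCatch is used only so the script keeps running after the error; the exact error text varies across dplyr versions, so it is printed rather than quoted here.

```r
library(dplyr)

# Toy stand-in for the joined data (hypothetical columns, made-up values)
joined <- tibble(
  continent = c("Europe", "Europe", "Africa", "Africa", "Oceania"),
  year = c(2015, 2016, 2015, 2016, 2015),
  gov_revenue_percent_gdp = c(40, 42, 20, 22, 35)
)

# Naive attempt: group_col is handed straight to group_by()
grouped_mean_naive <- function(group_col) {
  joined %>%
    group_by(group_col) %>%
    summarize(mean_gov_rev = mean(gov_revenue_percent_gdp))
}

# group_by() finds no column called group_col, and the caller's bare `year`
# is not a column it can see, so the call errors
failed <- tryCatch(
  {
    grouped_mean_naive(group_col = year)
    FALSE
  },
  error = function(e) {
    message("Error: ", conditionMessage(e))
    TRUE
  }
)
```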

10. The curly-curly {{ }} operator

Luckily, the rlang package lets us work around errors like this. We wrap the unquoted column argument in the curly-curly operator, {{ }}, from the rlang package. This tells R not to look for a column literally named group_col in joined, but to use whatever column was passed into group_col instead. When we call the function to get results grouped by year, our code now works.
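Here is a sketch of the fixed function. The name grouped_mean and the toy data are hypothetical. The only change from a naive version is wrapping the argument in {{ }} inside group_by; dplyr supports this operator directly, so rlang doesn't need to be loaded separately.

```r
library(dplyr)

# Toy stand-in for the joined data (hypothetical columns, made-up values)
joined <- tibble(
  continent = c("Europe", "Europe", "Africa", "Africa", "Oceania"),
  year = c(2015, 2016, 2015, 2016, 2015),
  gov_revenue_percent_gdp = c(40, 42, 20, 22, 35)
)

# {{ }} forwards whichever bare column name the caller supplied to group_by()
grouped_mean <- function(group_col) {
  joined %>%
    group_by({{ group_col }}) %>%
    summarize(mean_gov_rev = mean(gov_revenue_percent_gdp))
}

by_year <- grouped_mean(group_col = year)
by_continent <- grouped_mean(group_col = continent)
by_year
```

Like the transcript's functions, this one reads joined from the global environment; passing the data frame in as a first argument would make it reusable with other tibbles.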

11. Let's practice!

Try out some exercises on building functions and getting used to the curly-curly operator.
