Animal crossing: new rowwise's

1. Animal crossing: new rowwise's

Next, we'll explore how to perform calculations going across a row of data. We'll see that the rowwise function behaves similarly to group_by, in that it treats each row as its own group.

2. Building up to rowwise()

A common problem in data is missing values. Incomplete data can cause many issues in an analysis, and it's often hard to know just how many missing values a particular row of data has. Before heading back to world_bank_data, let's refresh our knowledge of the is.na and sum functions by programmatically counting how many NAs are in a simple vector. First, is.na returns TRUE or FALSE for each element, depending on whether it is missing. Then, summing over that result counts the number of missing values, because each TRUE counts as 1.
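As a minimal sketch of that refresher, the snippet below counts the NAs in a small vector. The vector x and its values are made up for illustration; they are not the vector shown on the slide.

```r
# Hypothetical vector with two missing entries (not the slide's data)
x <- c(4, NA, 7, NA, 10)

# is.na() returns one TRUE/FALSE per element
missing_flags <- is.na(x)
missing_flags
# FALSE  TRUE FALSE  TRUE FALSE

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of NAs
num_missing <- sum(missing_flags)
num_missing
# 2
```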

3. Glimpse at world_bank_data

Remember the order of the columns here in world_bank_data. Next, let's count how many missing values are in the columns from infant_mortality_rate to the last column, perc_rural_pop.

4. rowwise() with c_across()

The rowwise function is helpful for computing aggregations across columns within each row. After calling rowwise, let's create a new column called num_missing that will store the number of missing entries in each row. Next, we'll specify which columns to count the missing values in. The c_across function is a special helper for rowwise. It works similarly to the c function for concatenation, except that it takes unquoted column names (a tidy selection) as its argument. For this example, we count how many missing values are in the columns from infant_mortality_rate across to the last column. Next, we choose a few columns and sort them to improve readability. In general, we use c_across with rowwise. The across function, by contrast, applies the kind of column-by-column transformations we saw in the previous lesson. Thus, across and c_across should not be used interchangeably, regardless of whether we are using rowwise or not.
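The steps just described can be sketched roughly as follows. Since world_bank_data itself is not reproduced here, the example assumes a small stand-in tibble called toy_data with made-up values; only the column names infant_mortality_rate, fertility_rate, and perc_rural_pop come from the lesson.

```r
library(dplyr)

# Toy stand-in for world_bank_data; countries and values are invented
toy_data <- tibble(
  country = c("A", "B", "C"),
  year = c(2016, 2016, 2015),
  infant_mortality_rate = c(NA, 3.1, 5.2),
  fertility_rate = c(NA, 1.8, NA),
  perc_rural_pop = c(14.2, 10.5, 40.0)
)

missing_counts <- toy_data %>%
  # Treat each row as its own group
  rowwise() %>%
  # Collect the selected columns of the current row, then count the NAs
  mutate(
    num_missing = sum(is.na(c_across(infant_mortality_rate:perc_rural_pop)))
  ) %>%
  # Drop the row-wise grouping before further processing
  ungroup() %>%
  # Keep a few columns and sort for readability
  select(country, year, num_missing) %>%
  arrange(desc(num_missing))

missing_counts
```

Calling ungroup() afterwards is a design choice rather than a requirement here; it avoids later verbs silently operating one row at a time.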

5. How many are missing?

We can see here in the output that there are some issues with the 2016 data in terms of completeness, with two columns having missing values for multiple countries.

6. Tracking down the missingness

Let's focus just on Australia and 2016. Taking a glimpse, we can see that both infant_mortality_rate and fertility_rate are unknown in this data for 2016.

7. if_any() with filter()

In addition to tracking down missing values, we can use helper functions to identify and return rows meeting certain criteria. For example, if we wanted to know whether any of the percentage-based columns were below a threshold, we could use the if_any function inside filter to do just that. Let's pass the columns starting with "perc" to the .cols argument of if_any. We use a threshold of 5 percent, and specify it with a tilde and .x followed by less than 5. Lastly, we select a few columns to investigate the output. We see that Qatar in 2004, Qatar in 2007, and Singapore in 2006 fell below the 5 percent cutoff for perc_rural_pop. Pakistan in 2005 and Honduras in 2006 fell below 5 percent for perc_college_complete.
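A hedged sketch of that filter, again on invented data: toy_perc and its values are assumptions for illustration, with only the two percentage column names taken from the lesson.

```r
library(dplyr)

# Toy data; countries and values are invented
toy_perc <- tibble(
  country = c("A", "B", "C", "D"),
  perc_rural_pop = c(2.5, 30, 45, 60),
  perc_college_complete = c(20, 4, 35, 28)
)

# Keep rows where at least one "perc" column is below 5
below_five <- toy_perc %>%
  filter(if_any(.cols = starts_with("perc"), .fns = ~ .x < 5))

below_five
```

Here country A is kept because of perc_rural_pop and country B because of perc_college_complete, which mirrors the "or" behavior of if_any.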

8. if_all()

If we'd like to keep only the rows where all of the specified column values meet the criteria, we use the if_all function. if_all works like an and, whereas if_any works like an or. Let's investigate which countries for a particular year had all percentage columns with values at or above 25 percent. The syntax here is the same as for if_any, with .cols and .fns. There are only four rows in which every percentage column is greater than or equal to 25.
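The same idea with if_all might look like the sketch below. As before, toy_perc is an invented stand-in, so the rows returned say nothing about the real world_bank_data.

```r
library(dplyr)

# Toy data; countries and values are invented
toy_perc <- tibble(
  country = c("A", "B", "C", "D"),
  perc_rural_pop = c(2.5, 30, 45, 60),
  perc_college_complete = c(20, 4, 35, 28)
)

# Keep rows where every "perc" column is at or above 25
all_at_least_25 <- toy_perc %>%
  filter(if_all(.cols = starts_with("perc"), .fns = ~ .x >= 25))

all_at_least_25
```

Country B has one column above 25 but is dropped, because if_all requires every selected column to pass.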

9. Let's practice!

Test out rowwise on some exercises!
