Deliver the state of the union

1. Deliver the state of the union

We saw that intersect doesn't quite behave like a join: it keeps only the rows common to both datasets. One way to think about intersect is as the "and" operator. Recall from the Venn diagram that union and union_all are closer to the "or" operator. Let's look further into each with our diagrams.

2. union diagram

The result of a union on left_col and right_col is the set of rows appearing in either dataset. Notice that the id values of 1 and 4 in right_col are not included a second time in the union, since they were already found in left_col.
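To make the diagram concrete, here is a minimal sketch in dplyr. It assumes left_col and right_col are one-column tibbles with an id column. Only the shared ids 1 and 4 come from the diagram as described; the remaining values are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-ins for the diagram's two datasets.
# Only the shared ids 1 and 4 are taken from the transcript.
left_col  <- tibble(id = c(1, 2, 3, 4))
right_col <- tibble(id = c(1, 4, 5, 6))

# Rows found in either dataset, with each distinct row kept once.
either <- union(left_col, right_col)

# Sort before printing, since row order is not the point here.
arrange(either, id)
```

The ids 1 and 4 each show up a single time in the result, even though they exist in both inputs.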

3. Prepping for union with Uruguay

To make the results of union and union_all easier to see, let's restrict our IMF data to the years 2010 to 2014 and remove the consumer price index column. Since union, like intersect, requires both datasets to have the same column names, we'll remove the rural population column from the World Bank data too.
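A rough sketch of that preparation step could look like the following. The names uruguay_imf and uruguay_wb, every column name, and all values are placeholders assumed for illustration; they are not the course's actual IMF and World Bank data.

```r
library(dplyr)

# Hypothetical raw tables: names, columns, and values are placeholders.
uruguay_imf <- tibble(
  country = "Uruguay",
  year = 2008:2014,
  consumer_price_index = seq(100, 160, by = 10)
)
uruguay_wb <- tibble(
  country = "Uruguay",
  year = 2013:2016,
  rural_population = c(4, 3, 2, 1)
)

# Keep 2010 to 2014 and drop the column the World Bank data lacks.
uruguay_imf_filtered <- uruguay_imf %>%
  filter(year >= 2010, year <= 2014) %>%
  select(-consumer_price_index)

# Drop the column the IMF data lacks.
uruguay_wb_filtered <- uruguay_wb %>%
  select(-rural_population)

# Both tibbles should now share exactly the same column names.
identical(names(uruguay_imf_filtered), names(uruguay_wb_filtered))
```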

4. The new tibbles

Remembering that union will include rows that exist in either dataset, we can start to visualize what the result will be by looking at the two tibbles side-by-side. Let's check to see if that matches up with reality next.

5. union()

Rows for each year from 2010 to 2016 appear here for Uruguay. What happens if we switch the arguments here? We get the same rows, albeit with a different ordering of the year column. The union function just stacks the rows on top of each other: it lays out all of the rows from the first tibble, checks which rows in the second tibble were already in the first, and finishes by putting down the remaining rows from the second tibble.
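Here is a small sketch of switching the arguments. The two tibbles are simplified stand-ins that assume only a country and a year column; the real filtered tibbles may carry more columns.

```r
library(dplyr)

# Simplified, hypothetical versions of the two filtered tibbles.
uruguay_imf_filtered <- tibble(country = "Uruguay", year = 2010:2014)
uruguay_wb_filtered  <- tibble(country = "Uruguay", year = 2013:2016)

imf_first <- union(uruguay_imf_filtered, uruguay_wb_filtered)
wb_first  <- union(uruguay_wb_filtered, uruguay_imf_filtered)

# Row order can differ between the two calls (and between dplyr versions),
# so compare the results as sets of rows rather than row by row.
setequal(imf_first, wb_first)

# One row per year from 2010 to 2016.
arrange(imf_first, year)
```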

6. union_all diagram

union_all is similar to union, except that it keeps duplicates. Check this out in the diagram to see the values 1 and 4 repeated. In other words, as long as neither dataset contains duplicate rows of its own, union_all returns the rows of union plus the rows of intersect. The output was sorted here to make the duplicates even easier to see.
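We can check the relationship between union_all, union, and intersect with a quick sketch. As before, left_col and right_col are assumed to be one-column tibbles, and every id other than the shared 1 and 4 is made up.

```r
library(dplyr)

# Hypothetical stand-ins for the diagram's two datasets.
left_col  <- tibble(id = c(1, 2, 3, 4))
right_col <- tibble(id = c(1, 4, 5, 6))

# Stack everything, duplicates included.
stacked <- union_all(left_col, right_col)

# Rebuild the same rows from union plus intersect. This only works
# because neither input has duplicate rows of its own.
rebuilt <- union_all(
  union(left_col, right_col),
  intersect(left_col, right_col)
)

# The two constructions order rows differently, so sort before comparing.
identical(sort(stacked$id), sort(rebuilt$id))

# Sorted output makes the repeated 1 and 4 easy to spot.
arrange(stacked, id)
```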

7. union_all()

Let's repeat the code from before the diagram, but with union_all this time. Recall that uruguay_imf_filtered had the years 2010 to 2014 and uruguay_wb_filtered had the years 2013 to 2016. We can see that union_all simply stacks the rows on top of each other: the first tibble, followed by the second. If we switch the order of our arguments to union_all, the stacking order switches too. Either way, the duplicates are not removed.
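The stacking can be sketched as follows, again with simplified stand-ins for the two filtered tibbles that assume only a country and a year column.

```r
library(dplyr)

# Simplified, hypothetical versions of the two filtered tibbles.
uruguay_imf_filtered <- tibble(country = "Uruguay", year = 2010:2014)
uruguay_wb_filtered  <- tibble(country = "Uruguay", year = 2013:2016)

imf_then_wb <- union_all(uruguay_imf_filtered, uruguay_wb_filtered)
wb_then_imf <- union_all(uruguay_wb_filtered, uruguay_imf_filtered)

# The year column shows the stacking order of each call.
imf_then_wb$year
wb_then_imf$year

# Count how often each year appears; the overlapping years appear twice.
count(imf_then_wb, year)
```

In this sketch, 2013 and 2014 are the overlapping years, so each of them is counted twice while every other year is counted once.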

8. Let's practice!

Try out some union and union_all exercises.
