Joining datasets

1. Joining datasets

So far in our course on United Nations exploratory data analysis,

2. Processed votes

you've been working with this votes_processed dataset, where each row, or observation, represents a pairing of a roll call vote and a country. You've been treating these roll call votes as interchangeable, paying attention only to the year, country, and vote, and summarizing them to draw conclusions. But these resolutions cover a vast range of political and historical issues. In this chapter, you're going to bring in some context about each resolution, specifically topic information. You'll do this with the descriptions dataset

3. Descriptions dataset

- a second, separate data frame with new information about each roll call vote. Let's look at the variables in this table. You have the rcid (roll call ID) and session variables, which are the same columns used to describe each roll call in the votes_processed dataset. The difference is that here, instead of each observation being a country-roll call pair, there's just one observation for each roll call: the first observation is the vote on September 4th, as you can see in the date variable, the second is a vote on October 5th, and so on. The descriptions dataset also contains the United Nations resolution each vote was related to, in unres, and, most importantly, topic information: whether each vote related to one of six topics. For example, the second roll call vote has a 1 in the hr column, which means it relates to human rights. This dataset doesn't tell us anything about countries or their votes, so you want to combine it with the votes_processed dataset

4. inner_join()

to examine how different countries voted on different topics. This is done with dplyr's inner_join() function. You use the "by" argument to specify the two columns the tables have in common, rcid and session, which are used to match rows between them. The new table then includes all the variables from the original votes_processed dataset, including vote, year, and country, as well as all the variables from the descriptions dataset: date, unres, and the topic columns. inner_join() combines the information in these two tables so you can examine them together.
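
As a rough sketch of the join described here, the snippet below builds two tiny stand-in data frames and joins them on rcid and session. The column names follow the transcript, but every value (IDs, years, country labels, resolution codes) is invented for illustration, and only one of the six topic columns, hr, is included.

```r
library(dplyr)

# Toy stand-in for votes_processed: one row per country-roll call pair.
# All values are made up for illustration, not taken from the real data.
votes_processed <- data.frame(
  rcid    = c(1, 1, 2, 2),
  session = c(1, 1, 1, 1),
  vote    = c(1, 3, 1, 1),
  year    = c(1947, 1947, 1947, 1947),
  country = c("Country A", "Country B", "Country A", "Country B")
)

# Toy stand-in for descriptions: one row per roll call.
# Only the hr topic column is shown; the other topic columns are omitted.
descriptions <- data.frame(
  rcid    = c(1, 2),
  session = c(1, 1),
  unres   = c("R/1/001", "R/1/002"),
  hr      = c(0, 1)
)

# Match rows that share the same rcid and session in both tables.
votes_joined <- votes_processed %>%
  inner_join(descriptions, by = c("rcid", "session"))

# The result carries the columns of both tables.
names(votes_joined)
```

Note that inner_join() keeps only rows that have a match in both tables, so a roll call missing from either data frame would drop out of the result.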

5. Let's practice!

In your exercises, you'll manipulate this combined dataset using other dplyr operations, such as filtering for all votes related to human rights issues.
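
For instance, filtering for human rights votes might look like the following minimal sketch. The votes_joined data frame here is a hand-built stand-in for the joined table, with invented values, and it assumes that a 1 in the hr column marks a human rights vote, as described earlier.

```r
library(dplyr)

# Hand-built stand-in for the joined table; all values are invented.
votes_joined <- data.frame(
  rcid    = c(1, 1, 2, 2),
  session = c(1, 1, 1, 1),
  vote    = c(1, 3, 1, 1),
  country = c("Country A", "Country B", "Country A", "Country B"),
  hr      = c(0, 0, 1, 1)
)

# Keep only the rows for roll calls flagged as human rights votes.
human_rights_votes <- votes_joined %>%
  filter(hr == 1)

human_rights_votes
```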
