Contingency tables

1. Contingency tables

In the last chapter you learned methods for studying the relationship between two categorical variables that each have two levels. In this chapter we extend those methods to variables with more than two levels.

2. Politics and military spending

Let's investigate the relationship between the variables party and natarms. The first contains the political party affiliation of the respondent: Republican, Democrat, or independent. The second contains opinions on whether the government is spending too much money, too little money or about the right amount of money on national defense. A natural way to visualize the relationship between these two variables is with a stacked bar plot.

3. Politics and military spending

We can construct that by putting party on the x-axis and filling the bars using natarms. To make it easy to compare proportions, we can add the position = "fill" option. If we look at the two major parties, we learn that a much larger proportion of Republicans than Democrats think we spend too little on the military. Something else jumps out too: it appears that all people who listed "O" or "Other" think that spending is just about right. Can this be correct? Just how many people are in this other group?
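Here is a minimal sketch of that plot in R with ggplot2. The data frame gss_toy and its values are hypothetical, made up only so the code runs on its own; the real survey data is not reproduced here.

```r
library(ggplot2)

# Hypothetical toy data standing in for the survey (values are made up).
gss_toy <- data.frame(
  party   = c("Dem", "Dem", "Dem", "Ind", "Ind", "Rep", "Rep", "Rep", "Oth"),
  natarms = c("Too much", "About right", "Too little",
              "About right", "Too little",
              "Too little", "Too little", "About right", "About right")
)

# party on the x-axis, bars filled by natarms.
# position = "fill" rescales every bar to height 1, so the segments show
# proportions within each party rather than counts.
p_fill <- ggplot(gss_toy, aes(x = party, fill = natarms)) +
  geom_bar(position = "fill") +
  ylab("proportion")

p_fill
```

Because every bar is stretched to the same height, a group with a single respondent looks just as substantial as a group with hundreds.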

4. Politics and military spending

One way to find out is to remove the position = "fill" argument so that the height of each bar is just the count of people. When we make this change we see that the group is very tiny. To figure out just how tiny, we can represent this data as a contingency table. Moving between a data frame and a table means working with untidy data, so let's load the broom package to help keep things clean.
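As a sketch of that change, dropping position = "fill" leaves geom_bar() at its default stacking, so bar heights are raw counts. The gss_toy data frame is again hypothetical and made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for the survey (values are made up).
gss_toy <- data.frame(
  party   = c("Dem", "Dem", "Dem", "Ind", "Ind", "Rep", "Rep", "Rep", "Oth"),
  natarms = c("Too much", "About right", "Too little",
              "About right", "Too little",
              "Too little", "Too little", "About right", "About right")
)

# Without position = "fill", geom_bar() stacks the segments and the
# y-axis shows the number of respondents in each party.
p_count <- ggplot(gss_toy, aes(x = party, fill = natarms)) +
  geom_bar()

p_count
```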

5. Tables and tidy data

To create a contingency table, you select the columns of interest and then send them to the table function. A contingency table puts one of the categorical variables along the rows and the other along the columns, then counts up each of the combinations. Here we see that only one person listed Other, and that person thought that funding was just about right. In fact, the counts inside this table are precisely what the bar plot of counts represents visually. It's very common to run across data presented in a table like this. It's a fine format for displaying data, but it's awkward for analyzing data because the rows don't represent observations; they represent levels of a variable.
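A minimal sketch of that step, assuming dplyr for select() and the pipe. The gss_toy data frame is hypothetical, with made-up values chosen so that the Other group has a single respondent.

```r
library(dplyr)

# Hypothetical toy data standing in for the survey (values are made up).
gss_toy <- data.frame(
  party   = c("Dem", "Dem", "Dem", "Ind", "Ind", "Rep", "Rep", "Rep", "Oth"),
  natarms = c("Too much", "About right", "Too little",
              "About right", "Too little",
              "Too little", "Too little", "About right", "About right")
)

# Select the two columns of interest and send them to table().
# The first selected column (natarms) becomes the rows,
# the second (party) becomes the columns.
tab <- gss_toy %>%
  select(natarms, party) %>%
  table()

tab
```

Each cell holds the number of rows in the data frame with that combination of levels, which is what the heights of the segments in a bar plot of counts encode.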

6. Tables and tidy data

To transform a contingency table back into a data frame, send it to the tidy function, which reorganizes it so that each variable has its own column. Note, though, that the rows are still the aggregate counts for each group. To expand this back to the original dataset, where each row is an individual person, we need to uncount this data frame. OK, now we're back to our original data frame, so we've gone from tidy to table and back again.
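The round trip can be sketched as follows. This sketch swaps in base R's as.data.frame() for the table-to-data-frame step instead of tidy(), so that it does not depend on how a particular broom version handles tables; as.data.frame() names the count column Freq. The gss_toy data frame is hypothetical and its values are made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the survey (values are made up).
gss_toy <- data.frame(
  party   = c("Dem", "Dem", "Dem", "Ind", "Ind", "Rep", "Rep", "Rep", "Oth"),
  natarms = c("Too much", "About right", "Too little",
              "About right", "Too little",
              "Too little", "Too little", "About right", "About right")
)

# Tidy data frame -> contingency table.
tab <- gss_toy %>%
  select(natarms, party) %>%
  table()

# Table -> aggregated data frame: one row per combination of levels,
# with the count in a column named Freq (base R stand-in for tidy()).
agg <- as.data.frame(tab)

# Aggregated counts -> one row per person: uncount() repeats each row
# Freq times and then drops the Freq column. Zero-count rows disappear.
long <- agg %>%
  uncount(Freq)

nrow(long)
```

The rows of long may come back in a different order than in gss_toy, but tabulating long again gives the same counts as tab.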

7. Tables and tidy data

8. Let's practice!

Now it's your turn to practice.