
Counts vs. proportions

1. Counts vs. proportions

You may have noticed in the last exercises that raw counts of cases are sometimes useful, but often it's the proportions that are more interesting. We can do our best to compute these proportions in our heads, or we can compute them explicitly.

2. From counts to proportions

Let's return to our table of counts of cases by identity and alignment. To see what proportion of all cases falls into each category, we take the original table of counts, saved as tab_cnt, and pass it to the prop.table() function. The single largest category is characters that are both bad and secret, at about 29% of all characters. Also note that because these are all proportions out of the whole dataset, they sum to 1.
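A minimal base-R sketch of both routes mentioned above: dividing by the grand total by hand, and calling prop.table(). The counts are made up for illustration and are not the comics data, so the percentages will not match the ones quoted in the lesson; the level names are assumptions chosen to mirror the id and align variables.

```r
# Hypothetical counts for illustration only (not the comics dataset)
tab_cnt <- as.table(matrix(
  c(20, 50, 10,   # Public:  Bad, Good, Neutral
    60, 30, 10,   # Secret:  Bad, Good, Neutral
     8,  1,  1),  # Unknown: Bad, Good, Neutral
  nrow = 3, byrow = TRUE,
  dimnames = list(id = c("Public", "Secret", "Unknown"),
                  align = c("Bad", "Good", "Neutral"))
))

# Explicit route: divide every cell by the grand total
props_manual <- tab_cnt / sum(tab_cnt)

# prop.table() with no margin computes the same joint proportions
props <- prop.table(tab_cnt)
print(round(props, 3))

# Joint proportions over the whole table sum to 1
print(sum(props))
```

In this toy table the bad-and-secret cell is 60 out of 190 characters, roughly 32%.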

3. Conditional proportions

If we're curious about systematic associations between variables, we should look at conditional proportions. An example of a conditional proportion is the proportion of public identity characters that are good. To build a table of these conditional proportions, pass 1 as the second argument to prop.table(), specifying that you'd like to condition on the rows. We see here that around 57% of all secret characters are bad. Because we're conditioning on identity, it's now every row that sums to one. To condition on the columns instead, change that argument to 2. Now it's the columns that sum to one, and we learn, for example, that around 63% of bad characters are secret. As the number of cells in these tables grows, it becomes much easier to make sense of your data with graphics. The bar chart is still a good choice, but we'll need to add some options.
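The margin argument of prop.table() is what switches between the two kinds of conditioning. The sketch below uses a hypothetical table with made-up counts (not the comics data) so that the row and column sums can be checked directly.

```r
# Hypothetical counts for illustration only (not the comics dataset)
tab_cnt <- as.table(matrix(
  c(20, 50, 10,   # Public:  Bad, Good, Neutral
    60, 30, 10,   # Secret:  Bad, Good, Neutral
     8,  1,  1),  # Unknown: Bad, Good, Neutral
  nrow = 3, byrow = TRUE,
  dimnames = list(id = c("Public", "Secret", "Unknown"),
                  align = c("Bad", "Good", "Neutral"))
))

# Margin 1: condition on the rows (id), so each row sums to 1
row_props <- prop.table(tab_cnt, 1)
print(round(row_props, 2))
print(rowSums(row_props))

# Margin 2: condition on the columns (align), so each column sums to 1
col_props <- prop.table(tab_cnt, 2)
print(round(col_props, 2))
print(colSums(col_props))
```

The same bad-and-secret cell gives two different answers: 60% of secret characters are bad (denominator 100), while about 68% of bad characters are secret (denominator 88).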

4. Conditional bar chart

Here is the code for the bar chart based on counts. We want to condition on whatever is on the x axis and stretch the bars so that each one adds up to a total proportion of 1,

5. Conditional bar chart

so we add the option position = "fill" to geom_bar(). Let's add one additional layer:

6. Conditional bar chart

a new y-axis label to indicate that we're looking at proportions.
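Put together, these layers look roughly like the sketch below. It assumes ggplot2 is installed and uses a small made-up data frame, toy, with one row per character in place of the comics data; the column names id and align follow the lesson, but the counts are hypothetical, and ylab("proportion") is one way to relabel the y axis.

```r
library(ggplot2)

# Hypothetical data: one row per character, expanded from made-up counts
counts <- data.frame(
  id    = rep(c("Public", "Secret", "Unknown"), each = 3),
  align = rep(c("Bad", "Good", "Neutral"), times = 3),
  n     = c(20, 50, 10, 60, 30, 10, 8, 1, 1)
)
toy <- counts[rep(seq_len(nrow(counts)), counts$n), c("id", "align")]

# Condition on x: position = "fill" stretches every bar to a height of 1
p <- ggplot(toy, aes(x = id, fill = align)) +
  geom_bar(position = "fill") +
  ylab("proportion")  # the y axis now shows proportions, not counts
print(p)
```

Each bar then shows the same numbers as prop.table() with margin 1 on the id-by-align table.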

7. Conditional bar chart

When we run this code at the console, we get a plot that reflects our table of proportions conditioned on id.

8. Conditional bar chart

While the proportion of secret characters that are bad is still large, it's actually less than

9. Conditional bar chart

the proportion of bad characters among those listed as unknown. We get a very different picture if we condition on alignment instead.

10. Conditional bar chart

The only change needed in the code is to swap the two variable names, so that align goes on the x axis and id fills the bars. This results in a plot conditioned on alignment, and we learn that among characters that are bad,

11. Conditional bar chart

the greatest proportion are indeed secret. This might seem paradoxical, but it's simply a result of having different numbers of cases in each level.
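A sketch of the swapped version, again assuming ggplot2 and a made-up data frame rather than the comics data. In these hypothetical counts, unknown characters are more often bad than secret ones, yet secret characters still make up the largest share of bad characters, because there are ten times as many of them.

```r
library(ggplot2)

# Hypothetical data: one row per character, expanded from made-up counts
counts <- data.frame(
  id    = rep(c("Public", "Secret", "Unknown"), each = 3),
  align = rep(c("Bad", "Good", "Neutral"), times = 3),
  n     = c(20, 50, 10, 60, 30, 10, 8, 1, 1)
)
toy <- counts[rep(seq_len(nrow(counts)), counts$n), c("id", "align")]

# Swap the mappings: alignment on x, identity as fill
p_align <- ggplot(toy, aes(x = align, fill = id)) +
  geom_bar(position = "fill") +
  ylab("proportion")
print(p_align)

# Both views are consistent; they just use different denominators
tab_cnt <- table(toy$id, toy$align)
print(round(prop.table(tab_cnt, 1)[, "Bad"], 2))  # share of each id that is bad
print(round(prop.table(tab_cnt, 2)[, "Bad"], 2))  # share of bad that is each id
print(rowSums(tab_cnt))                           # identity groups differ in size
```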

12. Let's practice!

OK, now try experimenting with conditional proportions yourself.