Summarizing gender discrimination
As the first step of any analysis, you should look at and summarize the data. Categorical variables are often summarized using proportions, and it is always important to understand the denominator of the proportion.
Do you want the proportion of women who were promoted, or the proportion of promoted individuals who were women? Here, you want the first of these, so your R code needs to group_by() the sex variable.
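To see why the denominator matters, here is a minimal base-R sketch on a small made-up data frame. The name toy and its values are hypothetical and are not the study's data. Both proportions are computed from the same rows, but they divide by different groups, so they answer different questions.

```r
# Hypothetical toy data (not the study's data), made up only to show
# how the choice of denominator changes the answer.
toy <- data.frame(
  sex     = c("female", "female", "female", "female", "male", "male"),
  promote = c("promoted", "not_promoted", "not_promoted", "not_promoted",
              "promoted", "promoted")
)

# Proportion of women who were promoted: denominator = all women (4 rows)
p_women_promoted <- mean(toy$promote[toy$sex == "female"] == "promoted")

# Proportion of promoted individuals who were women: denominator = all promoted (3 rows)
p_promoted_women <- mean(toy$sex[toy$promote == "promoted"] == "female")

# Base-R analogue of grouping by sex: one promotion proportion per sex
by_sex <- tapply(toy$promote == "promoted", toy$sex, mean)

p_women_promoted  # 0.25
p_promoted_women  # 0.3333333
```

Grouping by sex, as the exercise asks, puts each sex in its own denominator, which is the first of the two quantities.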
The discrimination study data are available in your workspace as disc.
This exercise is part of the course Foundations of Inference in R.
Exercise instructions
- Using the count() function from dplyr, tabulate the variables sex and promote.
- Summarize the data by using group_by() on the sex variable.
- Find the proportion who were promoted. Call this variable promoted_prop. Note that with binary variables, the proportion of either value can be found using the mean() function (e.g. mean(variable == "value")).
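The mean() trick in the last instruction works because comparing a vector to one value produces a logical vector, and mean() treats TRUE as 1 and FALSE as 0. A minimal base-R sketch, using a hypothetical vector of made-up values:

```r
# A binary variable stored as text (hypothetical example values)
promote <- c("promoted", "not_promoted", "promoted", "promoted")

# Comparing to one level gives a logical vector: TRUE, FALSE, TRUE, TRUE
is_promoted <- promote == "promoted"

# mean() counts TRUE as 1 and FALSE as 0, so this is 3 / 4
prop_promoted <- mean(is_promoted)

# The proportion of the other value is the complement: 1 / 4
prop_not_promoted <- mean(promote == "not_promoted")
```

Inside a grouped summary, the same expression is evaluated once per group, which yields one proportion per sex.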
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a contingency table summarizing the data
disc %>%
# Count the rows by sex, promote
___
# Find proportion of each sex who were promoted
disc %>%
# Group by sex
___
# Calculate proportion promoted summary stat
___