Vocabulary score vs. (self-identified) social class

1. Vocabulary score vs. self-identified social class

So far in this course, we have discussed inference on a single mean as well as inference for comparing two means. Next, we move on to comparing many means simultaneously.

2. Vocabulary score and self-identified social class

Our motivating data come from the General Social Survey. The two variables of interest are vocabulary score and self-identified social class. Vocabulary score is calculated from a ten-question vocabulary test, where a higher score means a better vocabulary, and self-identified social class has four levels: lower, working, middle, and upper class.
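To make the two variables concrete, here is a minimal Python sketch of one survey record, assuming the score is an integer from 0 to 10 and the class is one of the four levels listed above. The names `Respondent` and `SocialClass` are hypothetical, and the field is called `social_class` only because `class` is a reserved word in Python.

```python
from dataclasses import dataclass
from enum import Enum


class SocialClass(Enum):
    # The four self-identified levels, in their natural order.
    LOWER = "lower"
    WORKING = "working"
    MIDDLE = "middle"
    UPPER = "upper"


@dataclass(frozen=True)
class Respondent:
    wordsum: int               # vocabulary score on the ten-question test
    social_class: SocialClass  # hypothetical field name for the `class` variable

    def __post_init__(self) -> None:
        # A ten-question test can only produce scores from 0 to 10.
        if not 0 <= self.wordsum <= 10:
            raise ValueError(f"wordsum must be in 0..10, got {self.wordsum}")


# Usage: build a record from raw survey values.
example = Respondent(wordsum=6, social_class=SocialClass("middle"))
```

Using an enum rather than a free-form string means a misspelled class level fails immediately instead of silently forming a fifth group.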

3. Vocabulary score: `wordsum`

The vocabulary test works as follows: respondents are given a word in capital letters followed by a list of words, and are asked to choose the word from the list that comes closest in meaning to the capitalized word. For example, is CLOISTERED closest in meaning to miniature, bunched, arched, malady, or secluded? Or, if you were a respondent on this survey, would you mark "don't know"? If you're curious about the vocabulary test, feel free to pause the video and work through the rest. For the purpose of this example, however, we're not going to focus on what these words mean. Instead, we'll look at how people who took the survey did on the vocabulary test and whether their score is associated with their social class.
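A minimal sketch of how a score could be computed from one respondent's answers, assuming the score is simply the number of correct answers and that a "don't know" response (encoded here as `None`) earns no credit. Only the first answer-key entry ("secluded" for CLOISTERED) comes from the example above; the other nine entries and the function name `score_vocabulary` are hypothetical placeholders.

```python
from typing import Optional, Sequence

DONT_KNOW = None  # hypothetical encoding for a "don't know" response


def score_vocabulary(answers: Sequence[Optional[str]], key: Sequence[str]) -> int:
    """Count answers that match the key; "don't know" never matches a key entry."""
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    # True counts as 1 and False as 0 when summed.
    return sum(given == correct for given, correct in zip(answers, key))


# Placeholder key: item 1 is from the text, items 2-10 are made up.
key = ["secluded"] + [f"word{i}_answer" for i in range(2, 11)]

# One respondent: item 1 correct, "don't know" on item 2, items 3-10 correct.
responses = ["secluded", DONT_KNOW] + key[2:]
score = score_vocabulary(responses, key)  # 9
```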

4. Distribution of vocabulary score

The distribution of vocabulary scores is shown in this histogram. The scores range from 0 to 10. The distribution is centered at 5 and looks roughly symmetric.
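The histogram's bar heights are just counts of each possible score. Here is a minimal sketch that tabulates them and compares the mean with the median, since for a roughly symmetric distribution the two should be close. The function names are hypothetical, and the sample list in the usage line is toy input, not survey data.

```python
from collections import Counter
from statistics import mean, median
from typing import Dict, List


def score_frequencies(scores: List[int]) -> Dict[int, int]:
    """Count respondents at each score 0..10, including scores nobody got."""
    counts = Counter(scores)
    out_of_range = sorted(s for s in counts if not 0 <= s <= 10)
    if out_of_range:
        raise ValueError(f"scores outside 0..10: {out_of_range}")
    return {s: counts.get(s, 0) for s in range(11)}


def center_summary(scores: List[int]) -> Dict[str, float]:
    # Mean and median nearly agree when the distribution is roughly symmetric.
    return {"mean": mean(scores), "median": median(scores)}


toy_scores = [4, 5, 5, 6]  # illustrative input only
freqs = score_frequencies(toy_scores)
summary = center_summary(toy_scores)
```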

5. Self-identified social class: `class`

And the distribution of social class is shown in this bar plot. These visualizations tell us about the variables individually, but don't tell us much about their relationship.
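The bar plot shows counts per class level, and a natural first step toward looking at the relationship is to compute the mean vocabulary score within each class. These are the "many means" we will want to compare. Here is a minimal sketch, assuming the data arrive as (score, class) pairs. The function names are hypothetical, and the pairs in the usage lines are toy values, not survey results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

LEVELS = ("lower", "working", "middle", "upper")


def class_counts(classes: Iterable[str]) -> Dict[str, int]:
    """Heights of the bar plot: number of respondents in each class level."""
    counts = {level: 0 for level in LEVELS}
    for c in classes:
        if c not in counts:
            raise ValueError(f"unknown class level: {c!r}")
        counts[c] += 1
    return counts


def mean_score_by_class(pairs: Iterable[Tuple[int, str]]) -> Dict[str, float]:
    """Mean vocabulary score per class level; empty levels are omitted."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for score, c in pairs:
        groups[c].append(score)
    return {level: mean(groups[level]) for level in LEVELS if groups[level]}


toy_pairs = [(4, "lower"), (6, "working"), (5, "working"), (8, "upper")]
counts = class_counts(c for _, c in toy_pairs)
group_means = mean_score_by_class(toy_pairs)
```

Keeping the levels in a fixed tuple preserves the lower-to-upper ordering in the output, which matches the order of the bars.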

6. Let's practice!

Time to put this into practice.