
Categorical data: analyze and visualize

1. Categorical data: analyze and visualize

In this video you will explore relationships between categorical variables. You will first create contingency tables showing frequencies and proportions for the different categories. Then you will perform chi-square tests for differences between the group proportions and visualize them using mosaic plots.

2. Collapse categories

For the BMI categories you created earlier, there are only 3 people in the obese category, which is a small number of subjects. The chi-square test also assumes that no more than 20 percent of the cells have expected counts less than 5. So it is a good idea to merge these three obese subjects with the overweight subjects. The ifelse function can be used inside the dplyr mutate function to collapse the obese and overweight categories together. This code creates a new categorical variable bmigt25 and adds it to the daviskeep dataset. The table function shows that the 3 obese subjects and 35 overweight subjects were successfully merged into a single category of 38 overweight/obese subjects.
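As a minimal sketch of this collapsing step, the code below builds a small stand-in for daviskeep. The column name bmicat, the category labels, and the 50 underweight/normal rows are assumptions made for illustration; only the 35 overweight and 3 obese counts come from the narration.

```r
library(dplyr)

# Hypothetical stand-in for daviskeep: the real dataset and the name of its
# original BMI category column (here "bmicat") are assumptions.
daviskeep <- data.frame(
  bmicat = c(rep("underwt/normal", 50),  # made-up count
             rep("overweight", 35),      # count quoted in the narration
             rep("obese", 3))            # count quoted in the narration
)

# Collapse overweight and obese into one level with ifelse() inside mutate()
daviskeep <- daviskeep %>%
  mutate(bmigt25 = ifelse(bmicat %in% c("overweight", "obese"),
                          "overwt/obese", "underwt/normal"))

# Frequencies of the new two-level variable
table(daviskeep$bmigt25)
```

The same pattern should work with whatever labels the original BMI variable uses, as long as the vector passed to %in% lists the levels to be merged.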

3. Contingency tables SAS and R

Contingency tables are useful for summarizing the number of subjects in each category for all possible pairwise combinations of categories for the two variables. Similar to the PROC FREQ SAS procedure, R's table function will create a simple contingency table. The CrossTable function from the gmodels package also creates contingency tables with customizable output options.
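To make the idea of frequencies and proportions concrete, here is a small base R sketch. The toy data frame and its counts are made up for illustration and are not the course's daviskeep data.

```r
# Toy data (hypothetical counts): sex and the collapsed BMI category
daviskeep <- data.frame(
  sex = rep(c("F", "M"), times = c(75, 65)),
  bmigt25 = c(rep("underwt/normal", 60), rep("overwt/obese", 15),
              rep("underwt/normal", 35), rep("overwt/obese", 30))
)

# Two-way contingency table of frequencies: rows = BMI category, columns = sex
tab <- table(daviskeep$bmigt25, daviskeep$sex)
tab

# Column proportions: within each sex, the share in each BMI category
prop.table(tab, margin = 2)

# Frequencies with row and column totals added
addmargins(tab)
```

Using margin = 1 instead would give row proportions, and omitting margin gives proportions of the overall total.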

4. Chi-square tests SAS and R

The chi-square test is often used to test for associations between categorical variables. To perform a chi-square test in R, the chisq.test function is run on the table function output. The chisq = TRUE option is used to run a chi-square test in the gmodels CrossTable function similar to specifying the CHISQ option in the SAS PROC FREQ procedure.

5. Contingency table and chi-square test

To get the contingency table of frequencies for the BMI categories by sex, you use the table function and save the output. To use the table function with the dplyr pipe operator, the with function is needed so that the variable names referenced in the table function are looked up in the daviskeep data. The chisq.test function run on the table output shows that the p-value is very small, indicating that there is a significant association between the bmi and sex categorical variables.
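A rough sketch of this piped workflow is shown below. The data are a made-up stand-in for daviskeep, and the object name tablebmisex is a hypothetical choice, so the printed p-value will not match the one on the slide.

```r
library(dplyr)

# Toy stand-in for daviskeep (hypothetical counts)
daviskeep <- data.frame(
  sex = rep(c("F", "M"), times = c(75, 65)),
  bmigt25 = c(rep("underwt/normal", 60), rep("overwt/obese", 15),
              rep("underwt/normal", 35), rep("overwt/obese", 30))
)

# with() evaluates table() inside the piped data frame,
# so bmigt25 and sex resolve to its columns
tablebmisex <- daviskeep %>%
  with(table(bmigt25, sex))
tablebmisex

# Chi-square test of association run on the saved table
chisq.test(tablebmisex)
```

Without with(), table(bmigt25, sex) would look for objects named bmigt25 and sex in the workspace rather than in the data frame.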

6. Chi-square tests with gmodels package

The CrossTable function from the gmodels package also produces nice contingency tables and performs a chi-square test. The output can be customized to include row, column or total proportions plus expected counts and chi-square contributions. The chisq = TRUE option requests that the chi-square test be performed. Row, total and chi-square proportions are all set to FALSE, so only the column proportions and expected counts will be displayed. The gmodels CrossTable output is verbose, so it is shown over the next two slides.
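The options described here might be combined as in the sketch below. It assumes the gmodels package is installed and uses made-up toy data in place of daviskeep; the exact call on the slide may differ.

```r
library(gmodels)

# Toy stand-in for daviskeep (hypothetical counts)
daviskeep <- data.frame(
  sex = rep(c("F", "M"), times = c(75, 65)),
  bmigt25 = c(rep("underwt/normal", 60), rep("overwt/obese", 15),
              rep("underwt/normal", 35), rep("overwt/obese", 30))
)

# Contingency table with expected counts and column proportions only,
# plus a chi-square test of association
ct <- CrossTable(daviskeep$bmigt25, daviskeep$sex,
                 expected   = TRUE,   # show expected counts
                 prop.r     = FALSE,  # hide row proportions
                 prop.c     = TRUE,   # show column proportions
                 prop.t     = FALSE,  # hide total proportions
                 prop.chisq = FALSE,  # hide per-cell chi-square contributions
                 chisq      = TRUE)   # run the chi-square test
```

CrossTable prints its table as a side effect; the returned list is saved in ct here only so the underlying counts (ct$t) can be reused.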

7. CrossTable output - part 1

The output shown here is similar to SAS's PROC FREQ output. The frequencies and column proportions show that a larger proportion of males than females falls in the overweight/obese category. Expected counts are all more than 5.
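The expected-count check can also be done directly in base R. This is only a sketch on hypothetical counts, not the slide's actual table.

```r
# Hypothetical 2 x 2 table of counts (rows = BMI category, columns = sex)
tab <- as.table(matrix(c(60, 15, 35, 30), nrow = 2,
                       dimnames = list(bmigt25 = c("underwt/normal", "overwt/obese"),
                                       sex = c("F", "M"))))

# Expected counts under independence: (row total * column total) / grand total
expected <- chisq.test(tab)$expected
expected

# Rule of thumb: no more than 20 percent of cells with expected count below 5
mean(expected < 5) <= 0.20

# Column proportions: share of each sex falling in each BMI category
prop.table(tab, margin = 2)
```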

8. CrossTable output - part 2

The CrossTable output also provides the chi-square test results with and without the Yates' continuity correction.
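In base R, the two versions of the test can be compared through the correct argument of chisq.test, which applies Yates' continuity correction to 2 x 2 tables by default. The counts below are hypothetical.

```r
# Hypothetical 2 x 2 table of counts (rows = BMI category, columns = sex)
tab <- as.table(matrix(c(60, 15, 35, 30), nrow = 2,
                       dimnames = list(bmigt25 = c("underwt/normal", "overwt/obese"),
                                       sex = c("F", "M"))))

# With Yates' continuity correction (the default for 2 x 2 tables)
with_yates <- chisq.test(tab, correct = TRUE)

# Without the continuity correction
without_yates <- chisq.test(tab, correct = FALSE)

# The corrected statistic is the smaller, more conservative of the two
c(corrected = unname(with_yates$statistic),
  uncorrected = unname(without_yates$statistic))
c(corrected = with_yates$p.value, uncorrected = without_yates$p.value)
```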

9. Mosaic plots SAS and R

Similar to the FREQPLOT option in the SAS PROC FREQ procedure, R's mosaicplot function is used to visualize the categorical proportions computed in the contingency table.

10. Mosaicplot of two-way categorical proportions

The formula syntax is used in the mosaicplot function to request the association between bmigt25 and sex. The color option is used to specify colors for the two sex categories and the main option is used to specify the plot title. The resulting plot is made up of tiles whose widths and heights are proportional to the relative frequencies of each category.
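A call along these lines could produce such a plot. The toy data, the colors, and the title are assumptions for illustration, and the formula on the slide may be written differently.

```r
# Toy stand-in for daviskeep (hypothetical counts), stored as factors
daviskeep <- data.frame(
  sex = factor(rep(c("F", "M"), times = c(75, 65))),
  bmigt25 = factor(c(rep("underwt/normal", 60), rep("overwt/obese", 15),
                     rep("underwt/normal", 35), rep("overwt/obese", 30)))
)

# Mosaic plot of BMI category by sex using the formula interface.
# The colors are recycled over the levels of the last variable in the
# underlying table, which here should be sex.
mosaicplot(bmigt25 ~ sex, data = daviskeep,
           color = c("lightblue", "darkgrey"),
           main = "BMI category by sex (toy data)")
```

In a non-interactive session the plot is written to the default graphics device, typically a file named Rplots.pdf.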

11. Let's explore categorical associations for the abalones!

Let's explore categorical associations for the abalones!