Coding categorical variables
In previous exercises you practiced creating model matrices for continuous variables and applying variable transformations. In this exercise you will practice different ways of coding a categorical variable.
Categorical variables let you analyze and compare relationships across different groups or factor levels. Under the default treatment coding, the coefficients for the other groups are interpreted relative to a reference group, so the choice of reference group is important, and depending on the study at hand you might want to change it from the default. A common reason for changing the reference group is that the coefficient estimates become easier to interpret and more relevant to the study.
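As a minimal sketch of what changing the reference group looks like, the snippet below builds two model matrices with patsy's dmatrix. The toy DataFrame and its color labels are hypothetical stand-ins, not the crab data, and the choice of 'medium' as the new reference is only for illustration.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical toy data (made-up labels, NOT the crab dataset)
toy = pd.DataFrame(
    {"color": ["light", "medium", "dark", "medium", "dark", "light"]}
)

# Default treatment coding: the first level in sorted order ('dark')
# becomes the reference group and gets no column of its own.
default_matrix = dmatrix("C(color)", data=toy, return_type="dataframe")

# Same variable, but with 'medium' chosen as the reference group.
releveled_matrix = dmatrix(
    "C(color, Treatment(reference='medium'))",
    data=toy,
    return_type="dataframe",
)

print(default_matrix.columns.tolist())
print(releveled_matrix.columns.tolist())
```

In both matrices the reference level has no indicator column; it is absorbed into the intercept, so the remaining columns measure differences from that level.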
For this exercise you will revisit the crab dataset, where color and spine are categorical variables.
The dataset crab is preloaded in the workspace.
This exercise is part of the course Generalized Linear Models in Python.
Have a go at this exercise by completing this sample code.
# Import the dmatrix function
from ____ import ____
# Construct and print the model matrix for color as a categorical variable
print(____('____', data=____,
           return_type='dataframe').head())