Linear model and a binary response variable
In the video, you saw an example of fitting a linear model to a binary response variable and how quickly things can go wrong. Given the fitted line, you can obtain fitted values \(\hat{y}\) that are neither 0 nor 1 and can even fall below 0 or above 1, which is inconsistent with the problem since the response variable takes only the values 0 and 1.
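To make the problem concrete, here is a minimal sketch that fits a straight line to a binary response by ordinary least squares. The data are a hypothetical toy example, not the crab dataset, and `np.polyfit` stands in for whatever linear fit was shown in the video.

```python
import numpy as np

# Hypothetical toy data (NOT the crab dataset): one predictor and a 0/1 response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Ordinary least squares fit of the line y = intercept + slope * x.
slope, intercept = np.polyfit(x, y, deg=1)

# Fitted values at the observed x values.
y_hat = intercept + slope * x

print(np.round(y_hat, 3))
print("smallest fitted value:", round(float(y_hat.min()), 3))
print("largest fitted value:", round(float(y_hat.max()), 3))
```

In this toy example the smallest fitted value is below 0 and the largest is above 1, so the fitted line cannot be read as a probability of y = 1.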
Using the preloaded crab dataset, you will study this effect by modeling y as a function of width using the GLM framework.
Recall that a GLM is specified and fitted as:
glm(formula = 'y ~ X', data = my_data, family = sm.families.____).fit()
where you specify formula, data, and family.
Also, recall that a GLM with:
- the Gaussian family is a linear model (a special case of GLMs)
- the Binomial family is a logistic regression model.
This exercise is part of the course
Generalized Linear Models in Python
Exercise instructions
- Using the crab dataset, define the model formula so that y is predicted by width.
- To fit a linear model using the GLM formula, use Gaussian() for the family argument, which assumes y is continuous and approximately normally distributed.
- To fit a logistic model using the GLM formula, use Binomial() for the family argument.
- Fit a model using glm() with appropriate arguments and use print() and summary() to view summaries of the fitted models.
Interactive exercise
Complete the sample code to successfully finish this exercise.
# Define model formula
formula = '____ ~ ____'
# Define probability distribution for the response variable for
# the linear (LM) and logistic (GLM) model
family_LM = sm.families.____
family_GLM = sm.families.____
# Define and fit a linear regression model
model_LM = glm(formula = ____, data = ____, family = ____).fit()
print(____.____)
# Define and fit a logistic regression model
model_GLM = glm(formula = ____, data = ____, family = ____).fit()
print(____.____)