Linear model and a binary response variable

In the video, you saw an example of fitting a linear model to a binary response variable and how quickly things can go wrong. You learned that the fitted straight line can produce fitted values \(\hat{y}\) that fall outside the [0, 1] range, which contradicts the logic of the problem, since the response variable only takes the values 0 and 1.
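As a rough illustration of this effect, the following sketch fits a straight line by ordinary least squares to a small binary response. The data are hypothetical toy values, not the crab dataset, and the names `x`, `y`, and `fitted` are chosen only for this example.

```python
# Hypothetical toy data (not the crab dataset): a predictor and a 0/1 response.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form ordinary least squares estimates for y = intercept + slope * x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Fitted values of the straight line at the observed x values.
fitted = [intercept + slope * xi for xi in x]

# Collect the fitted values that are not valid probabilities.
outside = [f for f in fitted if f < 0 or f > 1]
print("min fitted:", min(fitted))
print("max fitted:", max(fitted))
print("number outside [0, 1]:", len(outside))
```

On these toy values, the line dips below 0 at the smallest x and rises above 1 at the largest x, even though y itself is only ever 0 or 1.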

Using the preloaded crab dataset, you will study this effect by modeling y as a function of width using the GLM framework.

Recall that a GLM is specified as:

glm(formula = 'y ~ X', data = my_data, family = sm.families.____).fit()

where you specify formula, data, and family.

Also, recall that a GLM with:

the Gaussian family is a linear model (a special case of GLMs)
the Binomial family is a logistic regression model.

This exercise is part of the course

Generalized Linear Models in Python


Exercise instructions

Using the crab dataset, define the model formula so that y is predicted by width.
To fit a linear model with glm(), use Gaussian() for the family argument, which assumes y is continuous and approximately normally distributed.
To fit a logistic model with glm(), use Binomial() for the family argument.
Fit a model using glm() with appropriate arguments and use print() and summary() to view summaries of the fitted models.

Interactive exercise

Complete the sample code to successfully finish this exercise.

# Define model formula
formula = '____ ~ ____'

# Define probability distribution for the response variable for 
# the linear (LM) and logistic (GLM) model
family_LM = sm.families.____
family_GLM = sm.families.____

# Define and fit a linear regression model
model_LM = glm(formula = ____, data = ____, family = ____).fit()
print(____.____)

# Define and fit a logistic regression model
model_GLM = glm(formula = ____, data = ____, family = ____).fit()
print(____.____)
