
Linear model and a binary response variable

In the video, you saw an example of fitting a linear model to a binary response variable and how quickly things can go wrong. Given the fitted line, you can obtain fitted values \(\hat{y}\) that can fall outside the interval from 0 to 1, which contradicts the logic of the problem because the response variable takes only the values 0 and 1.
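To make the problem concrete, here is a minimal sketch that fits a straight line by least squares to a binary response. The data are made up for illustration (they are not the crab dataset), and the closed-form slope and intercept are computed by hand so that no library is needed.

```python
# Toy data, invented for illustration only: a binary response
# that switches from 0 to 1 as x increases.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form least-squares estimates for a straight line y = intercept + slope * x
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Fitted values from the straight line
fitted = [intercept + slope * xi for xi in x]
for xi, fi in zip(x, fitted):
    print(f"x = {xi}: fitted = {fi:.3f}")

# Fitted values that cannot be read as a probability or as a 0/1 outcome
out_of_range = [fi for fi in fitted if fi < 0 or fi > 1]
print("Fitted values outside [0, 1]:", len(out_of_range))
```

On this toy data the fitted value at the smallest x is negative and the fitted value at the largest x exceeds 1, even though every observed y is either 0 or 1.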

Using the preloaded crab dataset, you will study this effect by modeling y as a function of width within the GLM framework.

Recall that the general form of a GLM call is:

glm(formula = 'y ~ X', data = my_data, family = sm.families.____).fit()

where you specify formula, data, and family.

Also, recall that a GLM with:

  • the Gaussian family is a linear model (a special case of GLMs)
  • the Binomial family is a logistic regression model.

This exercise is part of the course

Generalized Linear Models in Python


Exercise instructions

  • Using the crab dataset, define the model formula so that y is predicted by width.
  • To fit a linear model with the GLM formula interface, use Gaussian() for the family argument, which assumes that y is continuous and approximately normally distributed.
  • To fit a logistic model with the GLM formula interface, use Binomial() for the family argument.
  • Fit each model using glm() with the appropriate arguments, then use print() together with summary() to view the summaries of the fitted models.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Define model formula
formula = '____ ~ ____'

# Define probability distribution for the response variable for 
# the linear (LM) and logistic (GLM) model
family_LM = sm.families.____
family_GLM = sm.families.____

# Define and fit a linear regression model
model_LM = glm(formula = ____, data = ____, family = ____).fit()
print(____.____)

# Define and fit a logistic regression model
model_GLM = glm(formula = ____, data = ____, family = ____).fit()
print(____.____)