Exercise

Linear model and a binary response variable

In the video, you saw an example of fitting a linear model to a binary response variable and how quickly things can go wrong. Given the fitted line, you can obtain fitted values \(\hat{y}\) that fall outside the interval [0, 1], which makes no sense for a response variable that takes only the values 0 and 1.
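The effect can be reproduced with a minimal sketch on synthetic data. The data below are an invented stand-in, not the crab dataset, and the names x, y, and y_hat are illustrative only. A least-squares line fitted to a 0/1 response typically yields fitted values below 0 at one end of the range and above 1 at the other.

```python
import numpy as np

# Synthetic stand-in data (NOT the crab dataset): a binary response whose
# probability of being 1 rises with x.
rng = np.random.default_rng(0)
x = np.linspace(20.0, 35.0, 200)
p = 1.0 / (1.0 + np.exp(-(x - 27.0)))
y = rng.binomial(1, p)

# Ordinary least-squares line: y_hat = b0 + b1 * x.
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

# The fitted values are not restricted to [0, 1].
print("smallest fitted value:", y_hat.min())
print("largest fitted value:", y_hat.max())
```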

Using the preloaded crab dataset, you will study this effect by modeling y as a function of width using the GLM framework.

Recall that the general form of a GLM call is:

glm(formula = 'y ~ X', data = my_data, family = sm.families.____).fit()

where you specify formula, data, and family.

Also, recall that a GLM with:

  • the Gaussian family is a linear model (a special case of GLMs)
  • the Binomial family is a logistic regression model.

Instructions

  • Using the crab dataset, define the model formula so that y is predicted by width.
  • To fit a linear model using the GLM formula interface, use Gaussian() for the family argument, which assumes y is continuous and approximately normally distributed.
  • To fit a logistic model using the GLM formula interface, use Binomial() for the family argument.
  • Fit each model using glm() with the appropriate arguments, then use print() and summary() to view summaries of the fitted models.