
Linear model and a binary response variable

In the video, you saw an example of fitting a linear model to a binary response variable and how quickly things can go wrong. You learned that a fitted straight line can produce fitted values \(\hat{y}\) that are inconsistent with the problem, for example values below 0 or above 1, even though the response variable only takes the values 0 and 1.
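The effect can be reproduced without any libraries. The sketch below uses a small made-up dataset (not the crab data) and the closed-form least-squares formulas for a single predictor; the variable names are illustrative only.

```python
# Toy data (hypothetical, not the crab dataset): 21 evenly spaced x values,
# with a binary response that is 1 whenever x exceeds 5.
xs = [0.5 * k for k in range(21)]  # 0.0, 0.5, ..., 10.0
ys = [1 if x > 5 else 0 for x in xs]

# Closed-form simple least squares: slope = Sxy / Sxx, intercept from the means.
n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
s_xx = sum((x - x_mean) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Fitted values of the straight line at the observed x values.
fitted = [intercept + slope * x for x in xs]
print(f"smallest fitted value: {min(fitted):.3f}")
print(f"largest fitted value:  {max(fitted):.3f}")
```

On this toy data the line dips below 0 at the smallest x and rises above 1 at the largest x, so some fitted values cannot be read as probabilities.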

Using the preloaded crab dataset, you will study this effect by modeling y as a function of x using the GLM framework.

Recall that the GLM model formulation is:

glm(formula = 'y ~ X', data = my_data, family = sm.families.____).fit()

where you specify formula, data, and family.

Also, recall that a GLM with:

  • the Gaussian family is a linear model (a special case of GLMs)
  • the Binomial family is a logistic regression model.
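To see what distinguishes the two families, it can help to write out the mean models. This assumes the default link functions in statsmodels (identity for Gaussian, logit for Binomial) and a single predictor \(x\); the symbols \(\mu\), \(\beta_0\), and \(\beta_1\) are notation introduced here, not taken from the exercise.

\[
\text{Gaussian: } \mu = \beta_0 + \beta_1 x
\qquad
\text{Binomial: } \log\frac{\mu}{1-\mu} = \beta_0 + \beta_1 x
\;\Longleftrightarrow\;
\mu = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\]

Whenever \(\beta_1 \neq 0\), the Gaussian mean is unbounded in \(x\), whereas the Binomial mean always lies strictly between 0 and 1.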

This exercise is part of the course

Generalized Linear Models in Python

Exercise instructions

  • Using the crab dataset, define the model formula so that y is predicted by width.
  • To fit a linear model with the GLM formula interface, use Gaussian() for the family argument; this family assumes y is continuous and approximately normally distributed.
  • To fit a logistic model with the GLM formula interface, use Binomial() for the family argument.
  • Fit each model using glm() with the appropriate arguments, then use print() and summary() to view a summary of each fitted model.

Hands-on interactive exercise

Try this exercise by completing this sample code.

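# Imports (assumed to be preloaded in the exercise environment)
import statsmodels.api as sm
from statsmodels.formula.api import glm

# Define model formula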
formula = '____ ~ ____'

# Define probability distribution for the response variable for 
# the linear (LM) and logistic (GLM) model
family_LM = sm.families.____
family_GLM = sm.families.____

# Define and fit a linear regression model
model_LM = glm(formula = ____, data = ____, family = ____).fit()
print(____.____)

# Define and fit a logistic regression model
model_GLM = glm(formula = ____, data = ____, family = ____).fit()
print(____.____)