Comparing predicted values

In the previous exercise, you fitted both a linear model and a GLM (logistic) regression model to the crab data, predicting y with width. In other words, you wanted to predict the probability that a female crab has a satellite crab nearby, given her width.

In this exercise, you will further examine the estimated probabilities (the output) from the two models and try to deduce if the linear fit would be suitable for the problem at hand.
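To see why the linear fit may be questionable for a probability, here is a minimal sketch. It assumes made-up coefficients for both models (they are not fitted to the crab data) and hypothetical width values; the helper names are for illustration only. The linear prediction is unbounded, while the logistic prediction passes through the inverse logit and stays between 0 and 1.

```python
import math

# Hypothetical coefficients, chosen only for illustration (NOT fitted to the crab data).
LM_INTERCEPT, LM_SLOPE = -1.8, 0.09     # linear model: y = a + b * width
GLM_INTERCEPT, GLM_SLOPE = -12.0, 0.5   # logistic model: logit(p) = a + b * width


def predict_linear(width):
    """Linear fit: the output is not constrained to any range."""
    return LM_INTERCEPT + LM_SLOPE * width


def predict_logistic(width):
    """Logistic fit: the inverse logit keeps the output strictly between 0 and 1."""
    eta = GLM_INTERCEPT + GLM_SLOPE * width
    return 1.0 / (1.0 + math.exp(-eta))


widths = [17.0, 21.0, 25.0, 29.0, 33.0]  # hypothetical test widths
for w in widths:
    print(f"width={w:5.1f}  linear={predict_linear(w):6.3f}  logistic={predict_logistic(w):6.3f}")
```

With these assumed coefficients, the linear model returns a negative value for the smallest width and a value above 1 for the largest, neither of which can be read as a probability.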

The usual practice is to test the model on new, unseen data. Such a dataset is called a test sample.
The test sample has been created for you and loaded in the workspace. Note that you need test values for all variables present in the model, which in this example is width.

The crab dataset has been preloaded in the workspace.

This exercise is part of the course

Generalized Linear Models in Python


Exercise instructions

  • Using print(), view the test set.
  • Using the test sample, compute estimated probabilities by calling .predict() on the fitted linear model model_LM and save them as pred_lm. Do the same with the fitted GLM (logistic) model model_GLM and save the result as pred_glm.
  • Using the pandas DataFrame() constructor, combine the predictions from both models and save them as predictions.
  • Concatenate test and predictions and save the result as all_data. View all_data using print().
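The combine-and-concatenate steps can be sketched on toy data. Everything below is hypothetical: the toy_ names, the width values, and the prediction values are invented for illustration and do not come from the crab data or the fitted models. The sketch assumes the predictions are Series that share the test sample's index, since pd.concat with axis=1 matches rows by index label.

```python
import pandas as pd

# Toy test sample with a non-default index (hypothetical values).
toy_test = pd.DataFrame({"width": [22.5, 26.0, 29.3]}, index=[10, 11, 12])

# Hypothetical predictions, stored as Series that share the test sample's index.
toy_pred_lm = pd.Series([0.35, 0.66, 0.96], index=toy_test.index)
toy_pred_glm = pd.Series([0.30, 0.68, 0.91], index=toy_test.index)

# One column per model, keyed by the same index as the test sample.
toy_predictions = pd.DataFrame({"Pred_LM": toy_pred_lm, "Pred_GLM": toy_pred_glm})

# axis=1 places the frames side by side, matching rows by index label.
toy_all = pd.concat([toy_test, toy_predictions], axis=1)
print(toy_all)
```

If the two frames had different index labels, the same call would produce extra rows filled with NaN rather than raise an error, so it is worth checking that the row count of the result matches the test sample.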

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# View test set
print(____)

# Compute estimated probabilities for linear model: pred_lm
____ = model_LM.____(____)

# Compute estimated probabilities for GLM model: pred_glm
____ = model_GLM.____(____)

# Create dataframe of predictions for linear and GLM model: predictions
____ = pd.DataFrame({'Pred_LM': ____, 'Pred_GLM': ____})

# Concatenate test sample and predictions and view the results
all_data = pd.concat([____, ____], axis = 1)
print(____)