Comparing predicted values
In the previous exercise, you fitted both a linear model and a GLM (logistic regression) model to the crab data, predicting y with width. In other words, you wanted to predict the probability that a female crab has a satellite crab nearby, given her width.
In this exercise, you will further examine the estimated probabilities (the output) from the two models and try to deduce if the linear fit would be suitable for the problem at hand.
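To make the comparison concrete, here is a minimal sketch of why a straight-line fit can be a poor match for a probability. The coefficients below are hypothetical values chosen only for illustration; they are not the fitted values from model_LM or model_GLM, and the helper function names are made up for this example.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted values).
LM_INTERCEPT, LM_SLOPE = -1.77, 0.092      # linear model: y = a + b * width
GLM_INTERCEPT, GLM_SLOPE = -12.35, 0.497   # logistic model: logit(p) = a + b * width


def linear_prediction(width):
    """Straight-line prediction; nothing keeps it inside [0, 1]."""
    return LM_INTERCEPT + LM_SLOPE * width


def logistic_prediction(width):
    """Inverse logit of the linear predictor; always strictly between 0 and 1."""
    eta = GLM_INTERCEPT + GLM_SLOPE * width
    return 1.0 / (1.0 + math.exp(-eta))


# Compare both predictions over a range of widths, including extreme ones.
widths = [17.0, 21.0, 26.0, 31.0, 34.0]
for w in widths:
    print(f"width={w:5.1f}  linear={linear_prediction(w):6.3f}  "
          f"logistic={logistic_prediction(w):6.3f}")
```

With these assumed coefficients, the linear prediction drops below 0 for the narrowest width and rises above 1 for the widest, while the logistic prediction stays a valid probability throughout.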
The usual practice is to test the model on new, unseen data. Such a dataset is called a test sample.
The test sample has been created for you and loaded in the workspace. Note that you need test values for all variables present in the model, which in this example is width.
The crab dataset has been preloaded in the workspace.
This exercise is part of the course Generalized Linear Models in Python.
Exercise instructions
- Using print(), view the test set.
- Using the test sample, compute estimated probabilities using .predict() on the fitted linear model model_LM and save as pred_lm. Also, compute estimated probabilities using .predict() on the fitted GLM (logistic) model saved as model_GLM and save as pred_glm.
- Using pandas DataFrame(), combine predictions from both models and save as predictions.
- Concatenate test and predictions and save as all_data. View all_data using print().
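One detail of the concatenation step is worth a quick sketch: pd.concat with axis=1 lines rows up by index label, not by position. The values below are made-up stand-ins for model output (no model is fitted here), and the assumption is that the predictions arrive as Series carrying the same index as the test sample, which is what .predict() on a formula-based statsmodels fit generally returns when given a DataFrame.

```python
import pandas as pd

# Hypothetical test sample with a non-default index.
test = pd.DataFrame({"width": [22.5, 26.0, 30.1]}, index=[10, 11, 12])

# Made-up stand-ins for model predictions, indexed like the test sample.
pred_lm = pd.Series([0.30, 0.62, 1.00], index=test.index)
pred_glm = pd.Series([0.24, 0.64, 0.93], index=test.index)

# Combine predictions, then place them next to the test sample column-wise.
predictions = pd.DataFrame({"Pred_LM": pred_lm, "Pred_GLM": pred_glm})
all_data = pd.concat([test, predictions], axis=1)
print(all_data)

# If the index is lost (here by converting to a plain list), the rows no
# longer line up and the result is padded with NaN instead.
misaligned = pd.concat([test, pd.DataFrame({"Pred_LM": pred_lm.tolist()})], axis=1)
print(misaligned)
```

When the indexes match, each test row sits beside its own predictions. When they do not, pandas takes the union of both indexes rather than raising an error, so the mismatch is easy to miss.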
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# View test set
print(____)
# Compute estimated probabilities for linear model: pred_lm
____ = model_LM.____(____)
# Compute estimated probabilities for GLM model: pred_glm
____ = model_GLM.____(____)
# Create dataframe of predictions for linear and GLM model: predictions
____ = pd.DataFrame({'Pred_LM': ____, 'Pred_GLM': ____})
# Concatenate test sample and predictions and view the results
all_data = pd.concat([____, ____], axis = 1)
print(____)