Using different sets of variables
Adding more variables and therefore more complexity to your logistic regression model does not automatically result in more accurate models. In this exercise you can verify whether adding 3 variables to a model leads to a more accurate model.
variables_1 and variables_2 are available in your environment; you can print them to the console to explore what they look like.
This exercise is part of the course
Introduction to Predictive Analytics in Python
Exercise instructions
- Fit the logreg model using variables_2, which contains 3 additional variables compared to variables_1.
- Make predictions for this model.
- Calculate the AUC of this model.
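As background for the last step, here is a minimal sketch of what the AUC measures: the share of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The helper name pairwise_auc and the toy labels and scores are hypothetical and not part of the course; for binary targets, roc_auc_score is expected to agree with this pairwise definition.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, y_true) if y == 1]
    negatives = [s for s, y in zip(scores, y_true) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: two negatives and two positives.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = pairwise_auc(y_true, scores)
print(auc)  # 3 of the 4 pairs are ranked correctly -> 0.75
```

The double loop is quadratic in the number of rows, so it only serves as an illustration; on real data a library routine is the practical choice.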
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
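# Imports (the course environment is assumed to preload these)
from sklearn import linear_model
from sklearn.metrics import roc_auc_score
# Create appropriate DataFrames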
X_1 = basetable[variables_1]
X_2 = basetable[variables_2]
y = basetable["target"]  # 1-D Series; a single-column DataFrame triggers a DataConversionWarning in fit
# Create the logistic regression model
logreg = linear_model.LogisticRegression()
# Make predictions using the first set of variables and assign the AUC to auc_1
logreg.fit(X_1, y)
predictions_1 = logreg.predict_proba(X_1)[:,1]
auc_1 = roc_auc_score(y, predictions_1)
# Make predictions using the second set of variables and assign the AUC to auc_2
logreg.____(____, ____)
predictions_2 = ____.____(____)[____,____]
auc_2 = ____(____, ____)
# Print auc_1 and auc_2
print(round(auc_1,2))
print(round(auc_2,2))
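To see the same comparison outside the course environment, the following is a rough sketch on synthetic data. Everything about the data is an assumption made for illustration: the column names x1, x2, noise_1 to noise_3, the coefficients, and the helper training_auc are hypothetical, and the real basetable will look different. The target depends only on x1 and x2, so the 3 extra columns in variables_2 carry no signal.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the course's basetable (all column names are hypothetical).
rng = np.random.default_rng(0)
n = 1000
basetable = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "noise_1": rng.normal(size=n),
    "noise_2": rng.normal(size=n),
    "noise_3": rng.normal(size=n),
})
# The target depends only on x1 and x2; the noise columns carry no signal.
logit = 1.5 * basetable["x1"] - 1.0 * basetable["x2"]
basetable["target"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

variables_1 = ["x1", "x2"]
variables_2 = variables_1 + ["noise_1", "noise_2", "noise_3"]


def training_auc(variables):
    """Fit a logistic regression on the given columns and return its AUC on the same data."""
    X = basetable[variables]
    y = basetable["target"]
    logreg = linear_model.LogisticRegression()
    logreg.fit(X, y)
    predictions = logreg.predict_proba(X)[:, 1]  # probability of the positive class
    return roc_auc_score(y, predictions)


auc_1 = training_auc(variables_1)
auc_2 = training_auc(variables_2)
print(round(auc_1, 2))
print(round(auc_2, 2))
```

Under these assumptions the two AUC values should land close together, which is the point of the exercise: extra variables add complexity without necessarily adding accuracy. Both values here are computed on the training data, as in the exercise; evaluating on held-out data would give a stricter comparison.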