Visualizing classification model performance
In this exercise, you will solve a classification problem where the "popularity" column in the music_df dataset has been converted to binary values: 1 representing popularity greater than or equal to the median of the "popularity" column, and 0 indicating popularity below the median.
Your task is to build and visualize the results of three different models that classify whether a song is popular or not.
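The binarization described above could be produced with a step like the following sketch; the toy DataFrame here is a hypothetical stand-in for music_df, since the real dataset is preloaded in the exercise environment:

```python
import pandas as pd

# Hypothetical stand-in for music_df's "popularity" column
music_df = pd.DataFrame({"popularity": [10, 55, 42, 73, 18, 60]})

# 1 if popularity >= median, else 0
median_pop = music_df["popularity"].median()
music_df["popularity"] = (music_df["popularity"] >= median_pop).astype(int)
print(music_df["popularity"].tolist())  # [0, 1, 0, 1, 0, 1]
```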
The data has been split, scaled, and preloaded for you as X_train_scaled, X_test_scaled, y_train, and y_test. Additionally, KNeighborsClassifier, DecisionTreeClassifier, and LogisticRegression have been imported.
This exercise is part of the course “Supervised Learning with scikit-learn”.
Exercise instructions
- Create a dictionary with the keys "Logistic Regression", "KNN", and "Decision Tree Classifier", setting the dictionary's values to a call of each model.
- Loop through the values in models.
- Instantiate a KFold object to perform 6 splits, setting shuffle to True and random_state to 12.
- Perform cross-validation using the model, the scaled training features, and the training target, setting cv equal to kf.
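As a quick illustration of the KFold step, a 6-split object yields six train/test index pairs; the toy array below is an assumption, used only to show the split sizes:

```python
import numpy as np
from sklearn.model_selection import KFold

# Same KFold settings as in the instructions: 6 shuffled splits, seeded
kf = KFold(n_splits=6, shuffle=True, random_state=12)

X_toy = np.arange(24).reshape(12, 2)  # 12 samples, 2 features
splits = list(kf.split(X_toy))

print(len(splits))  # 6 train/test index pairs
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))  # 10 train, 2 test per split
```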
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create models dictionary
models = {"____": ____(), "____": ____(), "____": ____()}
results = []
# Loop through the models' values
for model in ____.____():

    # Instantiate a KFold object
    kf = ____(n_splits=____, random_state=____, shuffle=____)

    # Perform cross-validation
    cv_results = ____(____, ____, ____, cv=____)

    results.append(cv_results)
plt.boxplot(results, labels=models.keys())
plt.show()
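A completed version of the scaffold might look like the sketch below. It is not the official solution: it assumes cross_val_score and matplotlib.pyplot are also available (the exercise only lists the three classifiers as imported), and it builds a small synthetic dataset in place of the preloaded music_df splits so the snippet is self-contained:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preloaded music_df splits
X, y = make_classification(n_samples=300, n_features=8, random_state=12)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=12)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create models dictionary
models = {
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "Decision Tree Classifier": DecisionTreeClassifier(),
}
results = []

# Loop through the models' values
for model in models.values():
    # Instantiate a KFold object with 6 shuffled, seeded splits
    kf = KFold(n_splits=6, random_state=12, shuffle=True)
    # Cross-validate on the scaled training features and training target
    cv_results = cross_val_score(model, X_train_scaled, y_train, cv=kf)
    results.append(cv_results)

# One box of 6 accuracy scores per model
plt.boxplot(results, labels=models.keys())
plt.show()
```

The boxplot then shows the spread of the six cross-validation accuracy scores for each model side by side, which is what makes this a visual comparison rather than a single-number one.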