Visualizing classification model performance
In this exercise, you will solve a classification problem where the "popularity" column in the music_df dataset has been converted to binary values, with 1 representing popularity greater than or equal to the median of the "popularity" column, and 0 indicating popularity below the median.
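The binary conversion described above can be sketched as follows. This is a minimal illustration, not the course's own preprocessing code: the small music_df built here is a hypothetical stand-in, since the real dataset is not shown on this page.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for music_df; the real dataset is not shown here.
music_df = pd.DataFrame({"popularity": [10, 35, 50, 50, 72, 90]})

# Median of the original (continuous) popularity scores.
median_popularity = music_df["popularity"].median()

# 1 when popularity is at or above the median, 0 when it is below.
music_df["popularity"] = np.where(
    music_df["popularity"] >= median_popularity, 1, 0
)

print(music_df["popularity"].tolist())  # [0, 0, 1, 1, 1, 1]
```

Because ties with the median are assigned to class 1, the two classes are not guaranteed to be exactly the same size.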
Your task is to build and visualize the results of three different models to classify whether a song is popular or not.
The data has been split, scaled, and preloaded for you as X_train_scaled, X_test_scaled, y_train, and y_test. Additionally, KNeighborsClassifier, DecisionTreeClassifier, and LogisticRegression have been imported.
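For readers working outside the exercise environment, the preloaded arrays could be produced along these lines. This is only a sketch under stated assumptions: the data is synthetic (generated with make_classification), and the test size and random seeds are illustrative choices, not values taken from the course.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the music_df
# features and the binary "popularity" target (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=12)

# Hold out 25% of the rows for testing (illustrative choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=12
)

# Fit the scaler on the training features only, then apply the same
# transformation to the test features to avoid leaking test statistics.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```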
This is a part of the course "Supervised Learning with scikit-learn"
Exercise instructions
- Create a dictionary of "Logistic Regression", "KNN", and "Decision Tree Classifier", setting the dictionary's values to a call of each model.
- Loop through the values in models.
- Instantiate a KFold object to perform 6 splits, setting shuffle to True and random_state to 12.
- Perform cross-validation using the model, the scaled training features, the target training set, and setting cv equal to kf.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create models dictionary
models = {"____": ____(), "____": ____(), "____": ____()}
results = []

# Loop through the models' values
for model in ____.____():

    # Instantiate a KFold object
    kf = ____(n_splits=____, random_state=____, shuffle=____)

    # Perform cross-validation
    cv_results = ____(____, ____, ____, cv=____)
    results.append(cv_results)

plt.boxplot(results, labels=models.keys())
plt.show()
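One possible way to fill in the blanks is sketched below, following the four instructions in order. It is not the official solution. Several things are assumptions: the training data is synthetic because the preloaded arrays are not available here, and KFold, cross_val_score, and plt are imported explicitly since the page does not say whether they are preloaded. The tick labels are set with plt.xticks rather than the labels keyword of plt.boxplot, because that keyword's name differs across Matplotlib versions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a plot window
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-ins for the preloaded X_train_scaled and y_train (assumption).
X_train, y_train = make_classification(n_samples=300, n_features=5, random_state=12)
X_train_scaled = StandardScaler().fit_transform(X_train)

# Create models dictionary
models = {
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "Decision Tree Classifier": DecisionTreeClassifier(),
}
results = []

# Loop through the models' values
for model in models.values():

    # Instantiate a KFold object: 6 shuffled splits with a fixed seed
    kf = KFold(n_splits=6, random_state=12, shuffle=True)

    # Perform cross-validation; the default score for classifiers is accuracy
    cv_results = cross_val_score(model, X_train_scaled, y_train, cv=kf)
    results.append(cv_results)

# One box per model, each summarizing its six fold scores
plt.boxplot(results)
plt.xticks(range(1, len(models) + 1), list(models.keys()))
plt.ylabel("Cross-validation accuracy")
plt.show()
```

Each box summarizes the six fold scores for one model, so the plot shows both the typical accuracy and how much it varies from fold to fold.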