
Visualizing classification model performance

In this exercise, you will solve a classification problem in which the "popularity" column of the music_df dataset has been converted to binary values: 1 for songs whose popularity is greater than or equal to the column's median, and 0 for songs whose popularity is below it.
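The conversion described above could look roughly like the following. This is a minimal sketch, not the course's actual preprocessing code: the toy DataFrame and the median_popularity variable are assumptions made for illustration, and the real music_df has more rows and columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for music_df (assumption: the real dataset is much larger).
music_df = pd.DataFrame({"popularity": [10, 35, 50, 50, 72, 90]})

# Compute the median once, before the column is overwritten.
median_popularity = music_df["popularity"].median()

# 1 where popularity >= median, 0 where it is below the median.
music_df["popularity"] = np.where(music_df["popularity"] >= median_popularity, 1, 0)

print(music_df["popularity"].tolist())  # [0, 0, 1, 1, 1, 1]
```

Because ties with the median are labelled 1, the two classes are not guaranteed to be the same size.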

Your task is to build and visualize the results of three different models to classify whether a song is popular or not.

The data has been split, scaled, and preloaded for you as X_train_scaled, X_test_scaled, y_train, and y_test. Additionally, KNeighborsClassifier, DecisionTreeClassifier, and LogisticRegression have been imported.
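The exercise does not show how the preloaded arrays were produced. One plausible way to obtain arrays with these names is sketched below; the synthetic data, the test_size, and the random_state values are all assumptions, not the settings used by the course.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 rows, 4 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
y = (X[:, 0] > 5.0).astype(int)

# Hold out 25% of the rows for testing (split settings are illustrative).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit the scaler on the training features only, then reuse it on the test
# features so that no information from the test set enters the scaling.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

After this step each training feature has mean 0 and unit variance; the test features are only approximately standardized, because they reuse the training statistics.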

This is a part of the course

“Supervised Learning with scikit-learn”


Exercise instructions

  • Create a dictionary with the keys "Logistic Regression", "KNN", and "Decision Tree Classifier", setting each value to an instance of the corresponding model.
  • Loop through the values in models.
  • Instantiate a KFold object to perform 6 splits, setting shuffle to True and random_state to 12.
  • Perform cross-validation, passing the model, the scaled training features, the training target, and cv set to kf.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create models dictionary
models = {"____": ____(), "____": ____(), "____": ____()}
results = []

# Loop through the models' values
for model in ____.____():
  
  # Instantiate a KFold object
  kf = ____(n_splits=____, random_state=____, shuffle=____)
  
  # Perform cross-validation
  cv_results = ____(____, ____, ____, cv=____)
  results.append(cv_results)

# Plot the cross-validation scores of each model
plt.boxplot(results, labels=models.keys())
plt.show()
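For reference, here is one way the workflow above might run end to end outside the exercise environment. It is a sketch under stated assumptions: the data comes from make_classification rather than music_df, cross_val_score is assumed to be the cross-validation helper, the models use their default settings, and the plot labels are set with set_xticklabels instead of the labels argument used in the template.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; remove to open a window
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preloaded training data (assumption).
X_train, y_train = make_classification(n_samples=300, n_features=8, random_state=0)
X_train_scaled = StandardScaler().fit_transform(X_train)

# One instance of each model, keyed by the name shown on the plot.
models = {
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "Decision Tree Classifier": DecisionTreeClassifier(),
}
results = []

for model in models.values():
    # Six shuffled folds; the fixed seed gives every model the same splits.
    kf = KFold(n_splits=6, random_state=12, shuffle=True)
    # For classifiers the default score is accuracy: one value per fold.
    cv_results = cross_val_score(model, X_train_scaled, y_train, cv=kf)
    results.append(cv_results)

# One box per model, summarizing its six fold scores.
fig, ax = plt.subplots()
ax.boxplot(results)
ax.set_xticklabels(list(models.keys()))
ax.set_ylabel("Accuracy")
plt.show()
```

Each entry of results holds six accuracy scores, so the box plot compares both the typical score and the fold-to-fold spread of the three models.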

This exercise is part of the course

Supervised Learning with scikit-learn

Skill level: Intermediate

Grow your machine learning skills with scikit-learn in Python. Use real-world datasets in this interactive course and learn how to make powerful predictions!

Learn how to impute missing values, convert categorical data to numeric values, scale data, evaluate multiple supervised learning models simultaneously, and build pipelines to streamline your workflow!

  • Exercise 1: Preprocessing data
  • Exercise 2: Creating dummy variables
  • Exercise 3: Regression with categorical features
  • Exercise 4: Handling missing data
  • Exercise 5: Dropping missing data
  • Exercise 6: Pipeline for song genre prediction: I
  • Exercise 7: Pipeline for song genre prediction: II
  • Exercise 8: Centering and scaling
  • Exercise 9: Centering and scaling for regression
  • Exercise 10: Centering and scaling for classification
  • Exercise 11: Evaluating multiple models
  • Exercise 12: Visualizing regression model performance
  • Exercise 13: Predicting on the test set
  • Exercise 14: Visualizing classification model performance
  • Exercise 15: Pipeline for predicting song popularity
  • Exercise 16: Congratulations
