Assessing a diabetes prediction classifier

In this chapter you'll work with the diabetes_df dataset introduced previously.

The goal is to predict whether or not each individual is likely to have diabetes based on the features body mass index (BMI) and age (in years). Therefore, it is a binary classification problem. A target value of 0 indicates that the individual does not have diabetes, while a value of 1 indicates that the individual does have diabetes.

diabetes_df has been preloaded for you as a pandas DataFrame and split into X_train, X_test, y_train, and y_test. In addition, a KNeighborsClassifier with n_neighbors=6 is instantiated in the sample code and assigned to knn.

You will fit the model, make predictions on the test set, then produce a confusion matrix and classification report.
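
To make these two outputs concrete, here is a minimal sketch of what a confusion matrix counts for a binary target. The labels are made up for illustration and are not taken from diabetes_df, and the helper name binary_confusion_counts is hypothetical. The layout follows scikit-learn's convention, where rows are true labels and columns are predicted labels.

```python
def binary_confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for 0/1 labels (rows: true, columns: predicted)."""
    counts = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        counts[actual][predicted] += 1
    return counts


# Made-up labels for illustration only: 1 = has diabetes, 0 = does not
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 0]

matrix = binary_confusion_counts(y_true_demo, y_pred_demo)
(tn, fp), (fn, tp) = matrix

precision = tp / (tp + fp)  # of the predicted positives, the share that were correct
recall = tp / (tp + fn)     # of the actual positives, the share that were found

print(matrix)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

The classification report summarizes the same counts as per-class precision, recall, and F1-score, together with the number of true samples in each class (the support).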

This exercise is part of the course

Supervised Learning with scikit-learn

Exercise instructions

  • Import confusion_matrix and classification_report.
  • Fit the model to the training data.
  • Predict the labels of the test set, storing the results as y_pred.
  • Compute and print the confusion matrix and classification report for the test labels versus the predicted labels.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import confusion_matrix and classification_report
____

knn = KNeighborsClassifier(n_neighbors=6)

# Fit the model to the training data
____

# Predict the labels of the test data: y_pred
y_pred = ____

# Generate the confusion matrix and classification report
print(____(____, ____))
print(____(____, ____))
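
Once you have attempted the exercise, the same workflow can be tried outside the course environment. The sketch below is one possible version, not the course's reference solution. It assumes scikit-learn and NumPy are installed, and it replaces diabetes_df with synthetic data: the bmi and age arrays, the rule that generates the target, and the 70/30 split are all invented for illustration.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for diabetes_df (assumption: two numeric features, BMI and age)
rng = np.random.default_rng(42)
n_samples = 200
bmi = rng.normal(30, 6, n_samples)
age = rng.integers(21, 80, n_samples)
X = np.column_stack([bmi, age])

# Made-up rule for the target so the classifier has some signal to learn
y = ((bmi + 0.2 * age + rng.normal(0, 4, n_samples)) > 40).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the model to the training data, then predict the test labels
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Both functions take the true labels first, then the predicted labels
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))
```

The argument order matters: swapping y_test and y_pred transposes the confusion matrix and exchanges precision with recall in the report.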