Assessing a diabetes prediction classifier
In this chapter you'll work with the diabetes_df dataset introduced previously.
The goal is to predict whether each individual is likely to have diabetes based on two features: body mass index (BMI) and age (in years). It is therefore a binary classification problem. A target value of 0 indicates that the individual does not have diabetes, while a value of 1 indicates that the individual does have diabetes.
diabetes_df has been preloaded for you as a pandas DataFrame and split into X_train, X_test, y_train, and y_test. In addition, a KNeighborsClassifier() has been instantiated and assigned to knn.
You will fit the model, make predictions on the test set, then produce a confusion matrix and classification report.
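To make the two reports concrete before using them on real predictions, here is a minimal sketch on a handful of hand-written labels. The lists y_true and y_hat are invented for illustration and have nothing to do with diabetes_df. The point is only to show how scikit-learn lays out the confusion matrix: rows are true classes, columns are predicted classes.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels, invented for illustration: 0 = no diabetes, 1 = diabetes
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_hat = [0, 0, 1, 0, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)
print(cm)
# [[4 1]
#  [1 2]]

# For a binary problem, ravel() unpacks the four cells in this order
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 4 1 1 2

# Per-class precision, recall, and F1-score, plus support (true count per class)
print(classification_report(y_true, y_hat))
```

Here precision for class 1 is tp / (tp + fp) = 2/3 and recall is tp / (tp + fn) = 2/3, which are the figures the classification report shows in the row for class 1.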
This exercise is part of the course Supervised Learning with scikit-learn.
Exercise instructions
- Import confusion_matrix and classification_report.
- Fit the model to the training data.
- Predict the labels of the test set, storing the results as y_pred.
- Compute and print the confusion matrix and classification report for the test labels versus the predicted labels.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import confusion_matrix and classification_report
____
knn = KNeighborsClassifier(n_neighbors=6)
# Fit the model to the training data
____
# Predict the labels of the test data: y_pred
y_pred = ____
# Generate the confusion matrix and classification report
print(____(____, ____))
print(____(____, ____))
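For readers working outside the course environment, where diabetes_df and the train/test split are not preloaded, the same workflow can be sketched end to end on synthetic data. Everything about the data below is an assumption: the BMI and age values are randomly generated, the labelling rule is made up so the classifier has a pattern to learn, and the split settings (test_size=0.3, random_state=42) are illustrative choices rather than the course's. Only n_neighbors=6 is taken from the sample code.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for diabetes_df (assumption: not the real data)
rng = np.random.default_rng(0)
n = 200
bmi = rng.normal(30, 6, n)
age = rng.integers(21, 80, n)
X = np.column_stack([bmi, age])

# Made-up labelling rule, purely so the example has a pattern to learn
y = ((bmi + 0.2 * age + rng.normal(0, 3, n)) > 40).astype(int)

# Illustrative split settings; stratify keeps the class balance in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=6)

# Fit on the training data, then predict labels for the test data
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Both functions take the true labels first, then the predicted labels
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(cm)
print(report)
```

Both metric functions expect the true labels as the first argument and the predictions as the second. Swapping them transposes the confusion matrix and exchanges precision with recall in the report.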