Building a diabetes classifier
You'll be using the Pima Indians diabetes dataset to predict whether a person has diabetes using logistic regression. There are 8 features and one target in this dataset. The data has been split into a training and a test set and pre-loaded for you as X_train, y_train, X_test, and y_test.
A StandardScaler() instance has been predefined as scaler and a LogisticRegression() one as lr.
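If you want to try this outside the exercise environment, the pre-loaded objects could be recreated roughly as follows. This is only a sketch: the real diabetes data is not loaded here, so a synthetic dataset with 8 features stands in for it, and the split ratio and random seeds are assumptions rather than the values the course uses.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data (assumption): 8 features, binary target
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out part of the data as a test set; the 25% ratio is an assumption
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The two estimators the exercise refers to as scaler and lr
scaler = StandardScaler()
lr = LogisticRegression()

print(X_train.shape, X_test.shape)
```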
This exercise is part of the course
Dimensionality Reduction in Python
Exercise instructions
- Fit the scaler on the training features and transform these features in one go.
- Fit the logistic regression model on the scaled training data.
- Scale the test features.
- Predict diabetes presence on the scaled test set.
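The four steps above can be sketched end to end as shown below. This is a minimal illustration on synthetic data (an assumption, since the exercise's own X_train and friends are only available inside the course environment), not the official solution. Note that the scaler is fitted on the training features only and then reused, unchanged, on the test features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with 8 features (assumption)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scaler = StandardScaler()
lr = LogisticRegression()

# 1. Learn mean and standard deviation from the training set and scale it
X_train_std = scaler.fit_transform(X_train)

# 2. Fit the classifier on the scaled training data
lr.fit(X_train_std, y_train)

# 3. Scale the test set with the statistics learned from the training set
X_test_std = scaler.transform(X_test)

# 4. Predict on the scaled test set
y_pred = lr.predict(X_test_std)

print(f"{accuracy_score(y_test, y_pred):.1%} accuracy on test set.")
```

Calling transform (rather than fit_transform) on the test features keeps information about the test set from leaking into the preprocessing step.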
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Fit the scaler on the training features and transform these in one go
X_train_std = scaler.____(____)
# Fit the logistic regression model on the scaled training data
lr.____(____, ____)
# Scale the test features
X_test_std = scaler.____(____)
# Predict diabetes presence on the scaled test set
y_pred = lr.____(____)
# Print the test set accuracy and the absolute feature coefficients
print(f"{accuracy_score(y_test, y_pred):.1%} accuracy on test set.")
print(dict(zip(X.columns, abs(lr.coef_[0]).round(2))))
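To see what the last print statement produces, here is a small standalone sketch. The DataFrame, its column names (feat_a, feat_b, feat_c), and the target are all hypothetical and only serve to illustrate the pattern: lr.coef_[0] holds one coefficient per feature for a binary problem, and zipping the absolute values with the column names gives a readable mapping. Because the features are on a comparable scale, a larger absolute coefficient suggests a stronger influence on the prediction.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical data drawn from a standard normal, so already on a common scale
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(200, 3)), columns=["feat_a", "feat_b", "feat_c"]
)

# Illustrative target: driven mostly by feat_a, slightly by feat_b, not by feat_c
y = (X["feat_a"] + 0.2 * X["feat_b"] > 0).astype(int)

lr = LogisticRegression().fit(X, y)

# Pair each column name with the absolute value of its coefficient
coef_by_feature = dict(zip(X.columns, np.abs(lr.coef_[0]).round(2)))
print(coef_by_feature)
```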