Building a diabetes classifier

You'll use the Pima Indians diabetes dataset to predict whether a person has diabetes using logistic regression. The dataset has 8 features and one target. The data has been split into training and test sets and pre-loaded for you as X_train, y_train, X_test, and y_test.

A StandardScaler() instance has been predefined as scaler, and a LogisticRegression() instance as lr.

This exercise is part of the course

Dimensionality Reduction in Python

Exercise instructions

  • Fit the scaler on the training features and transform these features in one go.
  • Fit the logistic regression model on the scaled training data.
  • Scale the test features.
  • Predict diabetes presence on the scaled test set.
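
The four steps above can be sketched end to end as follows. This is one possible completion, not the course's reference solution: it runs on synthetic data from make_classification as a stand-in for the pre-loaded diabetes splits, and the sample size, random_state values, and split ratio are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data (assumption): 8 features, binary target
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

scaler = StandardScaler()
lr = LogisticRegression()

# Learn the mean and standard deviation from the training features
# and scale those same features in one call
X_train_std = scaler.fit_transform(X_train)

# Fit the logistic regression model on the scaled training data
lr.fit(X_train_std, y_train)

# Scale the test features with the statistics learned from the training set
# (transform only, no refitting)
X_test_std = scaler.transform(X_test)

# Predict the class labels for the scaled test set
y_pred = lr.predict(X_test_std)

print(f"{accuracy_score(y_test, y_pred):.1%} accuracy on test set.")
```

The test features go through transform() rather than fit_transform(), so the test set is scaled with the training set's statistics and no information from the test data leaks into preprocessing.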

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Fit the scaler on the training features and transform these in one go
X_train_std = scaler.____(____)

# Fit the logistic regression model on the scaled training data
lr.____(____, ____)

# Scale the test features
X_test_std = scaler.____(____)

# Predict diabetes presence on the scaled test set
y_pred = lr.____(____)

# Print the test set accuracy and the absolute feature coefficients
print(f"{accuracy_score(y_test, y_pred):.1%} accuracy on test set.")
print(dict(zip(X.columns, abs(lr.coef_[0]).round(2))))
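
The last line of the sample code pairs each feature name with the absolute value of its coefficient. Because the features were standardized, these magnitudes are roughly comparable across features. The sketch below shows the same pairing in plain Python. The feature names and coefficient values are made up for illustration and do not come from the diabetes dataset, and the sorting step is an optional extra that is not part of the exercise.

```python
# Hypothetical feature names and made-up coefficients, for illustration only
feature_names = ["feature_a", "feature_b", "feature_c"]
coefs = [0.8132, -1.2471, 0.049]

# Map each feature name to its rounded absolute coefficient
abs_coefs = dict(zip(feature_names, (round(abs(c), 2) for c in coefs)))

# Optional: order the features from largest to smallest magnitude
ranked = sorted(abs_coefs, key=abs_coefs.get, reverse=True)

print(abs_coefs)
print(ranked)
```

A coefficient close to zero suggests that the feature contributes little to the prediction in this particular model, which is how the course uses this output.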