
Accuracy after dimensionality reduction

You'll reduce overfitting with the help of dimensionality reduction. In this case, you'll apply a rather drastic form of it by selecting only a single column that carries useful information for distinguishing between genders. You'll then repeat the train-test split, model fitting and prediction steps to compare the accuracy on the test data versus the training data.

All relevant packages and the target y have been pre-loaded.

This exercise is part of the course

Dimensionality Reduction in Python

Exercise instructions

  • Select just the neck circumference ('neckcircumferencebase') column from ansur_df.
  • Split the data, instantiate a classifier and fit the data. This has been done for you.
  • Once again, calculate the accuracy scores on both the training and test sets.
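
The double brackets in the first step matter: selecting with a list of column names returns a two-dimensional DataFrame, which is the shape scikit-learn estimators expect for X, whereas single brackets return a one-dimensional Series. A minimal sketch of the difference, assuming a made-up frame (the name toy_df and all of its values are hypothetical and are not the ANSUR data):

```python
import pandas as pd

# Hypothetical stand-in for ansur_df; the values are invented for illustration
toy_df = pd.DataFrame({
    "neckcircumferencebase": [380, 415, 402, 431],
    "stature": [1650, 1780, 1702, 1811],
})

# Single brackets select one column as a 1-D Series
as_series = toy_df["neckcircumferencebase"]

# Double brackets (a list of names) select a 2-D DataFrame with one column
as_frame = toy_df[["neckcircumferencebase"]]

print(as_series.shape)  # (4,)
print(as_frame.shape)   # (4, 1)
```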

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Assign just the 'neckcircumferencebase' column from ansur_df to X
X = ansur_df[[____]]

# Split the data, instantiate a classifier and fit the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
svc = SVC()
svc.fit(X_train, y_train)

# Calculate accuracy scores on both train and test data
accuracy_train = accuracy_score(____, svc.predict(____))
accuracy_test = accuracy_score(____, svc.predict(____))

print(f"{accuracy_test:.1%} accuracy on test set vs. {accuracy_train:.1%} on training set")
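
The pre-loaded ansur_df and y are not available outside the exercise environment, so the sketch below walks through the same split, fit and score steps on synthetic data. Everything about the data is an assumption made for illustration: the class means, the spread, the sample size, the label strings and the names X_demo, y_demo, acc_train and acc_test. The fixed random_state values are there only to make the run repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 500  # samples per class (arbitrary choice)

# Synthetic neck circumferences in mm: two overlapping normal distributions.
# These numbers are invented and do not come from the ANSUR dataset.
neck = np.concatenate([rng.normal(420, 20, n), rng.normal(370, 20, n)])
labels = np.array(["Male"] * n + ["Female"] * n)

# Single-feature design matrix (2-D) and target
X_demo = pd.DataFrame({"neckcircumferencebase": neck})
y_demo = pd.Series(labels)

# Hold out 30% of the rows for testing, as in the exercise
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# Fit a support vector classifier with default settings
svc = SVC()
svc.fit(X_train, y_train)

# Score on the data the model has seen and on the held-out data
acc_train = accuracy_score(y_train, svc.predict(X_train))
acc_test = accuracy_score(y_test, svc.predict(X_test))

print(f"{acc_test:.1%} accuracy on test set vs. {acc_train:.1%} on training set")
```

With a single feature the classifier has little room to memorise the training rows, so the two scores should land close together. A small gap between them is the sign of reduced overfitting that this exercise asks you to look for.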