Accuracy after dimensionality reduction
You'll reduce overfitting with the help of dimensionality reduction. In this case, you'll apply a rather drastic form of dimensionality reduction by selecting only a single column that carries useful information for distinguishing between genders. You'll then repeat the train-test split, model fitting and prediction steps to compare accuracy on the test data versus the training data.
All relevant packages and y (the target labels) have been pre-loaded.
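The idea above can be tried outside the exercise with a minimal sketch. It does not use the real ansur_df. Instead it assumes a hypothetical synthetic dataset: binary labels, one informative column whose mean shifts with the label, and 200 pure-noise columns. It fits the same default SVC on all columns and on the single informative column and prints training versus test accuracy for each. With many noise columns you would typically expect a larger gap between training and test accuracy, but the exact numbers depend on the random data and are not a result from the course.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_samples, n_noise = 120, 200

# Hypothetical binary labels (a stand-in for gender)
y = rng.integers(0, 2, size=n_samples)
# One informative column: its mean shifts with the label
informative = y + rng.normal(scale=0.7, size=n_samples)
# Many pure-noise columns that carry no information about the label
noise = rng.normal(size=(n_samples, n_noise))
X_all = np.column_stack([informative, noise])


def train_test_accuracy(X, y, seed=0):
    """Split, fit a default SVC and return (train accuracy, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    svc = SVC().fit(X_train, y_train)
    return (accuracy_score(y_train, svc.predict(X_train)),
            accuracy_score(y_test, svc.predict(X_test)))


# Compare all columns against only the first (informative) column
for name, X in [("all 201 columns", X_all), ("1 informative column", X_all[:, [0]])]:
    acc_train, acc_test = train_test_accuracy(X, y)
    print(f"{name}: {acc_test:.1%} test vs. {acc_train:.1%} train")
```

Passing random_state to train_test_split makes the split reproducible, so both feature sets are evaluated on the same rows.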
This exercise is part of the course Dimensionality Reduction in Python.
Exercise instructions
- Select just the neck circumference ('neckcircumferencebase') column from ansur_df.
- Split the data, instantiate a classifier and fit the data. This has been done for you.
- Once again, calculate the accuracy scores on both the training and test sets.
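Selecting a single column with a list inside the brackets matters here. A minimal sketch, assuming a hypothetical toy DataFrame with made-up values (the real ansur_df has many more columns and rows), shows the difference: a list of names returns a two-dimensional DataFrame, which scikit-learn estimators such as SVC expect as their feature matrix, while a bare name returns a one-dimensional Series, which SVC.fit typically rejects with an "Expected 2D array" error.

```python
import pandas as pd

# Hypothetical toy frame with made-up values, for illustration only
df = pd.DataFrame({
    "neckcircumferencebase": [380, 345, 402],
    "stature": [1780, 1620, 1815],
})

as_frame = df[["neckcircumferencebase"]]  # list of names -> DataFrame (2D)
as_series = df["neckcircumferencebase"]   # single name -> Series (1D)

print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
print(type(as_series).__name__, as_series.shape)  # Series (3,)
```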
Have a go at this exercise by completing this sample code.
# Assign just the 'neckcircumferencebase' column from ansur_df to X
X = ansur_df[[____]]
# Split the data, instantiate a classifier and fit the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
svc = SVC()
svc.fit(X_train, y_train)
# Calculate accuracy scores on both train and test data
accuracy_train = accuracy_score(____, svc.predict(____))
accuracy_test = accuracy_score(____, svc.predict(____))
print(f"{accuracy_test:.1%} accuracy on test set vs. {accuracy_train:.1%} on training set")
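Both accuracy lines in the sample code call accuracy_score with the true labels first and the predicted labels second. A small sketch, assuming hypothetical labels for four samples, shows that the score is simply the fraction of positions where the prediction matches the truth, and how the :.1% format specifier in the final print renders it as a percentage with one decimal place.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions for four samples
y_true = ["Male", "Female", "Female", "Male"]
y_pred = ["Male", "Female", "Male", "Male"]

acc = accuracy_score(y_true, y_pred)
# Same quantity computed by hand: share of matching positions
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(acc, manual)   # 0.75 0.75
print(f"{acc:.1%}")  # 75.0%
```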