
Contamination revisited

You notice that the one-class SVM does not have a contamination parameter. But you know by now that you need a way to control the proportion of examples labeled as novelties in order to control your false positive rate, so you decide to experiment with thresholding the scores. The detector has been imported as onesvm. You also have the data available as X_train, X_test, y_train, and y_test, along with numpy as np and confusion_matrix().

This exercise is part of the course

Designing Machine Learning Workflows in Python


Exercise instructions

  • Fit the one-class SVM and score the test data.
  • Compute the observed proportion of outliers in the test data.
  • Use np.quantile() to find where to threshold the scores to achieve that proportion.
  • Use that threshold to label the test data. Print the confusion matrix.
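The steps above can be tried outside the exercise environment with a minimal sketch like the one below. It is not the exercise solution. The synthetic data, the label encoding (1 = novelty) and the detector settings are all assumptions made for illustration. Because scikit-learn's OneClassSVM.score_samples() gives higher scores to more normal points, this sketch flags scores below the threshold; which side of the threshold counts as a novelty depends on the score convention and on how y_test is encoded.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): inliers near the origin,
# novelties shifted far away from them.
X_train = rng.normal(0, 1, size=(200, 2))
X_test = np.vstack([
    rng.normal(0, 1, size=(90, 2)),   # normal test points
    rng.normal(6, 1, size=(10, 2)),   # novel test points
])
y_test = np.array([0] * 90 + [1] * 10)  # assumed encoding: 1 = novelty

# Fit the detector on training data and score the test data.
# Lower scores mean "more anomalous" for OneClassSVM.score_samples().
nov_det = OneClassSVM(gamma="scale").fit(X_train)
scores = nov_det.score_samples(X_test)

# Observed proportion of novelties in the test data.
prop = np.mean(y_test == 1)

# Score value below which roughly a fraction `prop` of the test points fall.
threshold = np.quantile(scores, prop)

# Flag the lowest-scoring fraction as novelties and compare with the labels.
y_pred = (scores < threshold).astype(int)
print(confusion_matrix(y_test, y_pred))
```

Thresholding at the prop quantile means the fraction of flagged points matches the observed proportion by construction, so only the split between correct and incorrect flags in the confusion matrix reflects the detector's quality.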

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Fit a one-class SVM detector and score the test data
nov_det = ____(X_train)
scores = ____(X_test)

# Find the observed proportion of outliers in the test data
prop = np.____(y_test==____)

# Compute the appropriate threshold
threshold = np.____(____, ____)

# Print the confusion matrix for the thresholded scores
print(confusion_matrix(y_test, ____ > ____))