Contamination revisited

You notice that the one-class SVM does not have a contamination parameter. But you know by now that you need a way to control the proportion of examples labeled as novelties in order to control your false positive rate, so you decide to experiment with thresholding the scores. The detector has been imported as onesvm. Also available are the data as X_train, X_test, y_train, and y_test, numpy as np, and confusion_matrix().
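To see why thresholding scores at a quantile controls the flagged proportion, here is a minimal sketch on synthetic scores. The random scores, the name target_prop, and the convention that lower scores mean "more anomalous" are assumptions made for illustration, not part of the exercise data.

```python
import numpy as np

# Synthetic scores (assumption): higher = more normal, lower = more anomalous
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)

# Hypothetical target: flag roughly 5% of the examples
target_prop = 0.05

# The q-quantile is the value below which about a fraction q of the scores fall
threshold = np.quantile(scores, target_prop)

# Flag everything at or below the threshold
flagged = scores <= threshold
flagged_prop = flagged.mean()
print(flagged_prop)
```

The fraction of flagged examples lands close to target_prop, which is the role a contamination parameter plays in detectors that expose one.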

This exercise is part of the course

Designing Machine Learning Workflows in Python

Exercise instructions

  • Fit the one-class SVM and score the test data.
  • Compute the observed proportion of outliers in the test data.
  • Use np.quantile() to find where to threshold the scores to achieve that proportion.
  • Use that threshold to label the test data. Print the confusion matrix.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Fit a one-class SVM detector and score the test data
nov_det = ____(X_train)
scores = ____(X_test)

# Find the observed proportion of outliers in the test data
prop = np.____(y_test==____)

# Compute the appropriate threshold
threshold = np.____(____, ____)

# Print the confusion matrix for the thresholded scores
print(confusion_matrix(y_test, ____ > ____))
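The same steps can be tried end to end on synthetic data. This is a sketch under stated assumptions, not the exercise solution: the data are generated here, novelties are encoded as 1 in y_test, and low values from score_samples() are treated as more anomalous. With a different label encoding, the direction of the comparison would need to change.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(42)

# Synthetic data (assumption): training set contains only "normal" points
X_train = rng.normal(0, 1, size=(500, 2))
X_normal = rng.normal(0, 1, size=(180, 2))
X_novel = rng.uniform(4, 6, size=(20, 2))  # points far from the training cloud
X_test = np.vstack([X_normal, X_novel])
# Label encoding (assumption): 1 = novelty, 0 = normal
y_test = np.concatenate([np.zeros(180, dtype=int), np.ones(20, dtype=int)])

# Fit the detector and score the test data (higher score = more normal)
onesvm = OneClassSVM(gamma="scale")
nov_det = onesvm.fit(X_train)
scores = nov_det.score_samples(X_test)

# Observed proportion of novelties in the test data
prop = np.mean(y_test == 1)

# Threshold below which roughly that proportion of scores fall
threshold = np.quantile(scores, prop)

# Label the lowest-scoring examples as novelties and compare with the truth
y_pred = (scores <= threshold).astype(int)
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Because the threshold is taken from the test scores themselves, the share of predicted novelties matches the observed share by construction; in practice the threshold would usually be chosen on held-out data.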