ComenzarEmpieza gratis

Bringing it all together

In addition to the distance-based learning anomaly detection pipeline you created in the last exercise, you want to also support a feature-based learning one with one-class SVM. You decide to extract two features: first, the length of the string, and, second, a numerical encoding of the first letter of the string, obtained using the function LabelEncoder() described in Chapter 1. To ensure a fair comparison, you will input the outlier scores into an AUC calculation. The following have been imported: LabelEncoder(), roc_auc_score() as auc() and OneClassSVM. The data is available as a pandas data frame called proteins with two columns, label and seq, and two classes, IMMUNE SYSTEM and VIRUS. A fitted LoF detector is available as lof_detector.

Este ejercicio forma parte del curso

Designing Machine Learning Workflows in Python

Ver curso

Instrucciones del ejercicio

  • For a string s, len(s) returns its length. Apply that to the seq column to obtain a new column len.
  • For a string s, list(s) returns a list of its characters. Use this to extract the first letter of each sequence, and encode it using LabelEncoder().
  • LoF scores are in the negative_outlier_factor_ attribute. Compute their AUC.
  • Fit a 1-class SVM to a data frame with only len and first as columns. Extract the scores and assess both the LoF scores and the SVM scores using AUC.

Ejercicio interactivo práctico

Prueba este ejercicio completando el código de muestra.

# Create a feature that contains the length of the string
proteins['len'] = proteins['seq'].apply(____)

# Create a feature encoding the first letter of the string
proteins['first'] =  ____.____(
  proteins['seq'].apply(____))

# Extract scores from the fitted LoF object, compute its AUC
scores_lof = lof_detector.____
print(____(proteins['label']==____, scores_lof))

# Fit a 1-class SVM, extract its scores, and compute its AUC
svm = ____.____(proteins[['len', 'first']])
scores_svm = svm.____(proteins[['len', 'first']])
print(____(proteins['label']==____, scores_svm))
Editar y ejecutar código