
Is the source or the destination bad?

In the previous lesson, you used the destination computer as your entity of interest. However, your cybersecurity analyst has just told you that the infected machines are the ones generating the bad traffic, so they appear as sources, not destinations, in the flows dataset.
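
The column you group by decides whose behavior each feature vector describes. In the small hypothetical flows table below, C1 is assumed to be the infected machine. Its heavy traffic is attributed to it only when it is treated as a source:

```python
import pandas as pd

# Toy flows table (hypothetical data): C1 is the infected machine sending traffic
flows_toy = pd.DataFrame({
    'source_computer':      ['C1', 'C1', 'C2', 'C3'],
    'destination_computer': ['C9', 'C8', 'C1', 'C1'],
    'packet_count':         [120, 95, 3, 4],
})
bads_toy = {'C1'}

# Entities you get when the unit of analysis is the sender vs. the receiver
sources = sorted(flows_toy['source_computer'].unique())
destinations = sorted(flows_toy['destination_computer'].unique())

# Traffic tied to the infected machine under each framing
sent_by_bad = flows_toy[flows_toy['source_computer'].isin(bads_toy)]
received_by_bad = flows_toy[flows_toy['destination_computer'].isin(bads_toy)]

print(sources)                                # ['C1', 'C2', 'C3']
print(destinations)                           # ['C1', 'C8', 'C9']
print(sent_by_bad['packet_count'].sum())      # 215
print(received_by_bad['packet_count'].sum())  # 7
```

Grouping by destination_computer would instead summarize the light traffic that other machines send to C1, not the traffic C1 itself generates.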

The data flows has been preloaded, along with the list bads of infected IDs and the feature extractor featurize() from the previous lesson. You also have numpy available as np, pandas as pd, AdaBoostClassifier(), and cross_val_score().
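
The course's own featurize() is not reproduced here. As a hedged illustration, a hypothetical feature extractor of the same shape, taking one computer's flows and returning a list of numbers, might look like this. The column names destination_computer, destination_port, packet_count and duration are assumptions about the flows data:

```python
import numpy as np
import pandas as pd

def featurize(df):
    """Hypothetical feature extractor: summarizes all flows of one computer.

    Assumes columns 'destination_computer', 'destination_port',
    'packet_count' and 'duration'; the course's featurize() may differ.
    """
    return [
        df['destination_computer'].nunique(),  # how many machines it talked to
        df['destination_port'].nunique(),      # how many distinct ports it hit
        np.mean(df['packet_count']),           # average packets per flow
        np.mean(df['duration']),               # average flow duration
    ]

# One computer's flows (toy data)
group = pd.DataFrame({
    'destination_computer': ['C9', 'C8', 'C9'],
    'destination_port':     [445, 445, 139],
    'packet_count':         [10, 20, 30],
    'duration':             [1, 2, 3],
})
print(featurize(group))
```

Returning a plain list (rather than a Series) is what makes the later list() and pd.DataFrame() step produce one row per computer.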

This exercise is part of the course

Designing Machine Learning Workflows in Python


Exercise instructions

  • Create a data frame where each row is a feature vector for a source_computer. Group by source computer ID in the flows dataset and apply the feature extractor to each group.
  • Convert the result (a Series holding one feature list per source computer) to a data frame by calling list() on it.
  • Create labels by checking whether each source_computer ID belongs in the list of bads you have been given.
  • Assess an AdaBoostClassifier() on this data using cross_val_score().
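
The four steps above can be sketched end to end on synthetic data. Only the pandas and scikit-learn calls are real; the toy flows table, the bads set and the two-feature featurize() are hypothetical stand-ins for the preloaded objects, and infected machines are simply assumed to send more packets per flow:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic flows (hypothetical): 30 source computers, 20 flows each
sources = [f'C{i}' for i in range(30)]
bads = set(sources[:6])  # pretend the first 6 machines are infected
rows = []
for src in sources:
    scale = 50 if src in bads else 10  # infected machines send more packets
    for _ in range(20):
        rows.append({
            'source_computer': src,
            'destination_computer': f'C{rng.integers(100, 110)}',
            'packet_count': rng.poisson(scale),
        })
flows = pd.DataFrame(rows)

def featurize(df):
    # Hypothetical per-computer features: distinct destinations, mean packets
    return [df['destination_computer'].nunique(), df['packet_count'].mean()]

# Step 1: one feature list per source computer (a Series of lists)
out = flows.groupby('source_computer').apply(featurize)
# Step 2: turn the Series of lists into a feature matrix indexed by computer ID
X = pd.DataFrame(list(out), index=out.index)
# Step 3: label each source computer by membership in bads
y = [x in bads for x in X.index]
# Step 4: mean accuracy over 3-fold cross-validation
scores = cross_val_score(AdaBoostClassifier(), X, y, cv=3)
print(np.mean(scores))
```

cv=3 is passed explicitly because recent scikit-learn versions default to 5-fold cross-validation. Newer pandas versions may warn that apply() operates on the grouping column; the hypothetical featurize() here ignores that column, so the result is unaffected.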

Hands-on interactive exercise

Try this exercise by working through the sample code below.

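import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# flows, bads and featurize are preloaded in the exercise environment

# Group by source computer, and apply the feature extractor
out = flows.groupby('source_computer').apply(featurize)

# out is a Series of feature lists (one per source computer);
# calling list() on it lets pd.DataFrame build one row per computer
X = pd.DataFrame(list(out), index=out.index)

# Check which sources in X.index are bad to create labels
y = [x in bads for x in X.index]

# Report the average accuracy of AdaBoost over 3-fold CV
print(np.mean(cross_val_score(AdaBoostClassifier(), X, y, cv=3)))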
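
Passing index=out.index matters because the labels are computed from X.index. A minimal sketch with two hypothetical computers, C1 assumed infected, shows the difference:

```python
import pandas as pd

# A Series of per-computer feature lists, like the output of groupby().apply()
out = pd.Series([[3, 10.5], [1, 2.0]],
                index=pd.Index(['C1', 'C2'], name='source_computer'))
bads = {'C1'}  # hypothetical infected ID

with_ids = pd.DataFrame(list(out), index=out.index)
without_ids = pd.DataFrame(list(out))  # default RangeIndex 0, 1

print([x in bads for x in with_ids.index])     # [True, False]
print([x in bads for x in without_ids.index])  # [False, False]
```

Without the index, X.index is just 0, 1, and so on, so no row matches an ID in bads and every label comes out False.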