
Is the source or the destination bad?

In the previous lesson, you used the destination computer as your entity of interest. However, your cybersecurity analyst just told you that it is the infected machines that generate the bad traffic, and will therefore appear as a source, not a destination, in the flows dataset.
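To make the distinction concrete, here is a minimal sketch on a made-up table. The data frame flows_toy, its values, and the column name destination_computer are assumptions for illustration only; only source_computer is named in this exercise. Computer 7 plays the infected machine.

```python
import pandas as pd

# Hypothetical toy flows: computer 7 is "infected" and sends traffic to many hosts.
flows_toy = pd.DataFrame({
    "source_computer":      [7, 7, 7, 7, 3, 5],
    "destination_computer": [1, 2, 3, 5, 7, 1],  # assumed column name
})

# Count flows per entity under each choice of grouping column.
by_source = flows_toy.groupby("source_computer").size()
by_destination = flows_toy.groupby("destination_computer").size()

print(by_source.to_dict())
print(by_destination.to_dict())
```

Grouped by source, computer 7 owns four flows; grouped by destination, it owns only one, so features computed per destination would mostly describe its victims rather than the infected machine itself.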

The flows dataset has been preloaded, along with the list bads of infected IDs and the feature extractor featurize() from the previous lesson. You also have numpy available as np, pandas as pd, AdaBoostClassifier(), and cross_val_score().

This exercise is part of the course

Designing Machine Learning Workflows in Python


Instructions

  • Create a data frame where each row is a feature vector for a source_computer. Group by source computer ID in the flows dataset and apply the feature extractor to each group.
  • Convert the iterator to a data frame by calling list() on it.
  • Create labels by checking whether each source_computer ID appears in the list bads you have been given.
  • Assess an AdaBoostClassifier() on this data using cross_val_score().

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Group by source computer, and apply the feature extractor
out = flows.____('source_computer').____(featurize)

# Convert the iterator to a dataframe by calling list on it
X = pd.DataFrame(____, index=____)

# Check which sources in X.index are bad to create labels
y = [x in bads for x in ____]

# Report the average accuracy of AdaBoost over 3-fold CV
# (cv=3 is passed explicitly; recent scikit-learn versions default to 5 folds)
print(np.mean(____(____, X, y, cv=3)))
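The same group, featurize, label, and evaluate pattern can be tried end to end on synthetic data. The sketch below is not the course's dataset or its feature extractor: flows_toy, bads_toy, the columns packet_count and duration, and featurize_sketch() are all hypothetical stand-ins, chosen only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for `flows`: 40 source computers with 20 flows each.
n_computers, flows_per_computer = 40, 20
bads_toy = list(range(10))  # hypothetical infected IDs
source = np.repeat(np.arange(n_computers), flows_per_computer)
infected = np.isin(source, bads_toy)
flows_toy = pd.DataFrame({
    "source_computer": source,
    # Assumption: infected sources send more packets on average.
    "packet_count": rng.poisson(np.where(infected, 30, 10)),
    "duration": rng.exponential(5.0, size=source.size),
})

def featurize_sketch(df):
    """Hypothetical stand-in for the lesson's feature extractor."""
    return {
        "average_packet": df["packet_count"].mean(),
        "average_duration": df["duration"].mean(),
    }

# One dict of features per source computer; the feature columns are
# selected explicitly so the extractor only sees the data it needs.
out = flows_toy.groupby("source_computer")[["packet_count", "duration"]].apply(featurize_sketch)

# A list of dicts becomes one row per computer, indexed by source ID.
X_toy = pd.DataFrame(list(out), index=out.index)

# Label each row by membership in the infected list.
y_toy = [x in bads_toy for x in X_toy.index]

# Mean accuracy over 3 folds.
scores = cross_val_score(AdaBoostClassifier(), X_toy, y_toy, cv=3)
print(np.mean(scores))
```

Because the extractor returns a plain dict, apply() yields one dict per group, which is why the result is passed through list() before building the data frame and why the group index is reused as the row index.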