Is the source or the destination bad?

In the previous lesson, you used the destination computer as your entity of interest. However, your cybersecurity analyst has just told you that the infected machines are the ones generating the bad traffic, so they will appear as a source, not a destination, in the flows dataset.

The flows data has been preloaded, as well as the list bads of infected IDs and the feature extractor featurize() from the previous lesson. You also have numpy available as np, pandas as pd, AdaBoostClassifier(), and cross_val_score().

This exercise is part of the course

Designing Machine Learning Workflows in Python

Exercise instructions

  • Create a data frame where each row is a feature vector for a source_computer. Group by source computer ID in the flows dataset and apply the feature extractor to each group.
  • Convert the iterator to a data frame by calling list() on it.
  • Create labels by checking whether each source_computer ID is in the list bads you have been given.
  • Assess an AdaBoostClassifier() on this data using cross_val_score().
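
As a rough illustration of the steps above, here is a minimal sketch on synthetic data. Everything in it is an assumption made for the example: the toy flows table (including the destination_computer and byte_count columns), the bads list, and this stand-in featurize() are hypothetical, and the objects preloaded in the exercise will differ. cv=3 is passed explicitly because recent scikit-learn versions default to 5 folds.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the preloaded `flows` data (column names are assumptions).
n_flows = 600
flows = pd.DataFrame({
    "source_computer": rng.integers(0, 30, size=n_flows),
    "destination_computer": rng.integers(0, 30, size=n_flows),
    "byte_count": rng.integers(100, 10_000, size=n_flows),
})

# Hypothetical list of infected IDs; in the exercise, `bads` is given to you.
bads = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Make infected sources send more bytes so the toy data carries some signal.
flows.loc[flows["source_computer"].isin(bads), "byte_count"] *= 5


def featurize(df):
    """Hypothetical feature extractor: returns one dict of features per group."""
    return {
        "n_flows": len(df),
        "unique_destinations": df["destination_computer"].nunique(),
        "mean_bytes": df["byte_count"].mean(),
    }


# Group by source computer and apply the extractor to each group's feature columns.
# The result is a Series of dicts indexed by source ID.
grouped = flows.groupby("source_computer")[["destination_computer", "byte_count"]]
out = grouped.apply(featurize)

# list(out) yields the dicts in index order; reusing out.index keeps each row
# labeled with its source computer ID.
X = pd.DataFrame(list(out), index=out.index)

# A source is labeled True when its ID appears in the list of infected machines.
y = [x in bads for x in X.index]

# Mean accuracy of AdaBoost over 3-fold cross-validation.
scores = cross_val_score(AdaBoostClassifier(), X, y, cv=3)
print(np.mean(scores))
```

Because the labels are derived from X.index, grouping by the source rather than the destination changes both the feature rows and which machines are marked as bad.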

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Group by source computer, and apply the feature extractor
out = flows.____('source_computer').____(featurize)

# Convert the iterator to a dataframe by calling list on it
X = pd.DataFrame(____, index=____)

# Check which sources in X.index are bad to create labels
y = [x in bads for x in ____]

# Report the average accuracy of AdaBoost over 3-fold CV
print(np.mean(____(____, X, y)))