Exercise

Combining heuristics

A different cyber analyst tells you that during certain types of attack, the infected source computer sends only small amounts of traffic to avoid detection. This makes you wonder whether it would be better to build a combined heuristic that looks for a large number of unique ports and small packet sizes at the same time. Does this improve performance over the simple port heuristic? As in the last exercise, you have X_train, X_test, y_train and y_test in memory. The sample code also reproduces the outcome of the port heuristic, pred_port. You also have numpy (as np) and accuracy_score() preloaded.

Instructions

  • The column average_packet holds the average packet size over all flows observed from a single source. Compute the mean of this column over bad sources only, using the training set.
  • Construct a new rule that flags as positive every source whose average packet size is less than this mean.
  • Combine the two rules so that both heuristics must apply simultaneously, using an appropriate arithmetic operation.
  • Report the accuracy of the combined heuristic.
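The four steps above can be sketched as follows. This is a minimal, hypothetical illustration, not the official solution: the tiny X_train/X_test/y_train/y_test below are made-up stand-ins for the preloaded data, the unique_ports column name and the exact form of pred_port are assumptions, and accuracy is computed directly with NumPy rather than with accuracy_score() so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the preloaded X_train/X_test/y_train/y_test.
# Column names (unique_ports, average_packet) are assumed; labels are booleans (True = bad source).
X_train = pd.DataFrame({
    "unique_ports": [2, 3, 20, 30, 4, 40],
    "average_packet": [900.0, 850.0, 120.0, 80.0, 700.0, 100.0],
})
y_train = np.array([False, False, True, True, False, True])

X_test = pd.DataFrame({
    "unique_ports": [50, 45, 3, 48],
    "average_packet": [90.0, 800.0, 110.0, 95.0],
})
y_test = np.array([True, False, False, True])

# Port heuristic (assumed form): flag sources with more unique ports
# than the mean over bad sources in the training set.
port_threshold = np.mean(X_train[y_train]["unique_ports"])
pred_port = X_test["unique_ports"].to_numpy() > port_threshold

# Step 1: mean average packet size over bad sources, training set only.
packet_threshold = np.mean(X_train[y_train]["average_packet"])

# Step 2: flag sources whose average packet size is below that mean.
pred_packet = X_test["average_packet"].to_numpy() < packet_threshold

# Step 3: both rules must hold; multiplying boolean arrays acts as a logical AND.
pred_port_packet = pred_port * pred_packet

# Step 4: accuracy = fraction of predictions matching the labels
# (equivalent to sklearn's accuracy_score for these boolean labels).
acc_port = np.mean(pred_port == y_test)
acc_combined = np.mean(pred_port_packet == y_test)
print(acc_port, acc_combined)  # prints: 0.75 1.0
```

On this toy data, the second test source uses many ports but sends large packets, so the port rule alone misflags it while the combined rule does not. Whether the combination helps on the real data is exactly what the exercise asks you to check. Using `&` instead of `*` would give the same result for boolean arrays.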