Aan de slagGa gratis aan de slag

Bayesian spam filter

Well done on the previous exercise! Let's now tackle the famous Bayes' Theorem and use it for a simple but important task: spam detection.

While browsing your inbox, you have figured out that quite a few of the emails you would rather not waste your time on reading contain exclamatory statements, such as "BUY NOW!!!". You start thinking that the presence of three exclamation marks next to each other might be a good spam predictor! Hence you've prepared a DataFrame called emails with two variables: spam, whether the email was spam, and contains_3_exlc, whether it contains the string "!!!". The head of the data looks like this:

     spam    contains_3_excl
0    False             False
1    False             False
2    True              False
3    False             False
4    False             False

Your job is to calculate the probability of the email being spam given that it contains three exclamation marks. Let's tackle it step by step! Here is Bayes' formula for your reference:

$$P(A|B) = \frac{P(B|A) * P(A)}{P(B)}$$

Deze oefening maakt deel uit van de cursus

Bayesian Data Analysis in Python

Cursus bekijken

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

# Calculate and print the unconditional probability of spam
p_spam = ____[____].____
print(____)
Code bewerken en uitvoeren