Bayesian spam filter

Well done on the previous exercise! Let's now tackle the famous Bayes' Theorem and use it for a simple but important task: spam detection.

While browsing your inbox, you have noticed that quite a few of the emails you would rather not waste time reading contain exclamatory statements, such as "BUY NOW!!!". You start to think that the presence of three consecutive exclamation marks might be a good spam predictor! Hence, you've prepared a DataFrame called emails with two variables: spam, whether the email was spam, and contains_3_excl, whether it contains the string "!!!". The head of the data looks like this:

     spam    contains_3_excl
0    False             False
1    False             False
2    True              False
3    False             False
4    False             False

Your job is to calculate the probability of the email being spam given that it contains three exclamation marks. Let's tackle it step by step! Here is Bayes' formula for your reference:

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$
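
To make the mapping concrete, take A as "the email is spam" and B as "the email contains !!!". The sketch below walks through the three ingredients of the formula on a small, made-up DataFrame. The ten rows are hypothetical and only stand in for the real emails data, which is not shown in full here. The intermediate names p_3_excl and p_3_excl_given_spam are likewise assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `emails` DataFrame
emails = pd.DataFrame({
    "spam":            [False, False, True, False, False, True, True, False, True, False],
    "contains_3_excl": [False, False, False, False, False, True, True, True, True, False],
})

# P(A): unconditional probability that an email is spam
p_spam = emails["spam"].mean()

# P(B): unconditional probability that an email contains "!!!"
p_3_excl = emails["contains_3_excl"].mean()

# P(B|A): probability of "!!!" among the spam emails only
p_3_excl_given_spam = emails.loc[emails["spam"], "contains_3_excl"].mean()

# Bayes' formula: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_3_excl = p_3_excl_given_spam * p_spam / p_3_excl

# Sanity check: the share of spam among emails that contain "!!!"
direct = emails.loc[emails["contains_3_excl"], "spam"].mean()

print(p_spam_given_3_excl, direct)
```

The two printed numbers should agree up to floating-point rounding, since Bayes' formula and the direct conditional frequency are two routes to the same quantity.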

This exercise is part of the course

Bayesian Data Analysis in Python

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Calculate and print the unconditional probability of spam
p_spam = ____[____].____
print(____)
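
As a hedged illustration of this first step, here is one plausible way to fill in the blanks, assuming the spam column holds booleans. The DataFrame below is a hypothetical stand-in built from the five rows shown in the head of the data, so the resulting number is not the answer for the full dataset.

```python
import pandas as pd

# Hypothetical stand-in: only the five rows shown in the head of the data
emails = pd.DataFrame({
    "spam":            [False, False, True, False, False],
    "contains_3_excl": [False, False, False, False, False],
})

# The mean of a boolean column is the share of True values,
# which is the unconditional probability of spam
p_spam = emails["spam"].mean()
print(p_spam)

# Equivalent formulation: count the True values and divide by the number of rows
p_spam_by_count = emails["spam"].sum() / len(emails)
```

Taking the mean works because pandas treats True as 1 and False as 0 when averaging.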

Take your first steps in the Bayesian world. In this chapter, you’ll be introduced to the basic concepts of probability and statistical distributions, as well as to the famous Bayes' Theorem, the cornerstone of Bayesian methods. Finally, you’ll build your first Bayesian model to draw conclusions from randomized coin tosses.

Exercise 1: Who is Bayes? What is Bayes?
Exercise 2: Bayesians vs. Frequentists
Exercise 3: Probability distributions
Exercise 4: Probability and Bayes' Theorem
Exercise 5: Let's play cards
Exercise 6: Bayesian spam filter (current exercise)
Exercise 7: What does the test say?
Exercise 8: Tasting the Bayes
Exercise 9: Tossing a coin
Exercise 10: The more you toss, the more you learn
Exercise 11: Hey, is this coin fair?

It’s time to look under the Bayesian hood. You’ll learn how to apply Bayes' Theorem to drug-effectiveness data to estimate the parameters of probability distributions using the grid approximation technique, and update these estimates as new data become available. Next, you’ll learn how to incorporate prior knowledge into the model before finally practicing the important skill of reporting results to a non-technical audience.

Exercise 1: Under the Bayesian hood
Exercise 2: Towards grid approximation
Exercise 3: Grid approximation without prior knowledge
Exercise 4: Updating posterior belief
Exercise 5: Prior belief
Exercise 6: The truth of the prior
Exercise 7: Picking the right prior
Exercise 8: Simulating posterior draws
Exercise 9: Reporting Bayesian results
Exercise 10: Point estimates
Exercise 11: Highest Posterior Density credible intervals
Exercise 12: The meaning of credibility

Apply your newly acquired Bayesian data analysis skills to solve real-world business challenges. You’ll work with online sales marketing data to conduct A/B tests, decision analysis, and forecasting with linear regression models.

Exercise 1: A/B testing
Exercise 2: Simulate beta posterior
Exercise 3: Posterior click rates
Exercise 4: A or B, and how sure are we?
Exercise 5: How bad can it be?
Exercise 6: Decision analysis
Exercise 7: Decision analysis: cost
Exercise 8: Decision analysis: profit
Exercise 9: Regression and forecasting
Exercise 10: Defining a Bayesian regression model
Exercise 11: Analyzing regression parameters
Exercise 12: Predictive distribution

In this final chapter, you’ll take advantage of the powerful PyMC3 package to easily fit Bayesian regression models, conduct sanity checks on a model's convergence, select between competing models, and generate predictions for new data. To wrap up, you’ll apply what you’ve learned to find the optimal price for avocados in a Bayesian data analysis case study. Good luck!

Exercise 1: Markov Chain Monte Carlo and model fitting
Exercise 2: Markov Chain Monte Carlo
Exercise 3: Sampling posterior draws
Exercise 4: Interpreting results and comparing models
Exercise 5: Inspecting posterior draws
Exercise 6: Comparing models with WAIC
Exercise 7: Making predictions
Exercise 8: Sample from predictive density
Exercise 9: Estimating test error
Exercise 10: How much is an avocado?
Exercise 11: Fitting the model
Exercise 12: Inspecting the model
Exercise 13: Optimizing the price
Exercise 14: Final remarks