IniziaInizia gratis

Spam and !!!

Let's look at a more obvious indicator of spam: exclamation marks. exclaim_mess contains the number of exclamation marks in each message. Using summary statistics and visualization, see if there is a relationship between this variable and whether or not a message is spam.

Experiment with different types of plots until you find one that is the most informative. Recall that you've seen:

  • Side-by-side box plots
  • Faceted histograms
  • Overlaid density plots

Questo esercizio fa parte del corso

Exploratory Data Analysis in R

Visualizza il corso

Istruzioni dell'esercizio

The email dataset is still available in your workspace.

  • Calculate appropriate measures of the center and spread of exclaim_mess for both spam and not-spam using group_by() and summarize().
  • Construct an appropriate plot to visualize the association between the same two variables, adding in a log-transformation step if necessary.
  • If you decide to use a log transformation, remember that log(0) is -Inf in R, which isn't a very useful value! You can get around this by adding a small number (like 0.01) to the quantity inside the log() function. This way, your value is never zero. This small shift to the right won't affect your results.

Esercizio pratico interattivo

Prova a risolvere questo esercizio completando il codice di esempio.

# Compute center and spread for exclaim_mess by spam




# Create plot for spam and exclaim_mess

Modifica ed esegui il codice