
Spam and !!!

Let's look at a more obvious indicator of spam: exclamation marks. exclaim_mess contains the number of exclamation marks in each message. Using summary statistics and visualization, see if there is a relationship between this variable and whether or not a message is spam.

Experiment with different types of plots until you find one that is the most informative. Recall that you've seen:

  • Side-by-side box plots
  • Faceted histograms
  • Overlaid density plots
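
As a reminder of what each plot type looks like in code, here is a minimal sketch using ggplot2. The `toy` data frame and its values are made up for illustration and merely stand in for the course's `email` dataset; only the column names `spam` and `exclaim_mess` come from the exercise.

```r
library(ggplot2)

# Hypothetical toy data (NOT the real email dataset); values are invented
toy <- data.frame(
  spam = factor(rep(c("not-spam", "spam"), each = 6)),
  exclaim_mess = c(0, 1, 2, 4, 8, 30, 0, 0, 1, 1, 3, 12)
)

# Side-by-side box plots: one box per level of spam
p_box <- ggplot(toy, aes(x = spam, y = exclaim_mess)) +
  geom_boxplot()

# Faceted histograms: one panel per level of spam
p_hist <- ggplot(toy, aes(x = exclaim_mess)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ spam)

# Overlaid density plots: semi-transparent fills so both curves stay visible
p_dens <- ggplot(toy, aes(x = exclaim_mess, fill = spam)) +
  geom_density(alpha = 0.3)
```

Printing any of `p_box`, `p_hist`, or `p_dens` draws the plot. Which one is most informative depends on the data, so it is worth trying each on `email`.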

This exercise is part of the course

Exploratory Data Analysis in R


Exercise instructions

The email dataset is still available in your workspace.

  • Calculate appropriate measures of the center and spread of exclaim_mess for both spam and not-spam using group_by() and summarize().
  • Construct an appropriate plot to visualize the association between the same two variables, adding in a log-transformation step if necessary.
  • If you decide to use a log transformation, remember that log(0) is -Inf in R, which isn't a very useful value! You can get around this by adding a small number (like 0.01) to the quantity inside the log() function. This way, your value is never zero. This small shift to the right won't affect your results.
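
The steps above can be sketched roughly as follows. This is one possible approach, not the official solution: it assumes the counts are right-skewed (so the median and IQR are used as robust measures of center and spread) and runs on a made-up `toy` data frame rather than the real `email` data. The names `toy`, `toy_summary`, and `p_log` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (NOT the real email dataset); values are invented
toy <- data.frame(
  spam = factor(rep(c("not-spam", "spam"), each = 6)),
  exclaim_mess = c(0, 1, 2, 4, 8, 30, 0, 0, 1, 1, 3, 12)
)

# Center and spread by group; median and IQR are less sensitive to
# extreme counts than mean and standard deviation
toy_summary <- toy %>%
  group_by(spam) %>%
  summarize(
    median_excl = median(exclaim_mess),
    iqr_excl = IQR(exclaim_mess)
  )

# log(0) is -Inf, so messages with zero exclamation marks need an offset
log(0)

# Box plots of the log-transformed counts, shifted by 0.01 to avoid -Inf
p_log <- ggplot(toy, aes(x = spam, y = log(exclaim_mess + 0.01))) +
  geom_boxplot()
```

To apply the same pattern to the exercise, replace `toy` with `email`. If the distribution turns out not to be skewed, the mean and standard deviation would be a reasonable alternative in `summarize()`.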

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Compute center and spread for exclaim_mess by spam




# Create plot for spam and exclaim_mess
