Spam and num_char
Is there an association between spam and the length of an email? You could imagine a story either way:
- Spam is more likely to be a short message tempting me to click on a link, or
- My normal email is likely shorter since I exchange brief emails with my friends all the time.
Here, you'll use the email dataset to settle that question. Begin by bringing up the help file and learning about all the variables with ?email.
As you explore the association between spam and the length of an email, use this opportunity to try out linking a dplyr chain with the layers in a ggplot2 object.
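As a rough illustration of what linking a dplyr chain to ggplot2 layers can look like, here is a minimal sketch. It deliberately uses the built-in mtcars data rather than email, and the names log_hp and p are made up for this example. Note that dplyr steps are joined with %>% while ggplot2 layers are joined with +.

```r
library(dplyr)
library(ggplot2)

# mtcars ships with R and stands in for the exercise data here.
# log_hp and p are illustrative names, not part of the exercise.
p <- mtcars %>%
  mutate(log_hp = log(hp)) %>%                      # dplyr step: add a log-transformed column
  ggplot(aes(x = factor(am), y = log_hp)) +         # the piped data frame becomes the plot data
  geom_boxplot()                                    # ggplot2 layer, added with +

p
```

Because the data frame arrives through the pipe, ggplot() does not need an explicit data argument.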
This exercise is part of the course
Exploratory Data Analysis in R
Exercise instructions
Using the email dataset:
- Load the packages ggplot2, dplyr, and openintro.
- Compute appropriate measures of the center and spread of num_char for both spam and not-spam using group_by() and summarize(). No need to name the new columns created by summarize().
- Construct side-by-side box plots to visualize the association between the same two variables. It will be useful to mutate() a new column containing a log-transformed version of num_char.
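The group_by() and summarize() pattern from the second step can be sketched on a different dataset. This example again uses the built-in mtcars data, and the name stats is hypothetical. The median and IQR are shown only as one possible pair of center and spread measures, often preferred when a distribution is skewed; which measures suit num_char is for you to judge.

```r
library(dplyr)

# mtcars stands in for the exercise data; am (0 or 1) plays the
# role of the grouping variable. stats is an illustrative name.
stats <- mtcars %>%
  group_by(am) %>%
  summarize(median(hp), IQR(hp))   # unnamed summaries: columns are named after the expressions

stats
```

The result has one row per group, with the grouping column followed by one column per summary.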
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Load packages

# Compute summary statistics
email %>%
  ___ %>%
  ___

# Create plot
email %>%
  mutate(log_num_char = ___) %>%
  ggplot(aes(x = ___, y = log_num_char)) +
  ___