Grouped summaries
So there are more non-complaints than complaints in twitter_data. You might be starting to question whether or not this data is actually from Twitter! There are a few other columns of interest in twitter_data that would be helpful to explore before you get to the tweets themselves. Every tweet records the number of followers its author has in the usr_followers_count column. Do you expect those who complain to have more followers or fewer followers, on average, than those who don't complain? You can use grouped summaries to quickly and easily provide an answer.
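To make the idea of a grouped summary concrete, here is a minimal sketch on a made-up data frame. It assumes the dplyr package is installed; toy_tweets, label, and followers are hypothetical names and values invented for illustration, not the course's twitter_data.

```r
library(dplyr)

# Hypothetical toy data (NOT the course's twitter_data)
toy_tweets <- data.frame(
  label = c("Complaint", "Complaint",
            "Non-Complaint", "Non-Complaint", "Non-Complaint"),
  followers = c(120, 280, 50, 150, 700)
)

toy_summary <- toy_tweets %>%
  # Split the rows into one group per distinct label
  group_by(label) %>%
  # Collapse each group to a single row of summary statistics
  summarize(
    n_tweets = n(),
    avg_followers = mean(followers),
    min_followers = min(followers),
    max_followers = max(followers)
  )

print(toy_summary)
```

The result has one row per group, so the follower statistics for the two labels can be compared side by side.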
This exercise is part of the course
Introduction to Text Analysis in R
Instructions
- Group the data by complaint_label.
- Compute the average, minimum, and maximum of usr_followers_count.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Start with the data frame
___ %>%
  # Group the data by whether or not the tweet is a complaint
  ___(___) %>%
  # Compute the mean, min, and max follower counts
  summarize(
    avg_followers = ___(___),
    min_followers = ___(___),
    max_followers = ___(___)
  )