Cleaning and counting
Remove stop words to explore the content of just the airline tweets classified as complaints in twitter_data.
This exercise is part of the course
Introduction to Text Analysis in R
Exercise instructions
- Tokenize the tweets in twitter_data. Name the column with tokenized words as word.
- Remove the default stop words from the tokenized twitter_data.
- Filter to keep the complaints only.
- Compute word counts using the tokenized, cleaned text and arrange in descending order by count.
Hands-on interactive exercise
Try this exercise by completing the sample code.
tidy_twitter <- twitter_data %>%
  # Tokenize the twitter data
  ___(___, ___) %>%
  # Remove stop words
  anti_join(stop_words)

tidy_twitter %>%
  # Filter to keep complaints only
  ___(___ == ___) %>%
  # Compute word counts and arrange in descending order
  ___(___) %>%
  ___(___)
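
The same four steps can be tried on a small made-up data frame before filling in the template. This is only a sketch under stated assumptions: toy_tweets, its column names tweet_text and complaint_label, and the label value "Complaint" are hypothetical stand-ins, since the exercise text does not show the actual columns of twitter_data. The functions used are unnest_tokens() and the stop_words data set from tidytext, and anti_join(), filter(), count(), and arrange() from dplyr.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data; the column names and labels are assumptions,
# not the real structure of twitter_data.
toy_tweets <- tibble(
  tweet_text = c(
    "The flight was delayed and my luggage was lost",
    "Delayed again, no luggage at the gate",
    "Great crew and a smooth flight"
  ),
  complaint_label = c("Complaint", "Complaint", "Non-Complaint")
)

tidy_toy <- toy_tweets %>%
  # One row per lowercased word, stored in a column named word
  unnest_tokens(word, tweet_text) %>%
  # Drop rows whose word appears in the default stop word list
  anti_join(stop_words, by = "word")

complaint_counts <- tidy_toy %>%
  # Keep only the rows that came from complaint tweets
  filter(complaint_label == "Complaint") %>%
  # Count occurrences of each word; the count column is named n
  count(word) %>%
  # Most frequent words first
  arrange(desc(n))

print(complaint_counts)
```

Passing by = "word" to anti_join() only silences the message about the inferred join column; the result is the same as calling anti_join(stop_words). The last two steps could also be collapsed into count(word, sort = TRUE).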