Using the "or pattern" with a larger dataset
Now that you understand the principle of concatenating multiple possibilities from a vector, you'll go one step further and apply it to a larger dataset. Two variables are available in the global scope: articles and politicians. The first is a collection of news articles about Swiss politics. The second is a list of names of Swiss politicians that appear in the articles.
Now it's your job to find out which names appear in which articles, and how many times each politician appears across all the articles.
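As a minimal sketch of the underlying idea, the snippet below builds an "or pattern" from a vector and finds which names occur in which text. The names and texts are invented toy data, not the course's articles and politicians variables, and it uses base R's paste() and stringr's str_extract_all() rather than the functions named in the exercise template.

```r
library(stringr)

# Hypothetical toy data (assumption: not the course dataset)
names_vec <- c("Anna Muster", "Beat Beispiel")
texts <- c(
  "Anna Muster met Beat Beispiel. Later, Anna Muster left.",
  "No politician is mentioned here."
)

# Join the names with "|" so the regex matches any one of them
name_pattern <- paste(names_vec, collapse = "|")

# One list element per text, holding every name found in that text
mentions <- str_extract_all(texts, name_pattern)
print(mentions)
```

This approach assumes the names contain no regex metacharacters; names with characters such as "." or "(" would need escaping before being joined.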
This exercise is part of the course Intermediate Regular Expressions in R.
Exercise instructions
- Use the vector politicians to create a regular expression that matches all the names that are stored in that vector.
- Create a new column in the data frame articles which contains all politician names that appear in the column text.
- Glue all articles together so you're able to count the number of occurrences per politician more easily.
- Use the vector politicians as a pattern and pass it to str_count().
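The counting step in the last two instructions relies on str_count() being vectorised over its pattern argument: given a single string and a vector of patterns, it returns one count per pattern. A small sketch with invented toy data (the names and texts are assumptions, not the course dataset) could look like this:

```r
library(stringr)

# Hypothetical toy data (assumption: not the course dataset)
names_vec <- c("Anna Muster", "Beat Beispiel")
texts <- c(
  "Anna Muster met Beat Beispiel. Later, Anna Muster left.",
  "No politician is mentioned here."
)

# Collapse all texts into a single string
all_text <- paste(texts, collapse = " ")

# One string, several patterns: one count per name
counts <- str_count(all_text, names_vec)
names(counts) <- names_vec
print(counts)
```

Passing the vector itself, rather than the joined "or pattern", is what keeps the counts separate per name; the joined pattern would give a single total instead.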
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Construct a pattern that searches for all politicians
polit_pattern <- glue_collapse(___, sep = "___")
# Use the pattern to match all names in the column "text"
articles %<>%
  mutate(mentions = str_match_all(___, ___))
# Collapse all items of the column "text"
all_articles_in_one <- ___(articles$text)
# Pass the vector politicians to count all its elements
str_count(all_articles_in_one, ___)