
Using the "or pattern" with a larger dataset

Now that you understand how to join multiple alternatives from a vector into a single pattern, you'll go one step further and apply this to a larger dataset. Two variables are available in the global scope: articles and politicians. The first is a data frame of news articles about Swiss politics. The second is a vector of names of Swiss politicians who appear in those articles.
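
As a reminder of the principle, here is a minimal sketch of an "or pattern" built from a vector. The names in names_vec are made up for illustration and are not the course's politicians vector. The sketch uses base R's paste() to join the names, whereas the exercise template below uses glue_collapse() for the same step.

```r
library(stringr)

# Hypothetical names, used only for illustration
names_vec <- c("Anna Muster", "Beat Beispiel", "Carla Probe")

# Join the names with the regex alternation operator "|"
or_pattern <- paste(names_vec, collapse = "|")
or_pattern

# Check whether any of the names occurs in a sentence
found <- str_detect("Yesterday, Beat Beispiel spoke in Bern.", or_pattern)
found
```

Note that the names are interpreted as regular expressions, so a name containing characters such as "." or "(" would need escaping first.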

Your job is to find out which names appear in which articles, and how many times each politician appears across all articles.

This exercise is part of the course

Intermediate Regular Expressions in R


Exercise instructions

  • Use the vector politicians to create a regular expression that matches all the names that are stored in that vector.
  • Create a new column in the data frame articles that contains all politician names appearing in the column text.
  • Glue all articles together so you're able to count the number of occurrences per politician more easily.
  • Use the vector politicians as a pattern and pass it to str_count().
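
The difference between the two counting approaches in these steps can be sketched on toy data. Everything here (toy_texts, toy_names and their contents) is hypothetical and stands in for the real articles and politicians. For simplicity the texts are a plain character vector rather than a data frame column, and paste() replaces the course's collapsing function.

```r
library(stringr)

# Hypothetical stand-ins for the article texts and the politician names
toy_texts <- c(
  "Anna Muster met Beat Beispiel. Anna Muster left early.",
  "Carla Probe was absent."
)
toy_names <- c("Anna Muster", "Beat Beispiel", "Carla Probe")

# One alternation pattern that matches any of the names
toy_pattern <- paste(toy_names, collapse = "|")

# str_match_all() returns a list with one character matrix per text;
# the first column holds the full matches
mentions <- str_match_all(toy_texts, toy_pattern)
mentions[[1]][, 1]

# Glue all texts into one string, then count each name separately:
# str_count() is vectorised over the pattern, so passing the vector
# of names yields one count per name
all_text <- paste(toy_texts, collapse = " ")
counts <- str_count(all_text, toy_names)
names(counts) <- toy_names
counts
```

The alternation pattern answers "which names appear in this text", while passing the names as a vector answers "how often does each name appear". Passing the collapsed pattern to str_count() instead would give a single total over all names.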

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Construct a pattern that searches for all politicians
polit_pattern <- glue_collapse(___, sep = "___")

# Use the pattern to match all names in the column "text"
articles %<>%
  mutate(mentions = str_match_all(___, ___))

# Collapse all items of the column "text" into a single string
all_articles_in_one <- ___(articles$text)

# Pass the vector politicians to count all its elements
str_count(all_articles_in_one, ___)