Extract names with context

Let's take out our dataset about Swiss politicians again. It consists of two variables: articles, a collection of news articles about Swiss politics, and politicians, a vector with several names of Swiss politicians.

You already counted the number of occurrences per name, but wouldn't it be interesting to see not only how often the names appear but also in what context they are used? You could, for example, compare whether the contexts differ between female and male politicians. To do so, you'll have to extract the text surrounding the politicians' names.

Because the text contains word characters (\\w) as well as punctuation ([:punct:]) such as periods (.) and commas (,), you will have to create a pattern that matches both of these character types.
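To see why both character types are needed, here is a minimal sketch on a made-up sentence (the string and the variable names are illustrative assumptions, not part of the course data). It compares \\w on its own with a custom class that also includes [:punct:]:

```r
library(stringr)

# Hypothetical example sentence, not taken from the articles dataset
toy_text <- "Yes, really. Done!"

# Word characters only: the punctuation attached to each word is dropped
words_only <- str_extract_all(toy_text, "\\w+")[[1]]

# Word characters or punctuation: each token keeps its comma or period
words_and_punct <- str_extract_all(toy_text, "[\\w[:punct:]]+")[[1]]

print(words_only)
print(words_and_punct)
```

With the combined class, a token such as "Yes," stays in one piece, which matters when the surrounding context is matched token by token.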

This exercise is part of the course

Intermediate Regular Expressions in R

Exercise instructions

  • Use the vector politicians and collapse it to create an "or pattern" like you did in chapter 2.
  • Create a custom pattern in square brackets [] that matches both word characters and punctuation.
  • Using glue, add the newly created context both in front of and after the polit_pattern. The \\s? indicates that the politician's name may or may not be followed by a space.
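The steps above can be sketched on toy data. Everything in this example is a hypothetical stand-in: the names, the sentence, the variable names, and the limit of two context tokens are assumptions chosen for illustration, and the exercise itself uses politicians, articles$text, and up to ten tokens.

```r
library(glue)
library(stringr)

# Hypothetical stand-ins for the course data
toy_names <- c("Anna Muster", "Peter Beispiel")
toy_sentence <- "Yesterday, the council praised Anna Muster for her work."

# Collapse the names into a single "or pattern"
toy_name_pattern <- glue_collapse(toy_names, sep = "|")

# Zero to two tokens of word characters or punctuation, each followed by whitespace
toy_context <- "([\\w[:punct:]]+\\s){0,2}"

# Place the context pattern before and after the names
toy_full_pattern <- glue("{toy_context}({toy_name_pattern})\\s?{toy_context}")

toy_match <- str_extract(toy_sentence, toy_full_pattern)
print(toy_match)
```

Because the quantifier braces {0,2} live inside the value of toy_context rather than in the glue template itself, glue inserts them as plain text and does not try to evaluate them.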

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create our polit_pattern again by collapsing "politicians"
polit_pattern <- glue_collapse(___, sep = "|")

# Match one or more word characters or punctuation characters
context <- "([___[___]]+\\s){0,10}"

# Add this pattern in front and after the polit_pattern
polit_pattern_with_context <- glue(
  "{___}({polit_pattern})\\s?{___}"
)

str_extract_all(
  articles$text,
  pattern = polit_pattern_with_context
)