Extract names with context

Let's take out our dataset about Swiss politicians again. It consists of two variables: articles, a collection of news articles about Swiss politics, and politicians, a vector with several names of Swiss politicians.

You already counted the number of occurrences per name, but wouldn't it be interesting to see not only how often the names appear but also in what context they are used? You could, for example, compare whether the contexts differ between female and male politicians. To do so, you'll have to extract the text surrounding the politicians' names.

As the text contains word characters (\\w) as well as punctuation ([:punct:]) such as periods or commas, you will have to create a pattern that matches both of these character types.
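To illustrate why both character types are needed, here is a minimal sketch using stringr. The example sentence is made up for illustration and is not part of the articles data.

```r
library(stringr)

# Hypothetical example sentence, not taken from the exercise dataset
sentence <- "Hello, world. Nice day!"

# \\w on its own stops at punctuation, so commas and periods are lost
words_only <- str_extract_all(sentence, "\\w+")[[1]]

# A custom class in square brackets matches word characters OR punctuation
tokens <- str_extract_all(sentence, "[\\w[:punct:]]+")[[1]]

print(words_only)
print(tokens)
```

With the custom class, punctuation stays attached to the word it follows, which is what you want when extracting readable context.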

Use the vector politicians and collapse it to create an "or pattern" like you did in chapter 2.
Create a custom pattern in square brackets [] that matches both word characters and punctuation.
Using glue, add the newly created context both in front of and after the polit_pattern. The \\s? indicates that a politician's name may or may not be followed by a space.
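The three steps above could be sketched as follows. This is one possible approach, not necessarily the expected solution. The sample articles and names are hypothetical stand-ins for the real dataset, articles is assumed here to be a plain character vector, and the {0,10} quantifier (up to ten words of context on each side) is an assumption chosen for illustration.

```r
library(stringr)
library(glue)

# Hypothetical stand-ins for the exercise data (not the real dataset)
articles <- c(
  "Yesterday, Ueli Maurer spoke in Bern about the budget.",
  "The proposal by Doris Leuthard was rejected, again."
)
politicians <- c("Ueli Maurer", "Doris Leuthard")

# Step 1: collapse the names into an "or pattern"
polit_pattern <- glue_collapse(politicians, sep = "|")

# Step 2: a "word" is one or more word characters or punctuation marks,
# followed by whitespace; {0,10} (an assumption) allows up to ten of them
context <- "([\\w[:punct:]]+\\s){0,10}"

# Step 3: put the context in front of and after the names;
# \\s? allows an optional space right after the name
polit_pattern_with_context <- glue("{context}({polit_pattern})\\s?{context}")

# Extract every name together with its surrounding text
matches <- str_extract_all(articles, polit_pattern_with_context)
print(matches)
```

Because every context word must be followed by whitespace, the final word of a sentence is not included in the match. Whether that matters depends on how you plan to analyse the extracted context.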

