Match all capturing groups
In this exercise, you will work with a text file named top_10, which stores movie names and their ranks. In this multi-line text, \n is used to start a new line. You will use the str_split() function to split the text into its individual lines.
The newly created one-row matrix top_10_lines then contains ten lines with the same pattern: the rank of the movie, followed by a dot and a space, and then the movie title itself. The str_match() function and two capturing groups, each written with parentheses (), make it possible to extract these two pieces of information from plain text into tabular form.
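To illustrate how capturing groups turn lines of this shape into a table, here is a minimal sketch. It uses a made-up vector named sample_lines rather than the exercise's top_10 data, and the pattern shown is only one plausible way to write the two groups, assuming the rank consists of digits.

```r
library(stringr)

# Hypothetical sample data with the same "rank. title" shape (not top_10)
sample_lines <- c("1. Apples", "2. Bread", "10. Milk")

# First group: one or more digits before the literal dot.
# Second group: everything after the dot and the space.
parts <- str_match(sample_lines, pattern = "(\\d+)\\. (.*)")

# str_match() returns a character matrix: column 1 holds the full match,
# and each further column holds one capturing group.
parts[, 2]  # the ranks
parts[, 3]  # the titles
```

Because every capturing group becomes its own column, the result can be passed on to functions such as as.data.frame() if a data frame is preferred.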
This exercise is part of the course Intermediate Regular Expressions in R.
Exercise instructions
- Use the str_split() function to split the text into its lines, outputting a character matrix by enabling simplify.
- Familiarize yourself with the structure of a line. It contains the rank and the title of a movie.
- Extract the rank and the title of a movie by using capturing groups in the str_match() function.
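The effect of the simplify argument mentioned in the first instruction can be seen in a small sketch. The string sample_text below is a hypothetical stand-in, not the exercise's top_10 data.

```r
library(stringr)

# Hypothetical multi-line text (not top_10); "\n" marks each line break
sample_text <- "first line\nsecond line\nthird line"

# Default behaviour: a list with one character vector per input string
as_list <- str_split(sample_text, pattern = "\n")

# With simplify = TRUE: a character matrix, one row per input string
as_matrix <- str_split(sample_text, pattern = "\n", simplify = TRUE)

is.list(as_list)  # TRUE
dim(as_matrix)    # 1 row, 3 columns
as_matrix[1:2]    # the first two lines
```

Since the input is a single string, the matrix has exactly one row, which is why indexing it with [1:3] in the sample code returns the first three lines.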
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Split the input by line break and enable simplify
top_10_lines <- str_split(
  top_10,
  pattern = "___",
  simplify = ___
)

# Inspect the first three lines and analyze their form
___[1:3]

# Add to the pattern two capturing groups that match rank and title
str_match(
  top_10_lines,
  pattern = "___\\. ___"
)