Match all capturing groups
In this exercise, you will work with a text stored in the variable top_10, which holds movie names and their ranks. In this multi-line text, \n starts a new line. You will use the str_split() function to split the text into its individual lines.
The newly created one-row matrix top_10_lines then contains ten lines that share the same pattern: the rank of the movie, followed by a dot and a space, and then the movie title. The str_match() function and two capturing groups, written with parentheses (), make it possible to extract these two pieces of information from plain text into tabular form.
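As a rough illustration of what str_match() returns when a pattern has two capturing groups, here is a minimal sketch on a single line. The sample string is hypothetical and not taken from the course data, and the pattern shown is just one way to describe the "rank, dot, space, title" layout.

```r
library(stringr)

# Hypothetical line in the "rank. title" format (not from the course data)
line <- "1. Some Movie Title"

# Two capturing groups: the digits before the dot, and everything after the space
m <- str_match(line, pattern = "(\\d+)\\. (.*)")

m[1, 1]  # column 1: the full match
m[1, 2]  # column 2: first group (rank)
m[1, 3]  # column 3: second group (title)
```

The result is a character matrix with one row per input string, one column for the full match, and one further column per capturing group.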
This exercise is part of the course
Intermediate Regular Expressions in R
Exercise instructions
- Use the str_split() function to split the text into its lines, outputting a character matrix by enabling simplify.
- Familiarize yourself with the structure of a line. It contains the rank and the title of a movie.
- Extract the rank and the title of a movie by using capturing groups in the str_match() function.
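To see what enabling simplify changes, the following sketch compares the two return shapes of str_split(). The two-line sample_text is a made-up stand-in for top_10, which is assumed to hold ten such lines.

```r
library(stringr)

# Hypothetical two-line input; the real top_10 is assumed to hold ten lines
sample_text <- "1. First Film\n2. Second Film"

# Default: a list with one character vector per input string
as_list <- str_split(sample_text, pattern = "\n")

# simplify = TRUE: a character matrix with one row per input string
as_matrix <- str_split(sample_text, pattern = "\n", simplify = TRUE)

class(as_list)   # list
dim(as_matrix)   # one row, two columns
```

Because the input is a single string, the matrix has exactly one row, and each line of the text ends up in its own column.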
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Split the input by line break and enable simplify
top_10_lines <- str_split(
  top_10,
  pattern = "___",
  simplify = ___
)
# Inspect the first three lines and analyze their form
___[1:3]
# Add to the pattern two capturing groups that match rank and title
str_match(
  top_10_lines,
  pattern = "___\\. ___"
)