Match all capturing groups

In this exercise, you will work with a text named top_10, which stores movie names and their ranks. In this multi-line text, \n starts each new line (in an R regular expression pattern, this is written as \\n). You will use the str_split() function to split the text into individual lines.
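
As a rough illustration of the splitting step, the sketch below runs str_split() on a small made-up string. The name sample_text and its contents are hypothetical and are not the course's top_10 data; the sketch only assumes the stringr package is installed.

```r
library(stringr)

# Hypothetical sample text (not the course's top_10 data)
sample_text <- "1. Alpha\n2. Beta\n3. Gamma"

# simplify = TRUE returns a character matrix instead of a list:
# one row per input string, one column per piece
sample_lines <- str_split(sample_text, pattern = "\\n", simplify = TRUE)

dim(sample_lines)   # 1 row, 3 columns
sample_lines[1:3]   # the three lines as a character vector
```

Without simplify = TRUE, str_split() returns a list with one character vector per input string, which would need an extra indexing step before the lines could be inspected.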

The newly created one-row matrix top_10_lines then contains ten lines that share the same pattern: the rank of the movie, followed by a dot and a space, then the movie title. The str_match() function and two capturing groups, each written in parentheses (), make it possible to extract these two pieces of information from plain text into tabular form.
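
To show how capturing groups turn into columns, here is a minimal sketch on made-up lines. The vector sample_lines and the pattern are illustrative assumptions, not the course's data or its official solution; other patterns would work just as well.

```r
library(stringr)

# Hypothetical lines (not the course's data)
sample_lines <- c("1. Alpha", "2. Beta", "3. Gamma")

# Group 1: one or more digits (the rank)
# Group 2: everything after the dot and the space (the title)
result <- str_match(sample_lines, pattern = "(\\d+)\\. (.*)")

result[, 1]  # the full match for each line
result[, 2]  # ranks, returned as character strings
result[, 3]  # titles
```

str_match() returns a character matrix whose first column holds the full match and whose remaining columns hold one capturing group each, which is what gives the result its tabular shape.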

This exercise is part of the course

Intermediate Regular Expressions in R

Exercise instructions

  • Use the str_split() function to split the text into its lines, outputting a character matrix by enabling simplify.
  • Familiarize yourself with the structure of a line. It contains the rank and the title of a movie.
  • Extract the rank and the title of a movie by using capturing groups in the str_match() function.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Split the input by line break and enable simplify
top_10_lines <- str_split(
  top_10,
  pattern = "___",
  simplify = ___
)

# Inspect the first three lines and analyze their form
___[1:3]

# Add to the pattern two capturing groups that match rank and title
str_match(
  top_10_lines,
  pattern = "___\\. ___"
)