
Match all capturing groups

In this exercise, you will work with a multi-line text named top_10, which stores movie names and their ranks. Within the text, \n marks the start of a new line. You will use the str_split() function to split the text into its individual lines.
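As a rough illustration of the splitting step, the sketch below uses a made-up three-line string. The names movies and movie_lines and the movie titles are placeholders, since the actual contents of top_10 are not shown here.

```r
library(stringr)

# Hypothetical stand-in for top_10; the real data is not shown in this exercise text
movies <- "1. Alpha\n2. Beta\n3. Gamma"

# simplify = TRUE makes str_split() return a character matrix instead of a list
movie_lines <- str_split(movies, pattern = "\\n", simplify = TRUE)

# One row (one input string), one column per line
dim(movie_lines)

# Indexing the matrix like a vector returns the individual lines
movie_lines[1:3]
```

With simplify left at its default of FALSE, the result would be a list holding one character vector per input string.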

The newly created one-row matrix top_10_lines then contains ten lines that all follow the same pattern: the rank of the movie, followed by a dot and a space, and then the movie title. The str_match() function and two capturing groups, each written in parentheses (), make it possible to extract these two pieces of information from plain text into tabular form.
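To make the idea of capturing groups concrete, here is a minimal sketch on invented lines in the same "rank, dot, space, title" form. The vector example_lines, its titles, and the specific pattern are assumptions for illustration; other patterns would work as well.

```r
library(stringr)

# Hypothetical lines following the "rank. title" form described above
example_lines <- c("1. Alpha", "2. Beta", "10. Gamma Delta")

# First group: one or more digits (the rank)
# Second group: everything after the dot and space (the title)
parts <- str_match(example_lines, pattern = "(\\d+)\\. (.*)")

# Column 1 holds the full match; columns 2 and 3 hold the two capturing groups
ranks  <- as.integer(parts[, 2])
titles <- parts[, 3]
```

str_match() returns a character matrix with one row per input string, so the result can be converted to a data frame if a tabular structure is needed.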

This exercise is part of the course

Intermediate Regular Expressions in R


Exercise instructions

  • Use the str_split() function to split the text into its lines, outputting a character matrix by enabling simplify.
  • Familiarize yourself with the structure of a line. It contains the rank and the title of a movie.
  • Extract the rank and the title of a movie by using capturing groups in the str_match() function.

Interactive exercise

Try this exercise by completing the sample code below.

# Split the input by line break and enable simplify
top_10_lines <- str_split(
  top_10,
  pattern = "___",
  simplify = ___
)

# Inspect the first three lines and analyze their form
___[1:3]

# Add to the pattern two capturing groups that match rank and title
str_match(
  top_10_lines,
  pattern = "___\\. ___"
)