Match repetitions
Alright, in this exercise your patterns will get much more powerful. You now know how to use repetitions to match exactly the desired number of digits or letters.
By using a number in curly braces {} you can define how many occurrences you want to search for. With one number, e.g. {2}, you'll match exactly that number of repetitions. With a number and a comma, the number serves as a minimum: {2,} (two repetitions or more). A second number after the comma is a maximum, so {2,4} matches between 2 and 4 repetitions.
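As a quick illustration of the three forms, here is a minimal sketch using stringr. The `samples` vector is made up for this example and is not part of the exercise data. Note that {2} without anchors also finds two digits inside a longer run of digits.

```r
library(stringr)

# Hypothetical sample strings, not taken from the exercise data
samples <- c("a1", "b22", "c333", "d4444", "e55555")

# {2}: exactly two digits; in a longer run only the first two are extracted
exact_two <- str_extract(samples, "\\d{2}")

# {2,}: two or more digits, as many as are available
two_or_more <- str_extract(samples, "\\d{2,}")

# {2,4}: at least two and at most four digits
two_to_four <- str_extract(samples, "\\d{2,4}")

print(exact_two)
print(two_or_more)
print(two_to_four)
```

Strings without a match, such as "a1", yield NA in all three results.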
The plus sign + and the asterisk * are an even quicker way to define repetition: the first matches one or more occurrences and the latter matches zero or more. These two are often used in combination with the period . to match an unknown number of arbitrary characters.
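The difference between the two shows up when the repeated part is missing entirely. The following sketch assumes stringr and uses a made-up `samples` vector rather than the exercise data:

```r
library(stringr)

# Hypothetical sample strings, not taken from the exercise data
samples <- c("ab", "aab", "b", "xaaab")

# a+ needs at least one "a" before the "b"
plus_match <- str_extract(samples, "a+b")

# a* also accepts zero "a"s, so a bare "b" matches
star_match <- str_extract(samples, "a*b")

# .+ and .* do the same with any character instead of "a"
dot_plus <- str_extract(samples, ".+b")
dot_star <- str_extract(samples, ".*b")

print(plus_match)
print(star_match)
print(dot_plus)
print(dot_star)
```

For the string "b", the + variants return NA because nothing precedes the "b", while the * variants return "b".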
This exercise is part of the course
Intermediate Regular Expressions in R
Exercise instructions
- Find all titles that contain a number with two or more digits.
- Match the first word of every title by searching one or more word characters at the beginning of the string.
- Match the word "Knight" and everything that comes before it.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
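# stringr provides str_detect() and str_match();
# movie_titles is assumed to be preloaded by the exercise environment
library(stringr)
# This lists all movies with two or more digits in a row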
movie_titles[str_detect(
movie_titles,
pattern = "\\d{2,}"
)]
# List just the first words of every movie title
str_match(movie_titles, pattern = "___")
# Match everything that comes before "Knight"
str_match(movie_titles, pattern = "___Knight")