The question mark and its two meanings
The or operator works well if you know exactly which options are valid and you're sure that one of them is present. But what if you want to match a pattern where one part is sometimes present and sometimes isn't? This is where the question mark ? comes in:
The ? can make the preceding group or character optional. With it, a regular expression matches even if a certain part of the pattern is missing. But be aware: if it follows a multiplier like * or +, the question mark can have a second effect:
The ? can also make the preceding multiplier "lazy" instead of "greedy". This means that instead of matching as many characters as possible, a multiplier followed by ? matches as few characters as possible while still letting the rest of the pattern match.
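Both meanings can be tried out in a minimal sketch like the one below. It assumes the stringr package is installed, and the strings are made-up examples rather than the course's lines data.

```r
library(stringr)

# Hypothetical example strings, not taken from the course data
words <- c("color", "colour", "colr")

# Meaning 1: "u?" makes the single character "u" optional
optional_hits <- str_detect(words, "colou?r")
optional_hits
# TRUE TRUE FALSE

tags <- "<b>bold</b> and <i>italic</i>"

# Greedy: ".+" takes as many characters as possible before the last ">"
greedy <- str_extract(tags, "<.+>")
greedy
# "<b>bold</b> and <i>italic</i>"

# Meaning 2: "?" after "+" makes it lazy, so it stops at the first ">"
lazy <- str_extract(tags, "<.+?>")
lazy
# "<b>"
```

The greedy and the lazy pattern both start matching at the first "<". They differ only in how far the match extends.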
This is a part of the course “Intermediate Regular Expressions in R”
Exercise instructions
- Match both the singular "Screen" as well as the plural "Screens" by making the last "s" optional.
- Match any number of arbitrary characters in front of a comma by using .* in your pattern.
- Match the same pattern with a question mark ? after the star - do you spot the difference?
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Match both Screen and Screens by making the last "s" optional
str_match(lines, pattern = "Screens___")
# Match any number of arbitrary characters, followed by a comma
str_match(lines, pattern = "___,")
# Match the same pattern followed by a comma, but the "lazy" way
str_match(lines, pattern = "___,")