1. Greedy vs. non-greedy matching
You have already worked with repetitions.
In this video, we'll deepen our understanding of how the quantifiers work.
2. Greedy vs. non-greedy matching
There are two types of matching: greedy and non-greedy (also called lazy).
The quantifiers that you have learned so far, which are called standard quantifiers, are greedy by default.
3. Greedy matching
We said that the standard quantifiers have greedy behavior, meaning that they will attempt to match as many characters as possible.
In doing so, they return the longest match found in a match attempt.
Let's take a look at this code. We want to find one or more digits in the string displayed, and our greedy quantifier will return the match '12345'.
We can explain this in the following way: our quantifier will start by matching the first digit found, '1'. Because it is greedy, it will keep going to find 'more' digits and stop only when no other digit can be matched, returning '12345'.
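The slide's code is not part of this transcript, so here is a minimal sketch of what it could look like. The sample string is an assumption; any string that starts with the digits 12345 followed by a non-digit behaves the same way.

```python
import re

# Assumed sample string; the string shown on the slide is not in the transcript.
text = "12345bcada"

# \d+ is greedy: it keeps consuming digits until the next character is not a digit.
greedy = re.match(r"\d+", text)
print(greedy.group())  # 12345
```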
4. Greedy matching
However, there is another characteristic that we should explore.
If the greedy quantifier has matched so many characters that the rest of the pattern cannot match, it will backtrack, giving up previously matched characters one at a time, and try again.
Backtracking is like driving a car without a map. If you drive down a road and hit a dead end, you need to backtrack to an earlier point and take another street.
To make this clearer, we'll take this example code. We use the greedy quantifier .* to match any character, zero or more times, followed by the letters "h" "e" "l" "l" "o".
We can see here that it returns the match 'xhello'.
So our greedy quantifier starts by matching as much as possible: the entire string. Then the regex tries to match the h, but there are no characters left. So it backtracks, giving up one matched character, and tries again. It still cannot match the h, so it keeps backtracking one character at a time until it finally matches the h in the regex, and then the rest of the characters.
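To follow those steps on screen, here is a sketch under two assumptions: the sample string is a guess at the one on the slide, and greedy_trace is a hypothetical helper that only imitates the backtracking described above. It is not how the re module is implemented internally.

```python
import re

# Assumed sample string; the slide's actual string is not in the transcript.
text = "xhelloxxxxxx"

# What the regex engine returns for the greedy pattern.
greedy = re.match(r".*hello", text)
print(greedy.group())  # xhello


def greedy_trace(text, tail="hello"):
    """Simplified model of greedy backtracking for the pattern .*<tail>.

    Hypothetical helper for illustration only.
    """
    # Start with .* holding the whole string, then give back one character at a time.
    for taken in range(len(text), -1, -1):
        print(f".* holds {text[:taken]!r}, trying {tail!r} at index {taken}")
        if text.startswith(tail, taken):
            return text[:taken + len(tail)]
    return None


print(greedy_trace(text))  # xhello
```

The trace prints one line per attempt, starting from the full string and shrinking until the h can finally match.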
5. Non-greedy matching
Because they have lazy behavior, non-greedy quantifiers will attempt to match as few characters as needed, returning the shortest match.
So how do we obtain non-greedy quantifiers? We can append a question mark to a greedy quantifier to make it lazy.
If we take the same code as before, our non-greedy quantifier will return the match '1'.
In this case, our quantifier will start by matching the first digit found, '1'. Because it is non-greedy, it will stop there as we stated that we want 'one or more' and one is as few as needed.
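As a quick side-by-side, this sketch runs the greedy and lazy versions on the same input. The sample string is again an assumption, since the slide's code is not in the transcript.

```python
import re

# Assumed sample string; the string shown on the slide is not in the transcript.
text = "12345bcada"

# Greedy: + takes as many digits as it can.
greedy = re.match(r"\d+", text).group()

# Lazy: appending ? to + makes it stop after the minimum, a single digit.
lazy = re.match(r"\d+?", text).group()

print(greedy)  # 12345
print(lazy)    # 1
```

The same trick works for the other standard quantifiers: *? and ?? are the lazy forms of * and ?, and {m,n}? is the lazy form of {m,n}.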
6. Non-greedy matching
Non-greedy quantifiers also backtrack. In this case, if they have matched so few characters that the rest of the pattern cannot match, they backtrack, expand the match by one character at a time, and try again.
Let's take the same example code again. This time we will use the lazy quantifier .*?. Interestingly, we obtain the same match, 'xhello'.
But the way this match is obtained differs from the first time. The lazy quantifier first matches as little as possible, nothing, leaving the entire string unmatched. Then it tries to match the h, but it doesn't work. So it backtracks, matching one more character, the x. Then it tries again, this time matching the h, and afterwards, the rest of the regex.
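Here is a sketch of the lazy case, with the same caveats as before: the sample string is assumed, and lazy_trace is a hypothetical helper that imitates the steps just described, not the engine's real implementation. The second sample string, with "hello" appearing twice, is also made up to show a case where greedy and lazy no longer agree.

```python
import re

# Assumed sample string; the slide's actual string is not in the transcript.
text = "xhelloxxxxxx"

lazy = re.match(r".*?hello", text)
print(lazy.group())  # xhello


def lazy_trace(text, tail="hello"):
    """Simplified model of lazy expansion for the pattern .*?<tail>.

    Hypothetical helper for illustration only.
    """
    # Start with .*? holding nothing, then take one more character at a time.
    for taken in range(len(text) + 1):
        print(f".*? holds {text[:taken]!r}, trying {tail!r} at index {taken}")
        if text.startswith(tail, taken):
            return text[:taken + len(tail)]
    return None


print(lazy_trace(text))  # xhello

# Made-up string with two occurrences of "hello": here the results differ.
repeated = "xhelloxxhello"
greedy_match = re.match(r".*hello", repeated).group()
lazy_match = re.match(r".*?hello", repeated).group()
print(greedy_match)  # xhelloxxhello
print(lazy_match)    # xhello
```

With only one "hello" in the string, both quantifiers end up at the same place. With two, the greedy version stops backtracking at the last "hello", while the lazy version stops expanding at the first.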
7. Let's practice!
Now that you have a good idea about how greedy and lazy quantifiers work, it's your turn to put this into practice.