1. Alternation and non-capturing groups
In this video, we'll talk about other ways in which grouping characters can help us.
2. Pipe
In previous videos, you learned about the vertical bar, or pipe, operator.
Suppose we have the following string.
We want to find all matches for pet names, so we use the pipe operator to specify that we want to match cat, dog, or bird, as you see in the code.
This will output the following list.
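A minimal Python sketch of this step. The slide's exact string is not reproduced in the transcript, so the string below is a hypothetical stand-in; the pattern is plain alternation with the pipe.

```python
import re

# Hypothetical example string (not necessarily the one on the slide)
my_string = "I want to have a pet. But I don't know if I want a cat, a dog or a bird."

# The pipe matches "cat" OR "dog" OR "bird" anywhere in the string
pets = re.findall(r"cat|dog|bird", my_string)
print(pets)  # ['cat', 'dog', 'bird']
```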
3. Pipe
Now, we changed the string a little bit.
Once more, we want to find all the pet names, but this time only those that come after a number and a whitespace. We specify this again with the pipe operator.
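Hmm, we got the wrong output. Why?
The pipe operator has the lowest precedence in a regex, so it splits the entire pattern into alternatives. It compares everything to its left (digit, whitespace, cat) with everything to its right (dog), and the second pipe does the same with bird. As a result, dog and bird match even when no number comes before them.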
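To see the problem concretely, here is a small sketch using a hypothetical string in which some pets are preceded by a number and a space. Without grouping, the pattern below is read as three separate alternatives.

```python
import re

# Hypothetical string: some pets come after a number and a space
my_string = "I want to have a pet. But I don't know if I want 2 cats, 1 dog or a bird."

# Without parentheses, the pipe splits the WHOLE pattern into:
#   r"\d+\scat"   OR   r"dog"   OR   r"bird"
wrong = re.findall(r"\d+\scat|dog|bird", my_string)
print(wrong)  # ['2 cat', 'dog', 'bird']
```

Note that "bird" matches even though "a bird" has no number in front of it, and "dog" is matched without its number.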
4. Alternation
To solve this, we can use alternation. In simpler terms, we use parentheses again, this time to group the alternatives, as you can see in the slide.
In the code, we now add parentheses around cat, dog, and bird.
This time, we get cat and dog as output. This is correct, because only these two pets follow a number and a whitespace. Note that findall returns only the text captured by the parentheses, not the whole match.
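A sketch of the grouped version, again with a hypothetical string. The parentheses confine the alternation to the pet name, so the digits and the whitespace are required for every alternative.

```python
import re

# Hypothetical string: some pets come after a number and a space
my_string = "I want to have a pet. But I don't know if I want 2 cats, 1 dog or a bird."

# The group limits the alternation to the pet name only
pets = re.findall(r"\d+\s(cat|dog|bird)", my_string)
print(pets)  # ['cat', 'dog']
```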
5. Alternation
In the previous example, we may also want to extract the number.
In that case, we need to place parentheses around the digit pattern to capture it as a group too, as seen in the slide.
In the code, we now use two pairs of parentheses, and we apply findall to the string.
We get a list of two tuples, one per match, each holding the number and the pet name, as shown in the output.
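With two capturing groups, findall returns one tuple per match. A minimal sketch with the same hypothetical string:

```python
import re

# Hypothetical string: some pets come after a number and a space
my_string = "I want to have a pet. But I don't know if I want 2 cats, 1 dog or a bird."

# Group 1 captures the number, group 2 captures the pet name
pets = re.findall(r"(\d+)\s(cat|dog|bird)", my_string)
print(pets)  # [('2', 'cat'), ('1', 'dog')]
```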
6. Non-capturing groups
Sometimes, we need to group characters using parentheses, but we are not going to refer back to the group.
For these cases, there is a special type of group called a non-capturing group. To use one, we just add a question mark and a colon right after the opening parenthesis, before the regex: (?:regex).
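The difference between a capturing and a non-capturing group shows up directly in what findall returns. This sketch uses a hypothetical sample text; both patterns match the same substrings, but only the first one captures anything.

```python
import re

# Hypothetical sample text
text = "2 cats and 1 dog"

# Capturing group: findall returns only what the group captured
captured = re.findall(r"\d+\s(cat|dog)", text)
print(captured)  # ['cat', 'dog']

# Non-capturing group: still groups the alternatives but captures nothing,
# so findall falls back to returning the whole match
whole = re.findall(r"\d+\s(?:cat|dog)", text)
print(whole)  # ['2 cat', '1 dog']

# The match object reports no groups at all
print(re.search(r"\d+\s(?:cat|dog)", text).groups())  # ()
```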
7. Non-capturing groups
We have the following string, and we want to find all matches of numbers. The pattern consists of two digits and a dash, repeated three times, followed by three digits, a dash, and three more digits. We want to extract only the last part, without the repeated elements at the start. We need to group the two digits and the dash to indicate the repetition, but we don't want to capture them.
So we use a non-capturing group around backslash d repeated two times and a dash. Then we indicate that this group should be repeated three times. After that, we capture backslash d repeated three times, a dash, and backslash d repeated three times in a regular group, as shown in the slide.
In the code, we then apply the regex to the string, and we get the numbers we were looking for, as shown in the output.
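A minimal sketch of this pattern, assuming a hypothetical string of IDs shaped like the one described (three blocks of two digits and a dash, then three digits, a dash, and three digits). The non-capturing group exists only so the quantifier {3} can apply to "two digits + dash".

```python
import re

# Hypothetical string with two IDs of the described shape
my_string = "Order A: 34-34-34-042-980, Order B: 10-10-10-434-425"

# (?:\d{2}-){3} repeats "two digits + dash" three times without capturing it;
# (\d{3}-\d{3}) captures only the last part
numbers = re.findall(r"(?:\d{2}-){3}(\d{3}-\d{3})", my_string)
print(numbers)  # ['042-980', '434-425']
```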
8. Alternation
Finally, we can combine non-capturing groups and alternation. Remember that alternation means using parentheses and the pipe operator to group alternative options.
Suppose we have the following string, and we want to match all the day numbers. We know that they are followed by th or rd, but we only want to capture the number, not the letters that follow.
We write our regex: inside parentheses, we capture backslash d repeated one or more times. Then we use a non-capturing group, and inside it, the pipe operator chooses between th and rd, as shown in the code.
We find all the matches in the string and get the correct output.
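A sketch combining both ideas, using a hypothetical date string. The capturing group keeps the day number, while the non-capturing alternation requires the suffix without returning it. Numbers not followed by th or rd, such as the year, are not matched.

```python
import re

# Hypothetical string containing ordinal day numbers
my_date = "Today is 23rd May 2019. Tomorrow is 24th May 19."

# (\d+) captures the day; (?:th|rd) must follow but is not captured
days = re.findall(r"(\d+)(?:th|rd)", my_date)
print(days)  # ['23', '24']
```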
9. Let's practice!
It's time to practice alternation and non-capturing groups!