Capturing
1. Capturing
In this chapter you'll learn a few advanced regular expression tricks that allow you to capture and refer to parts of a regular expression.
2. Capturing
We'll start with capturing. Capturing is simply a way to group parts of a pattern together. You'll see later that you can refer to captured pieces, either later in the same pattern or in replacement strings. For now we'll concentrate on extracting the captured pieces. In rebus you wrap the part of the pattern you wish to capture with the capture() function. Compare the regular expression ANY_CHAR followed by an "a" to the regular expression capture(ANY_CHAR) followed by an "a". Notice how adding capture() around ANY_CHAR just encloses it in parentheses. Enclosing part of a pattern in parentheses is how you specify a captured group in a regex. You can see we get the same match whether or not we capture ANY_CHAR. Capturing doesn't change the pattern that is matched; it simply indicates you want to do something with a piece of the pattern. The stringr function for working with captures is str_match().
3. str_match()
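As a minimal sketch of that comparison, assuming the rebus and stringr packages are installed, you can build both patterns and check that they match the same text. The input strings are made up for illustration.

```r
library(stringr)
library(rebus)

# Hypothetical inputs, chosen only for illustration
x <- c("Fat", "cat")

# Without a capture: any character followed by "a"
plain <- ANY_CHAR %R% "a"

# With a capture: the same pattern, with ANY_CHAR wrapped in capture()
captured <- capture(ANY_CHAR) %R% "a"

# Look at the underlying regex strings; the second one has parentheses
as.character(plain)
as.character(captured)

# Both patterns extract the same match from each string
str_extract(x, plain)
str_extract(x, captured)
```

The only difference between the two regex strings is the pair of parentheses that capture() adds.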
str_match() is specifically designed to work with patterns that include captures. It returns a character matrix in which each row corresponds to an input string. The first column is the entire match, the same as you'd get from str_extract(). Then there is a column for each captured group, containing just the piece that matched the captured part of the pattern. Here, the piece of the string that matched ANY_CHAR was "F" in the first string and "c" in the second. This can be really useful when you've built up a complicated pattern but want to access different pieces of it. Take a look at the next pattern.
4. str_match()
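To see the matrix that str_match() returns for the captured ANY_CHAR pattern, here is a small sketch. It assumes rebus and stringr are installed, and the two input strings are hypothetical stand-ins picked so that the captured characters are "F" and "c".

```r
library(stringr)
library(rebus)

# Hypothetical inputs, chosen only for illustration
x <- c("Fat", "cat")

# Any character (captured) followed by "a"
pattern <- capture(ANY_CHAR) %R% "a"

# One row per input string, one column for the full match,
# then one column per captured group
m <- str_match(x, pattern)
m

m[, 1]  # full matches: "Fa" "ca"
m[, 2]  # captured pieces: "F" "c"
```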
This pattern matches a dollar sign followed by a digit, an optional second digit, then a dot and two more digits. In other words, it matches dollar amounts under $100. You might, for example, want to separate the dollars from the cents.
5. str_match()
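One way to write that dollar-amount pattern in rebus is sketched below, first without captures and then with the dollar digits and the cent digits each wrapped in capture(). The exact pattern from the slide isn't reproduced in this transcript, so treat this as an assumption about its shape. The price strings are hypothetical.

```r
library(stringr)
library(rebus)

# Hypothetical inputs, chosen only for illustration
prices <- c("Coffee costs $4.50", "The book was $12.99")

# Dollar sign, a digit, an optional second digit, a dot, two more digits
amount <- DOLLAR %R% DGT %R% optional(DGT) %R% DOT %R% DGT %R% DGT
str_extract(prices, amount)

# Same pattern, with the dollars and the cents each wrapped in capture()
amount_parts <- DOLLAR %R%
  capture(DGT %R% optional(DGT)) %R%
  DOT %R%
  capture(DGT %R% DGT)

parts <- str_match(prices, amount_parts)
parts[, 1]  # full match
parts[, 2]  # first capture: dollars
parts[, 3]  # second capture: cents
```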
To do so, you simply capture the part of the pattern that matches the dollar amount and the part that matches the cents. In combination with str_match(), you get the dollars in one column and the cents in another. Captures are counted from left to right, so the first capture will be in the first column after the full match, the second capture in the second column after it, and so on.
6. Non-capturing groups
Do you remember the funny looking regex that came out of or()?
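As a reminder, here is a minimal sketch of what or() produces, assuming rebus and stringr are installed. The alternatives "dog" and "cat" and the input string are just illustrative choices.

```r
library(stringr)
library(rebus)

# or() joins its alternatives inside a group
animal <- or("dog", "cat")

# The underlying regex string
as.character(animal)

# str_match() returns only the full-match column here,
# because the group is not a capturing one
str_match("hot dog", animal)
```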
The parentheses here are followed by ?:. This indicates that this is a non-capturing group.
7. Non-capturing groups
You need the grouping to distinguish whether you mean match d-o-g or c-a-t, or match d-o followed by g or c, followed by a-t, but rebus assumes you don't need to capture which of the alternatives matched. rebus functions that use a non-capturing group by default have a capture argument that you can set to TRUE if you do want to capture it. Or you can simply wrap the call in the capture() function. I prefer the second approach, because you can easily see what you are capturing just by looking for capture() in your code. You'll need to be able to do this to use backreferences, which I'll talk about in the next video.
8. Let's practice!
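Before you start, here is a short sketch of the two ways of capturing an alternation that were just described: the capture argument of or(), and wrapping the call in capture(). It assumes rebus and stringr are installed, and the input strings are hypothetical.

```r
library(stringr)
library(rebus)

# Hypothetical inputs, chosen only for illustration
x <- c("hot dog", "cool cat")

# Option 1: ask or() for a capturing group
animal_arg <- or("dog", "cat", capture = TRUE)

# Option 2: wrap the non-capturing group in capture()
animal_wrap <- capture(or("dog", "cat"))

# Either way, str_match() now returns a second column
# holding the alternative that matched
str_match(x, animal_arg)
str_match(x, animal_wrap)
```

With the second option, searching your code for capture() finds every captured piece.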
You'll use capturing in the next few exercises to improve some of the pattern extraction tasks you've already completed.