
Capturing groups

1. Capturing groups

Well done! You can now write complex patterns that match exactly what you need. So far, every pattern you have written returned a substring of a larger text, but always as a whole: from the first character that matched to the last character of the pattern.

2. A regular pattern

For example, recall the pattern from the last chapter. We wanted to match a username and two numbers: the number of attempted logins and the number of successful logins. With that pattern, "str_match()" returned a one-by-one matrix, which is essentially a single substring. We could, of course, take this substring and split it apart until we end up with the pieces of information we are looking for, but that would be tedious and complicated.
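To make this concrete, here is a minimal sketch that assumes the stringr package is installed. The log line and the pattern are hypothetical stand-ins; the course's actual string and pattern from the last chapter are not reproduced here. Without any parentheses, "str_match()" returns the whole match as a one-by-one matrix:

```r
library(stringr)

# Hypothetical log line: a username, attempted logins, successful logins
log_line <- "lena tried 5 logins, 3 successful"

# Hypothetical pattern WITHOUT capturing groups:
# one or more letters for the username, then one or more digits, twice
pattern <- "[A-Za-z]+ tried \\d+ logins, \\d+ successful"

whole_match <- str_match(log_line, pattern)

dim(whole_match)   # 1 1: a single row and a single column
whole_match[1, 1]  # the complete match, as one string
```

The numbers are still buried inside that single string, so extracting them would require further splitting.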

3. Meet capturing groups

Regular expressions offer a way that is both quicker and more precise: capturing groups. Capturing groups are defined by opening and closing parentheses. They help us separate the parts we are interested in from the parts we are not. Let's wrap the three parts of our pattern in parentheses and see how that changes the result of our function call. The first group is the username, which consists of one or more letters. The second and third parts we care about are the attempted and successful logins, which our pattern represents as one or more digits, so we wrap those as well. When we now call "str_match()" with this pattern, we no longer get a one-by-one matrix but a one-by-four matrix! The first column still holds the complete match, from start to finish. The other three contain only the parts of the pattern that we wrapped in parentheses. That is capturing groups at work!
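Continuing the same hypothetical example, here is a sketch of the pattern with the three parts of interest wrapped in parentheses. The first column of the result is the complete match; each further column holds one capturing group:

```r
library(stringr)

# Same hypothetical log line as before
log_line <- "lena tried 5 logins, 3 successful"

# Hypothetical pattern WITH three capturing groups:
# (username) (attempted logins) (successful logins)
pattern_groups <- "([A-Za-z]+) tried (\\d+) logins, (\\d+) successful"

parts <- str_match(log_line, pattern_groups)

dim(parts)   # 1 4: the complete match plus one column per group
parts[1, 1]  # complete match
parts[1, 2]  # first group: "lena"
parts[1, 3]  # second group: "5"
parts[1, 4]  # third group: "3"
```

Note that the captured values are character strings. The login counts would need "as.integer()" before any arithmetic.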

4. Replacement

Running "str_match()" on a string gives us back a matrix of separated values. There is also a second way to work with captured groups: "str_replace()". Imagine we are writing a small piece of software that takes in a string, modifies the payload, and passes the result on to the next step in the process. Let's call "str_replace()" with the very same input string and pattern again. This time, however, we add a third argument: replacement. In it, we define the new content that should appear in place of whatever the pattern (the second argument) matched. The replacement can contain custom text, and it can also contain the contents of the capturing groups. For example, to insert what the first group captured, we write the number one preceded by two backslashes. This tells the replace function to put the contents of the first capturing group at that position.
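Here is a minimal sketch of that idea, reusing the hypothetical log line and pattern from above. The output formats ("user=...;attempts=...") are made up for illustration. In the replacement, "\\1", "\\2" and "\\3" stand for the first, second and third capturing group, and the groups can appear in any order:

```r
library(stringr)

log_line <- "lena tried 5 logins, 3 successful"
pattern_groups <- "([A-Za-z]+) tried (\\d+) logins, (\\d+) successful"

# Everything in the replacement except the group references is literal text
new_payload <- str_replace(
  log_line,
  pattern_groups,
  "user=\\1;attempts=\\2;successes=\\3"
)
new_payload  # "user=lena;attempts=5;successes=3"

# Groups can be reordered or repeated freely in the replacement
short_form <- str_replace(log_line, pattern_groups, "\\1: \\3/\\2")
short_form   # "lena: 3/5"
```

Because this pattern matches the entire input, the replacement text becomes the whole result. With a pattern that matched only part of the string, the unmatched text would be kept around it.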

5. String split

In the next exercise, we will use another function from the "stringr" package, "str_split()", to split a text into separate parts, that is, into a vector. The first argument is our input text. The second is a pattern used as the delimiter: it defines where the input text is split apart. The last argument is simplify. Its default, FALSE, makes "str_split()" return a list of character vectors. When we set it to TRUE, the function returns a character matrix instead.
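As a rough sketch of both return types, here is a hypothetical guest list whose names are separated by commas and an inconsistent amount of whitespace. The delimiter pattern ",\\s*" (a comma followed by any whitespace) is an assumption chosen for this example:

```r
library(stringr)

# Hypothetical input: names separated by commas with optional spaces
guests <- "Lena, Omar,Priya,  Tom"

# Default simplify = FALSE: a list holding one character vector per input string
as_list <- str_split(guests, ",\\s*")
as_list[[1]]  # "Lena" "Omar" "Priya" "Tom"

# simplify = TRUE: a character matrix with one row per input string
as_matrix <- str_split(guests, ",\\s*", simplify = TRUE)
dim(as_matrix)  # 1 4
```

Using a regular expression as the delimiter, rather than a fixed ", ", lets one call handle the uneven spacing.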

6. Let's practice!

Alright, these are the last special characters we will use in this course, I promise. Let's practice working with capturing groups!