
The pipe and the question mark

1. The pipe and the question mark

Well done, you're almost at the end of chapter one. You have already learned what character classes are and how to describe them. In this lesson, you will meet the last two special characters used in regular expressions: the so-called pipe operator, which serves as an "or" condition, and the question mark, which can make parts of a pattern optional.

2. This or that

So far we've only worked with a list of plain movie titles, but later on we will see how to extract multiple pieces of information from a single line of text. Our new vector, "lines", contains information about the number of screens a movie was shown on as well as the company that distributed it. Let's say we're writing an article on movie distributors and the number and type of movies they distributed. We might be interested in two distributors in particular: Columbia and Pixar. Can we create a regular expression that matches all lines mentioning either of these two distributors? We sure can, by using the pipe operator "|" as an "or" condition. The pattern "Columbia|Pixar" matches both the word Columbia and the word Pixar.
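As a minimal sketch of the "or" condition, here is the same idea in Python's standard `re` module rather than the tools used in the lesson. The `lines` list below is hypothetical sample data with made-up film names, not the course's actual vector.

```python
import re

# Hypothetical sample data; the course's real "lines" vector is not shown here.
lines = [
    "Film A | 3000 screens | distributor: Pixar",
    "Film B | 2500 screens | distributors: Columbia, Sony",
    "Film C | 3200 screens | distributor: Warner Bros.",
]

# The pipe acts as "or": the pattern matches either "Columbia" or "Pixar".
pattern = r"Columbia|Pixar"

# Keep only the lines in which the pattern is found somewhere.
matching = [line for line in lines if re.search(pattern, line)]

for line in matching:
    print(line)
```

Only the first two lines are kept, since the third mentions neither distributor.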

3. Making things optional

You may have noticed that the word "distributor" in our data is sometimes written in the singular and sometimes in the plural. With the pipe operator, we could of course match both versions of the word: "distributor" or "distributors". But in this case, only the "s" at the end of the word makes the difference. Regular expressions also let us match both versions in another way: by making the final "s" optional with a question mark, as in "distributors?". We can use the function str_view() to visualize our matches: the parts of each string that a pattern matched are shown with a light gray background. In this case, both function calls give identical results. This works because a question mark applies only to the single character immediately before it, so here it makes only the "s" optional while the rest of the word is still required. In the next lessons we will see how to make a larger part of a pattern optional, but first, I want to introduce you to the second meaning of the question mark.
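The comparison can be sketched in Python's `re` module as well, again on hypothetical sample lines. One assumption worth flagging: Python tries alternatives from left to right and stops at the first one that matches, so the longer word is listed first in the pipe version to make both patterns capture the same text.

```python
import re

# Hypothetical sample data mixing singular and plural "distributor(s)".
lines = [
    "Film A | 3000 screens | distributor: Pixar",
    "Film B | 2500 screens | distributors: Columbia, Sony",
]

# Alternation: longer alternative first, because alternatives are tried in order.
with_pipe = r"distributors|distributor"
# Optional "s": the ? applies only to the single character right before it.
with_question_mark = r"distributors?"

results = []
for line in lines:
    results.append(
        (re.search(with_pipe, line).group(), re.search(with_question_mark, line).group())
    )
print(results)

# The rest of the word is still required, so a truncated word does not match.
print(re.search(with_question_mark, "distributo"))
```

Both patterns capture "distributor" on the first line and "distributors" on the second, while "distributo" produces no match.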

4. Greedy vs. lazy

To understand the second function of the question mark, we first need to look at another important concept in regular expressions: greediness. By default, regular expressions are "greedy". What does this mean? Let's look at a pattern that matches everything up to the digit 3: we use the period and the star, ".*", to match an arbitrary number of characters, followed by "3". By default, the star tries to match as many characters as possible. Since the digit 3 occurs twice in our movie title, the pattern matches everything up to the last 3 it can find. This behavior is called "greedy". However, we can opt out of it and make the pattern "lazy": if we append a question mark to the star, writing ".*?3", the pattern matches as few characters as possible instead of as many. It then matches only everything up to and including the first 3.
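Here is a minimal sketch of greedy versus lazy matching in Python's `re` module, whose `*` and `*?` quantifiers behave as described above. The title "Film 3 in 3D" is a hypothetical string chosen only because it contains the digit 3 twice.

```python
import re

# Hypothetical movie title in which the digit 3 appears twice.
title = "Film 3 in 3D"

# Greedy: .* grabs as many characters as possible, then backs off to the last 3.
greedy = re.search(r".*3", title).group()
# Lazy: .*? grabs as few characters as possible, stopping at the first 3.
lazy = re.search(r".*?3", title).group()

print(greedy)  # Film 3 in 3
print(lazy)    # Film 3
```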

5. Let's practice!

Alright! In the chapters to come, you will learn about the last important concept of regular expressions: groups. But first, let's practice the two special characters from this lesson: the pipe and the question mark.
