More regular expressions

1. More regular expressions

To review what you've learned so far: in regular expressions,

2. Regular expression review

a caret matches the start of a string, a dollar sign matches the end of a string, and a dot matches any single character. In rebus, you use the constants

3. Regular expression review

START,

4. Regular expression review

END and

5. Regular expression review

ANY_CHAR.
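As a rough illustration of the three metacharacters just reviewed, here is a minimal sketch that uses base R's grepl() on raw regular expressions rather than rebus, so it runs without extra packages. The vector name and the example words are made up for this sketch; the comments note which rebus constant each symbol corresponds to.

```r
# Illustrative data (an assumption for this sketch, not from the course)
words <- c("cat", "concat", "scatter", "ct")

# "^" anchors the match to the start of the string (START in rebus)
starts_with_c <- grepl("^c", words)

# "$" anchors the match to the end of the string (END in rebus)
ends_with_t <- grepl("t$", words)

# "." matches any single character (ANY_CHAR in rebus)
c_any_t <- grepl("c.t", words)

# Combining all three: exactly "c", one character, then "t"
exactly_c_any_t <- grepl("^c.t$", words)

print(data.frame(words, starts_with_c, ends_with_t, c_any_t, exactly_c_any_t))
```

Note that "ct" fails the dot patterns because the dot must consume exactly one character.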

6. Regular expression review

If a dot is used in regular expressions to mean any character, you might be wondering how you match a dot exactly. Just like quotes inside a string,

7. Regular expression review

you escape it. That is, you type a backslash then a dot. The tricky part is that, inside an R string, you have to escape the backslash as well, so the sequence is actually backslash, backslash, dot. rebus makes this easier: you simply use the DOT constant. The same applies to the dollar sign and caret. Because they have special meaning in regular expressions, matching the actual dollar or caret character requires escaping it in the regular expression, or in rebus, using the appropriately named constant.

Let's add a few more words to your regular expressions vocabulary: alternation, character classes and repetition. In a regular expression, if you want to specify "match the pattern a or b",

8. Alternation

you use what is called alternation. Alternation in the regular expressions language happens to look a lot like a logical OR in R. For example, this regular expression, with dog and cat separated by a pipe and surrounded by parentheses, says: match the pattern d-o-g, or match the pattern c-a-t. In rebus, you use the function or() to construct a regular expression with a set of alternative matches. You might notice that the regular expression returned by rebus is a little different from the one I showed you. The question mark colon is added by rebus to signify that this is a "non-capturing group". I'll talk about capturing in the next chapter; for now, when you see question mark colon inside parentheses, you can ignore it, because it doesn't change what will match the pattern.
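Two ideas from this part of the review, escaping a special character and alternation, can be tried out with a short base R sketch. It uses grepl() on hand-written patterns instead of rebus, and the example vectors are assumptions made for illustration. The non-capturing form is run with perl = TRUE, which selects R's Perl-compatible regex engine.

```r
# Illustrative data (assumed for this sketch)
prices <- c("3.50", "3x50", "350")

# An unescaped dot matches any character, so "3x50" also matches
unescaped <- grepl("3.50", prices)

# "\\." in an R string is the regex \. and matches only a literal dot
# (this is what the DOT constant stands for in rebus)
escaped <- grepl("3\\.50", prices)

animals <- c("dog", "cat", "bird", "hotdog")

# Alternation: match "dog" or "cat"
capturing <- grepl("(dog|cat)", animals)

# The same alternation inside a non-capturing group, (?: ... )
non_capturing <- grepl("(?:dog|cat)", animals, perl = TRUE)

print(data.frame(prices, unescaped, escaped))
print(data.frame(animals, capturing, non_capturing))
```

Both alternation patterns flag the same strings, which is consistent with the point that the question mark colon does not change what matches.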

9. Character classes

Character classes are a bit like ANY_CHAR but with a restriction on what characters are allowed. Instead of matching _any_ single character, they match any single character that is in a given set. In rebus you use the function char_class and pass a string that contains all the characters you want to match. This character class will match a lower or upper case A. You can see that in the regular expression language a character class is specified with square brackets. A negated character class specifies that you should match any single character that is not in the set. In this example, that is any single character that is not a lower case or upper case "a". In regular expressions a negated character class is just like a character class with a caret as the first character inside the square brackets. Unlike in other places in a regular expression, inside character classes you don't need to escape most characters that would otherwise have a special meaning. If you want to match a dot you can include a dot directly. Matching a minus is a bit trickier: if you need to do it, just make sure it comes first in the character class. There are a few ways to specify a pattern that

10. Repetition

repeats. A pattern can be optional, occur zero or more times, one or more times, or be repeated within a specified range of times. In the regular expressions language these are specified with the question mark, star, plus and using curly braces, respectively.
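The four repetition operators listed above can be compared side by side. This is a minimal base R sketch using grepl() on raw patterns rather than rebus; the example strings are invented, and each pattern is anchored with ^ and $ so that the whole string has to fit.

```r
# "?" makes the preceding item optional
spellings <- c("clr", "color", "colour", "colouur")
optional_u <- grepl("^colou?r$", spellings)

# Illustrative strings with zero, one, two and four leading "a"s
x <- c("b", "ab", "aab", "aaaab")

# "*" means zero or more times
zero_or_more <- grepl("^a*b$", x)

# "+" means one or more times
one_or_more <- grepl("^a+b$", x)

# "{2,3}" means between two and three times
two_to_three <- grepl("^a{2,3}b$", x)

print(data.frame(spellings, optional_u))
print(data.frame(x, zero_or_more, one_or_more, two_to_three))
```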

11. Repetition

This allows patterns like: match an "a" one or more times. Don't worry if it seems like you just covered a lot of new patterns.
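As a recap of how character classes combine with repetition, here is a small base R sketch. It works on raw patterns with regmatches(), regexpr() and grepl() rather than rebus, and the example strings are assumptions chosen for illustration.

```r
# Illustrative data (assumed for this sketch)
x <- c("Aardvark", "banana", "xyz")

# "[Aa]+" matches a run of one or more upper or lower case "a"s.
# regexpr() finds the first match in each string, and regmatches()
# drops strings with no match, so "xyz" does not appear in the result.
first_a_run <- regmatches(x, regexpr("[Aa]+", x))

# "[^Aa]" is a negated class: any single character that is not "A" or "a"
starts_without_a <- grepl("^[^Aa]", x)

# Inside a class a dot is literal, and a hyphen is literal when placed first
separators <- c("a-b", "a.b", "axb")
has_dot_or_hyphen <- grepl("[-.]", separators)

print(first_a_run)
print(starts_without_a)
print(has_dot_or_hyphen)
```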

12. Let's practice!

You'll get to practice each one in the following exercises.