
Distinct As You Like It

1. Distinct As You Like It: Filtering with Regular Expressions

We've seen how to construct filters that match a field's value exactly. For string-valued fields, we may instead want to match a field's value against a pattern. We may want to match a substring. We may want to constrain that substring to appear at the start or end of a field's value. Or, we may want something more complex. Regular expressions are a powerful way to express such filters. Let's see how MongoDB supports them.

2. Finding a substring with $regex

Let's look at the laureate document for Marie Curie. Recall that she discovered a new element and named it polonium. She did this to publicize her native land's lack of independence. We see here that "Poland" is a substring of her document's "bornCountry". How can we filter for values of "bornCountry" that contain "Poland" as a substring? We can use MongoDB's regular expression operator, $regex. Here I use the $regex operator on the string "Poland" in a filter document. This expression gets distinct values of "bornCountry" that contain "Poland" as a substring. The results show that some laureates were born in places that at the time were not part of Poland but today are. Others were born in places that at the time were part of Poland but today are not. And finally, some were born in places that both at the time were and today are part of Poland.
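A minimal sketch of the kind of filter described above. It assumes a pymongo Database object named db with a laureates collection; the helper name distinct_born_countries and the sample values are hypothetical, chosen only to mirror the three cases just mentioned. The last lines apply the same pattern locally with Python's re module, so the idea can be tried without a server.

```python
import re

# Filter document: "bornCountry" must contain "Poland" somewhere in its value.
poland_filter = {"bornCountry": {"$regex": "Poland"}}


def distinct_born_countries(db, filter_doc):
    """Return distinct "bornCountry" values for laureates matching filter_doc.

    Assumption: `db` is a pymongo Database that has a `laureates` collection.
    """
    return db.laureates.distinct("bornCountry", filter_doc)


# Illustrative values only (not query results): one per case in the text,
# plus a value that should not match.
sample_values = [
    "Poland",
    "Germany (now Poland)",
    "Poland (now Ukraine)",
    "France",
]

# Local stand-in for the server-side match: re.search finds the pattern
# anywhere in the string, which is what an unanchored $regex does.
pattern = poland_filter["bornCountry"]["$regex"]
matches = [value for value in sample_values if re.search(pattern, value)]
```

With a live connection, distinct_born_countries(db, poland_filter) would be the call to make; the local matches list keeps every sample value except "France".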

3. Flag options for regular expressions

We can use the $regex operator together with the $options operator. This will change the conditions for matching. For example, the "i" option ensures case-insensitive matching. The string passed to $regex in the second statement is "poland", all lower case. The assertion here is true: "Poland" is always capitalized for this field. MongoDB also supports compiled regular expression objects. The pymongo driver includes a bson package with a Regex class, which you can import and use as shown. Finally, it is possible to use native Python regular expression objects. I do not recommend this, though. Python's regular expression syntax and flags differ from MongoDB's, so the bson Regex class is the more robust choice for MongoDB.
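The three variants described above might look like the following sketch. As before, db is assumed to be a pymongo Database with a laureates collection, and the function names are hypothetical. The bson import sits inside its function so the rest of the block runs even where pymongo is not installed.

```python
import re

# Case-sensitive and case-insensitive versions of the same substring filter.
case_sensitive = {"bornCountry": {"$regex": "Poland"}}
case_insensitive = {"bornCountry": {"$regex": "poland", "$options": "i"}}


def same_distinct_values(db):
    """True if both filters select the same set of "bornCountry" values.

    Assumption: `db` is a pymongo Database with a `laureates` collection.
    """
    sensitive = set(db.laureates.distinct("bornCountry", case_sensitive))
    insensitive = set(db.laureates.distinct("bornCountry", case_insensitive))
    return sensitive == insensitive


def bson_regex_filter():
    """Equivalent filter using the Regex class from pymongo's bson package."""
    from bson.regex import Regex  # requires pymongo to be installed

    return {"bornCountry": Regex("poland", "i")}


def native_regex_filter():
    """Equivalent filter using a native Python pattern (possible, less robust)."""
    return {"bornCountry": re.compile("poland", re.IGNORECASE)}


# Local illustration with made-up values: ignoring case changes nothing
# when every occurrence is already capitalized.
sample_values = ["Poland", "Germany (now Poland)", "France"]
local_sensitive = {v for v in sample_values if re.search("Poland", v)}
local_insensitive = {
    v for v in sample_values if re.search("poland", v, re.IGNORECASE)
}
```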

4. Beginning and ending (and escaping)

The syntax of regular expressions is rich. For the exercises, though, you only need to know a few tricks. First, you need to know how to match the beginning or end of a field's value. Second, you need to know how to escape a special character so that you match the character itself. To match the beginning of a field's value, use the caret character. Place it at the beginning of the string you pass to $regex. This expression returns distinct values of the "bornCountry" field that start with "Poland". To escape a character, use a backslash. Parentheses delimit capture groups in regular expressions. Because we want to match a literal open paren and not create a group, we escape it with a backslash. This expression returns "bornCountry" values for places that were part of Poland at the time of a laureate's birth but today are not. Finally, to match the end of a field's value, use the dollar sign. Place it at the end of the string you pass to $regex. This expression returns "bornCountry" values for places that became part of Poland after a laureate's birth. What you see here is all you need for the exercises. Use a caret to match the beginning of a field, a dollar sign to match the end, and a backslash to escape parentheses.
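The anchoring and escaping tricks above can be sketched as three filter documents. The exact patterns and the sample values are assumptions made for illustration: they presume "bornCountry" values of the form "X (now Y)". Raw strings keep Python from interpreting the backslashes itself. Each filter could be passed to db.laureates.distinct("bornCountry", ...) on a pymongo Database; here the patterns are checked locally with the re module.

```python
import re

# Caret: the value must start with "Poland".
starts_with_poland = {"bornCountry": {"$regex": r"^Poland"}}

# Escaped open paren: the value must start with the literal text "Poland (now".
formerly_poland = {"bornCountry": {"$regex": r"^Poland \(now"}}

# Dollar sign and escaped close paren: the value must end with "now Poland)".
now_poland = {"bornCountry": {"$regex": r"now Poland\)$"}}

# Illustrative values only (not query results).
sample_values = [
    "Poland",
    "Poland (now Ukraine)",
    "Germany (now Poland)",
    "France",
]


def local_matches(filter_doc):
    """Apply a filter's $regex pattern to the sample values with re.search."""
    pattern = filter_doc["bornCountry"]["$regex"]
    return [value for value in sample_values if re.search(pattern, value)]


starts = local_matches(starts_with_poland)
formerly = local_matches(formerly_poland)
became = local_matches(now_poland)
```

On these sample values, starts holds "Poland" and "Poland (now Ukraine)", formerly holds only "Poland (now Ukraine)", and became holds only "Germany (now Poland)".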

5. Let's practice!

We have new tools to answer questions about string-valued fields in MongoDB collections. Let's practice!