Filtering text
1. Filtering text
We've filtered exact text matches. However, real text data is messy and often requires flexibility.2. Pattern matching
If we knew the exact book we wanted, like Matilda, we could extract the exact match.3. Pattern matching
However, if we were interested in books about investments, we could search for any book with "invest" in the title. Pattern matching with wildcards makes this possible.4. Wildcards
We should already be familiar with one wildcard, the asterisk, which selects all fields from a table. Now, we'll add two new pattern-matching wildcards, the percent and the underscore. There's a lot to cover as there are a few variations we might encounter when prompting for text filters. Let's begin.5. Matching text with %
When we prompt, "Show film titles containing 'love'", the AI generates a filter with two percent signs. This wildcard matches any character. Its position before and after the search term means that anything can come before or after the word "love." It is a case-sensitive search that would only find the exact lowercase spelling.6. Case sensitivity
To match any case, we'd need to specify that. A prompt like, "Show film titles containing 'love' in any case", may result in an ILIKE, with the I indicating case-insensitivity.7. Case sensitivity
Or we could see UPPER() or LOWER() functions that turn all field values into a uniform format that we then match.8. Matching text with %
Wildcard position changes everything. The prompt "Show film titles ending with 'and', any case" will only show the wildcard before the search term. Similarly, having it at the end means we want something that starts with that word.9. Matching text with _
Let's try our second new wildcard. The underscore matches exactly one character. Prompting, "Show five-letter film titles starting with E", our filter would use the capital letter E and four underscores. Our search finds two films, Evita and Earth. Films like Elf or Enchanted would not appear in our results as they are not five letters long.10. Excluding text values
Finally, there is NOT LIKE, which is similar to the not equal to operator. "Show film titles that don't start with 'The'" will filter out those films.11. Pattern matching
The pattern matching we've explored today is essential for finding similarities in text fields, not just with films or books, but in many business cases. This technique can help marketing analysts identify email addresses containing Gmail or Outlook, for example.12. Pattern matching
It can also help product teams narrow down results if they know all product IDs ending with X are discontinued items.13. Pattern matching
Remember these text search essentials for flexible prompting and querying. Text search is case sensitive, so precise prompts are essential. The * selects all fields, while _ and % match characters or positioning. LIKE matches patterns, while NOT LIKE excludes patterns. ILIKE ignores case differences, and the UPPER() and LOWER() functions standardize capitalization for reliable comparisons.14. Let's practice!
In the next video, we'll filter missing data, a critical real-world data challenge. First, let's practice some pattern matching!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.