
Pattern validation

1. Pattern validation

Why do we need pattern validation?

2. Why we need pattern validation

Data needs to follow specific formats to be consistent. We've extracted a list of `dates` from our chocolate sales dataset; the dates should follow a standard format (d-MMM-yy, like 4-Jan-22) for accurate processing and reporting. Without pattern validation, inconsistent date formats like "24/02/22" can slip through and cause data quality issues.

3. Three types of pattern validation

We'll cover three pattern validation approaches. Full pattern matching checks that an entire string follows an exact format. Pattern finding searches for a pattern anywhere in a string, without requiring the whole string to match. Character matching helps with quick data screening. If a validation fails, we can apply the string or date normalization discussed earlier. While we focus on dates here, these pattern validation techniques apply to strings in general. Let's explore how to validate these patterns.

4. Defining patterns with regex

We'll use a regular expression to match date patterns in our dataset, like 4-Jan-22. `\\d{1,2}` matches one or two digits for the day, followed by a hyphen; the backslash is doubled because the pattern is written as a Java string literal. `[A-Za-z]{3}` matches exactly three letters, uppercase or lowercase, for the month abbreviation, like "Jan". After another hyphen, `\\d{2}` matches exactly two digits for the year, like "22". This pattern ensures dates follow our d-MMM-yy format, rejecting formats like "24/02/22" or "4-January-22".
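To make the breakdown concrete, here is a minimal sketch that assembles the regex from its three parts and tries it on the examples mentioned above. The class name `DateRegexParts`, the constants, and the helper `looksLikeDate` are hypothetical names chosen for illustration, not part of the course code.

```java
import java.util.regex.Pattern;

public class DateRegexParts {
    // Each piece mirrors one part of the d-MMM-yy format.
    // In Java source, "\\d" is how the regex token \d is written.
    static final String DAY = "\\d{1,2}";      // one or two digits
    static final String MONTH = "[A-Za-z]{3}"; // exactly three letters, either case
    static final String YEAR = "\\d{2}";       // exactly two digits

    static final String DATE_REGEX = DAY + "-" + MONTH + "-" + YEAR;

    // Hypothetical helper: true only if the whole text matches the regex.
    static boolean looksLikeDate(String text) {
        return Pattern.matches(DATE_REGEX, text);
    }

    public static void main(String[] args) {
        System.out.println(DATE_REGEX);                    // \d{1,2}-[A-Za-z]{3}-\d{2}
        System.out.println(looksLikeDate("4-Jan-22"));     // true
        System.out.println(looksLikeDate("24/02/22"));     // false
        System.out.println(looksLikeDate("4-January-22")); // false
    }
}
```

Note that the regex checks only the shape of the string: a value such as "99-Xyz-22" has the right shape and would also match.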

5. Full pattern matching

`Pattern.compile()` turns our regex into a `Pattern` object, `datePattern`, that matches our expected date format, like 4-Jan-22. In the loop over `dates`, `datePattern.matcher(date).matches()` checks whether the entire `date` string matches `datePattern`.
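The loop described here might look like the following sketch. The slide's actual code is not part of this transcript, so the sample values in `dates`, the class name `FullPatternMatch`, and the helper `findInvalidDates` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FullPatternMatch {
    // Hypothetical helper: collects every entry that fails the full match.
    static List<String> findInvalidDates(List<String> dates) {
        Pattern datePattern = Pattern.compile("\\d{1,2}-[A-Za-z]{3}-\\d{2}");
        List<String> invalid = new ArrayList<>();
        for (String date : dates) {
            // matches() succeeds only if the whole string fits the pattern
            if (!datePattern.matcher(date).matches()) {
                invalid.add(date);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        // Assumed sample data; the real list comes from the sales dataset
        List<String> dates = Arrays.asList("4-Jan-22", "24/02/22", "15-Mar-22");
        System.out.println(findInvalidDates(dates)); // [24/02/22]
    }
}
```

Compiling the pattern once and reusing it for every entry avoids recompiling the regex on each loop iteration.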

6. Full pattern matching: outputs

The output shows that one date, "24/02/22", does not match the pattern. Once we correct that date's format, we can apply further processing to the dates, such as `LocalDate.parse()`.
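As a sketch of that follow-up step, a validated string can be parsed with a formatter that mirrors the d-MMM-yy format. The formatter pattern, the explicit `Locale.ENGLISH` (so that "Jan" is recognized regardless of the system locale), and the class name `ParseValidatedDate` are assumptions here, not taken from the course code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class ParseValidatedDate {
    // Assumed formatter: day, abbreviated English month name, two-digit year
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("d-MMM-yy", Locale.ENGLISH);

    // Hypothetical helper: parses a string that already passed pattern validation.
    static LocalDate parse(String validatedDate) {
        return LocalDate.parse(validatedDate, FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parse("4-Jan-22")); // 2022-01-04
    }
}
```

Parsing is stricter than the regex: a string such as "99-Xyz-22" has the right shape but would fail here with a `DateTimeParseException`.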

7. Pattern finding

The `Matcher` class provides more detailed pattern matching capabilities. We define a `monthPattern` to search for a run of three letters. In the loop over `dates`, we call `monthPattern.matcher(date)` to create a `matcher` that searches for `monthPattern` in `date`. `matcher.find()` returns true if the pattern is found anywhere in the text, and `matcher.group()` then returns the matched content.
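A minimal sketch of this search is shown below. The class name `MonthFinder` and the helper `findMonth` are hypothetical, and returning `null` for "not found" is a simplification chosen for brevity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MonthFinder {
    static final Pattern MONTH_PATTERN = Pattern.compile("[A-Za-z]{3}");

    // Hypothetical helper: returns the first run of three letters, or null if none.
    static String findMonth(String date) {
        Matcher matcher = MONTH_PATTERN.matcher(date);
        // group() is only valid after find() has returned true
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findMonth("4-Jan-22"));     // Jan
        System.out.println(findMonth("24/02/22"));     // null
        System.out.println(findMonth("4-January-22")); // Jan
    }
}
```

Unlike `matches()`, `find()` succeeds on any substring, so "4-January-22" still yields "Jan". That is why pattern finding suits extraction, while full matching suits strict format checks.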

8. Pattern finding: outputs

The outputs show that `matcher` finds the three-letter month pattern in all dates except "24/02/22".

9. Character matching

For simpler character-level validation, Guava's `CharMatcher` provides an alternative to regex. In the loop over `dates`, `CharMatcher.inRange('0', '9')` matches the digits 0 to 9. The `.matchesAnyOf()` method checks whether at least one matching character is present in a string. `CharMatcher.inRange('a', 'z').or(CharMatcher.inRange('A', 'Z'))` matches letters (lowercase or uppercase), and `CharMatcher.is('-')` matches hyphens.
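The screening could be sketched as follows, assuming the Guava library is on the classpath. The rule that a date must contain at least one digit, one letter, and one hyphen is inferred from the narration, and the class name `CharScreen` and helper `passesScreen` are hypothetical.

```java
import com.google.common.base.CharMatcher;

public class CharScreen {
    static final CharMatcher DIGITS = CharMatcher.inRange('0', '9');
    static final CharMatcher LETTERS =
            CharMatcher.inRange('a', 'z').or(CharMatcher.inRange('A', 'Z'));
    static final CharMatcher HYPHEN = CharMatcher.is('-');

    // Hypothetical helper: assumes a plausible date has at least one
    // digit, one letter, and one hyphen somewhere in the string.
    static boolean passesScreen(String date) {
        return DIGITS.matchesAnyOf(date)
                && LETTERS.matchesAnyOf(date)
                && HYPHEN.matchesAnyOf(date);
    }

    public static void main(String[] args) {
        System.out.println(passesScreen("4-Jan-22")); // true
        System.out.println(passesScreen("24/02/22")); // false
    }
}
```

This is a coarse screen: it checks which kinds of characters are present, not their order, so a string like "a-1" would also pass. It works best as a quick filter before stricter pattern matching.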

10. Character matching: outputs

The output shows that the only invalid date is "24/02/22", because it contains no letters and no hyphens.

11. Putting it all together

Now let's bring together the pattern validation approaches we've learned. `matcher().matches()` checks if the entire string follows our pattern. `matcher().find()` searches for pattern matches within the string. `CharMatcher` provides simple character-level validation. Each approach serves different validation needs, from strict format checking to flexible character searches.

12. Let's practice!

Now it's your turn to use patterns for validation!
