Pattern validation
1. Pattern validation
Why do we need pattern validation?
2. Why we need pattern validation
Data needs to follow specific formats to be consistent. We've extracted a list of `dates` from our chocolate sales dataset; the dates should follow a standard format (d-MMM-yy) for accurate processing and reporting. Without pattern validation, inconsistent date formats like "24/02/22" can cause data quality issues.
3. Three types of pattern validation
We'll cover three pattern validation approaches. Full pattern matching checks that the whole string follows an exact format. Pattern finding looks for a pattern anywhere in the string, without requiring an exact position. Character matching helps with quick data screening. If a validation fails, we can apply the string or date normalization discussed earlier. While we focus on dates here, these pattern validation techniques apply to strings in general. Let's explore how to validate these patterns.
4. Defining patterns with regex
We'll use a regular expression to match date patterns in our dataset, like 4-Jan-22. The backslashes are doubled because the regex is written as a Java string literal. `\\d{1,2}` matches one or two digits for the day, followed by a hyphen. `[A-Za-z]{3}` matches exactly three letters in either case for the month abbreviation, like "Jan". After another hyphen, `\\d{2}` matches exactly two digits for the year (like "22"). This pattern checks that dates have the shape of our d-MMM-yy format, rejecting formats like "24/02/22" or "4-January-22".
5. Full pattern matching
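As a rough sketch, the regex described above might be written in Java as follows. The class and constant names are hypothetical. The example also shows that the pattern checks shape only, so a string with a made-up month still passes.

```java
import java.util.regex.Pattern;

// Hypothetical class name, used only for illustration.
class DatePatternSketch {
    // \\d{1,2}     one or two digits for the day
    // -            a literal hyphen
    // [A-Za-z]{3}  exactly three letters for the month abbreviation
    // -            a literal hyphen
    // \\d{2}       exactly two digits for the year
    static final Pattern DATE_PATTERN =
            Pattern.compile("\\d{1,2}-[A-Za-z]{3}-\\d{2}");

    public static void main(String[] args) {
        // Backslashes are doubled in source; the compiled regex has single ones.
        System.out.println(DATE_PATTERN.pattern());

        // The regex validates shape, not calendar validity (illustrative input).
        System.out.println(DATE_PATTERN.matcher("99-Xyz-22").matches()); // true
    }
}
```

Because the regex cannot tell a real month from any other three letters, a parsing step would still be needed to confirm that a matching string is an actual date.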
`Pattern.compile()` compiles our regex into `datePattern`, which matches our expected date format, like 4-Jan-22. In the loop over `dates`, `datePattern.matcher(date).matches()` checks whether the entire `date` matches `datePattern`.
6. Full pattern matching: outputs
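The slide code is not reproduced in this transcript. A minimal sketch of the full-match loop just described could look like the following; the sample values in `dates`, the class name, and the `findInvalid` helper are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical class and helper names; only datePattern and dates follow the lesson.
class FullPatternMatch {
    // Returns every entry that does not match the d-MMM-yy shape in full.
    static List<String> findInvalid(List<String> dates) {
        Pattern datePattern = Pattern.compile("\\d{1,2}-[A-Za-z]{3}-\\d{2}");
        List<String> invalid = new ArrayList<>();
        for (String date : dates) {
            // matches() succeeds only if the whole string fits the pattern
            if (!datePattern.matcher(date).matches()) {
                invalid.add(date);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        // Illustrative sample values, not the actual dataset
        List<String> dates = Arrays.asList("4-Jan-22", "24/02/22", "15-Mar-22");
        for (String date : findInvalid(dates)) {
            System.out.println("Invalid date format: " + date);
        }
    }
}
```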
The output shows that one date, "24/02/22", does not match the pattern. Once we correct that date's format, we can apply further processing to the dates, such as parsing them with `LocalDate.parse()`.
7. Pattern finding
The `Matcher` class provides more detailed pattern matching capabilities. We define a `monthPattern` to search for a run of three letters. In the loop over `dates`, we call `monthPattern.matcher(date)` to create a `matcher` that searches for `monthPattern` in `date`. `matcher.find()` returns true if the pattern is found anywhere in the text, and `matcher.group()` returns the matched content.
8. Pattern finding: outputs
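A minimal sketch of the pattern-finding step described above. The transcript does not give the regex behind `monthPattern`, so `[A-Za-z]{3}` is an assumption here, and the class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and helper names, used only for illustration.
class MonthFinder {
    // Assumed regex: any run of three ASCII letters
    static final Pattern MONTH_PATTERN = Pattern.compile("[A-Za-z]{3}");

    // Returns the first three-letter run in the text, or null if there is none.
    static String findMonth(String date) {
        Matcher matcher = MONTH_PATTERN.matcher(date);
        // find() scans for the pattern anywhere in the string;
        // group() returns the text of the most recent match
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Illustrative sample values, not the actual dataset
        String[] dates = {"4-Jan-22", "24/02/22", "15-Mar-22"};
        for (String date : dates) {
            String month = findMonth(date);
            System.out.println(date + " -> " + (month != null ? month : "no month found"));
        }
    }
}
```

Unlike `matches()`, `find()` also succeeds on "4-January-22", where it returns "Jan", so it is better suited to locating a pattern than to enforcing a format.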
The outputs show that `matcher` finds the three-letter month pattern in all dates except for "24/02/22".
9. Character matching
For simpler character-level validation, `CharMatcher` provides an alternative to regex. In the loop over `dates`, `CharMatcher.inRange('0', '9')` matches digits 0 to 9. The `.matchesAnyOf()` method checks whether the string contains at least one matching character. `CharMatcher.inRange('a', 'z').or(CharMatcher.inRange('A', 'Z'))` matches letters (lowercase or uppercase), and `CharMatcher.is('-')` matches hyphens.
10. Character matching: outputs
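`CharMatcher` comes from the Guava library, which is not part of the JDK. So that the example runs without extra dependencies, the sketch below swaps in plain Java for the same three checks and notes the corresponding `CharMatcher` call in a comment beside each one. The class and method names are hypothetical.

```java
// Hypothetical class and method names. Plain-Java stand-in for the
// Guava CharMatcher checks described in the lesson.
class CharCheckSketch {
    // Guava: CharMatcher.inRange('0', '9').matchesAnyOf(s)
    static boolean hasDigit(String s) {
        return s.chars().anyMatch(c -> c >= '0' && c <= '9');
    }

    // Guava: CharMatcher.inRange('a', 'z')
    //            .or(CharMatcher.inRange('A', 'Z')).matchesAnyOf(s)
    static boolean hasLetter(String s) {
        return s.chars().anyMatch(c -> (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'));
    }

    // Guava: CharMatcher.is('-').matchesAnyOf(s)
    static boolean hasHyphen(String s) {
        return s.indexOf('-') >= 0;
    }

    // A quick screen only: true when all three character types are present.
    static boolean looksLikeDate(String s) {
        return hasDigit(s) && hasLetter(s) && hasHyphen(s);
    }

    public static void main(String[] args) {
        // Illustrative sample values, not the actual dataset
        String[] dates = {"4-Jan-22", "24/02/22", "15-Mar-22"};
        for (String date : dates) {
            System.out.println(date + " -> " + (looksLikeDate(date) ? "valid" : "invalid"));
        }
    }
}
```

This screen says nothing about order or position, so a string like "a-1" would also pass; it is a fast first filter, not a format check.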
The output shows that the only invalid date is "24/02/22" because it is missing letters and hyphens.
11. Putting it all together
Now let's bring together the pattern validation approaches we've learned. `matcher().matches()` checks if the entire string follows our pattern. `matcher().find()` searches for pattern matches within the string. `CharMatcher` provides simple character-level validation. Each approach serves different validation needs, from strict format checking to flexible character searches.
12. Let's practice!
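To make the difference between `matches()` and `find()` concrete, here is a small sketch that runs both against the same text. The class name, helper names, and sample string are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical class and helper names, used only for illustration.
class MatchesVsFind {
    static final Pattern DATE_PATTERN =
            Pattern.compile("\\d{1,2}-[A-Za-z]{3}-\\d{2}");

    // Strict check: the whole string must be a d-MMM-yy date.
    static boolean fullMatch(String text) {
        return DATE_PATTERN.matcher(text).matches();
    }

    // Flexible check: a d-MMM-yy date may appear anywhere in the string.
    static boolean partialMatch(String text) {
        return DATE_PATTERN.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Shipped 4-Jan-22"; // illustrative input
        System.out.println(fullMatch(text));    // false: extra text around the date
        System.out.println(partialMatch(text)); // true: the date is found inside
    }
}
```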
Now it's your turn to use patterns for validation!