Name: Similarities Between Strings - Fuzzy joins
Uploaded: 2024-11-18T00:00:00Z
Description: In the last chapter, we will shift gears away from regular expressions to understanding string distances. By calculating the differences of multiple strings, we can match those that are similar. This will help us to find duplicates even when they contain small errors like typos. This is an important part to record linkage where we combine datasets from multiple sources.

1
Regular Expressions: Writing Custom Patterns
Free
Regular expressions can be pretty intimidating at first as they contain vast amounts of special characters. In this chapter, you'll learn to decipher these and write your own patterns to find exactly what you're looking for.
2
Creating Strings with Data
In this chapter, we will slightly move away from regular expressions and focus on string manipulation by creating strings from other data structures like vectors or lists.
3
Extracting Structured Data From Text
One task where regular expressions really shine is making sense from a blob of text. In this chapter, you'll learn to extract the information from messy data that doesn't come in neatly arranged tables but in plain text.
4
Similarities Between Strings
In the last chapter, we will shift gears away from regular expressions to understanding string distances. By calculating the differences of multiple strings, we can match those that are similar. This will help us to find duplicates even when they contain small errors like typos. This is an important part to record linkage where we combine datasets from multiple sources.

Initializing