Finding matches based on two conditions
In this exercise, you'll match two datasets whose movie titles correspond to each other but also contain typos. The first table, movie_titles, holds ten movies that you should match against the second table, movie_db. However, the titles come from scanned documents and contain errors introduced by the Optical Character Recognition (OCR) software.
Both tables contain the columns title and year. Use these to find matches between them.
Create two helper functions that decide whether two entries are similar or equal: one for the movie titles, based on stringdist(), and one for the years, using abs() to get the absolute difference (the delta) between them.
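As a rough sketch of what such a pair of helpers could look like: the function names, the max_dist and max_delta parameters, and the year threshold of 3 are assumptions made for illustration, not part of the exercise. The example also assumes the stringdist package is installed.

```r
# Assumes the stringdist package is available: install.packages("stringdist")
library(stringdist)

# Hypothetical helper: TRUE when two titles are fewer than `max_dist` edits apart.
# stringdist() is vectorised, so this also works on whole columns.
titles_are_close <- function(left, right, max_dist = 3) {
  stringdist(left, right) < max_dist
}

# Hypothetical helper: TRUE when two years are fewer than `max_delta` years apart.
# The default threshold is an assumption chosen for this sketch.
years_are_close <- function(left, right, max_delta = 3) {
  abs(left - right) < max_delta
}

# Two letters swapped by an OCR-style error still count as a match
titles_are_close("The Godfather", "The Godfahter")

# Years one apart count as a match under the assumed threshold
years_are_close(1972, 1973)
```

Both helpers return a logical value for each pair of inputs, which is the shape a fuzzy matching step typically expects from a comparison function.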
This exercise is part of the course Intermediate Regular Expressions in R.
Have a go at this exercise by completing this sample code.
# Calculate the string distance - it should be smaller than 3
is_string_distance_below_three <- function(left, right) {
  ___(left, right) < ___
}
is_string_distance_below_three("Hi there", "Hi there")