Finding matches based on two conditions
In this exercise, you'll match two datasets that contain corresponding movie titles, some of which have typos. The first table, movie_titles, holds ten movies that you should match against the second table, movie_db. However, the entries are based on scanned documents and contain errors introduced by the Optical Character Recognition (OCR) software.
Both tables contain the columns title and year. Use these to find matches between them.
Create two helper functions that check whether two entries are similar or equal: one for the movie titles, based on stringdist(), and one for the years, using abs() on their difference (which returns the delta between them).
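To illustrate how two such conditions combine, here is a minimal sketch. It assumes the stringdist package is installed. The helper names is_title_similar and is_year_close, the thresholds, and the two small data frames are hypothetical and made up for this illustration; they are not the exercise data or its expected solution. The cross join via merge() is just one simple way to compare every row with every other row.

```r
library(stringdist)  # provides stringdist()

# Hypothetical helper: TRUE when two titles differ by fewer than `max_dist` edits
is_title_similar <- function(left, right, max_dist = 3) {
  stringdist(left, right) < max_dist
}

# Hypothetical helper: TRUE when two years differ by fewer than `max_delta`
is_year_close <- function(left, right, max_delta = 3) {
  abs(left - right) < max_delta
}

# Made-up rows for illustration only (not the exercise tables)
scanned <- data.frame(
  title = c("Jurasic Park", "The Matrlx"),
  year = c(1993, 1998),
  stringsAsFactors = FALSE
)
reference <- data.frame(
  title = c("Jurassic Park", "The Matrix", "Titanic"),
  year = c(1993, 1999, 1997),
  stringsAsFactors = FALSE
)

# Cartesian product: every scanned row paired with every reference row
pairs <- merge(scanned, reference, by = NULL, suffixes = c("_scanned", "_ref"))

# Keep only the pairs where BOTH conditions hold
matches <- pairs[
  is_title_similar(pairs$title_scanned, pairs$title_ref) &
    is_year_close(pairs$year_scanned, pairs$year_ref),
]
print(matches)
```

Requiring both conditions at once keeps a near-identical title from matching when the years are far apart, and vice versa.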
This exercise is part of the course
Intermediate Regular Expressions in R
Interactive exercise
Try this exercise by completing the sample code.
# Calculate the string distance - it should be smaller than 3
is_string_distance_below_three <- function(left, right) {
  ___(left, right) < ___
}
is_string_distance_below_three("Hi there", "Hi there")