Exercise

Finding matches based on two conditions

In this exercise, you'll match two datasets that list the same movies but whose titles contain typos. The first table, movie_titles, holds ten movies that you should match against the second table, movie_db. The entries to match come from scanned documents, so they contain errors introduced by the Optical Character Recognition (OCR) software.

Both tables contain the columns title and year. Use these to find matches between them.

Create two helper functions that decide whether two entries are similar or equal: one for the movie titles, based on stringdist(), and one for the years, based on abs(), which returns the absolute difference between them.
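As a minimal sketch of what these two helpers could look like, assuming the stringdist package is installed and that the default distance method of stringdist() is acceptable here (the exercise does not name a method), the functions below compare their two arguments element by element. The example strings and years at the end are made up for illustration.

```r
library(stringdist)

# TRUE where the string distance between left and right is below 3.
# stringdist() is vectorised, so this also works on whole columns.
is_string_distance_below_three <- function(left, right) {
  stringdist(left, right) < 3
}

# TRUE where the two years are less than three apart.
is_closer_than_three_years <- function(left, right) {
  abs(left - right) < 3
}

# Illustrative calls with invented values (not taken from the exercise data)
is_string_distance_below_three("The Godfathcr", "The Godfather")  # one substituted letter
is_closer_than_three_years(1972, 1974)                            # two years apart
```

Both helpers return logical vectors, which is the shape a fuzzy join expects from a matching function.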

Instructions 1/3

  1. Make the function is_string_distance_below_three() return TRUE if the string distance between left and right is below 3.
  2. Make is_closer_than_three_years() return TRUE if the absolute difference between left and right is smaller than 3.
  3. Use the helper functions to join the two data frames on the two columns "title" and "year".
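The three steps above could be combined roughly as follows. This is a sketch under assumptions, not the exercise's reference solution: it assumes the join is done with fuzzy_left_join() from the fuzzyjoin package (the instructions do not name the join function), and the two small data frames are hypothetical stand-ins for movie_titles and movie_db.

```r
library(fuzzyjoin)
library(stringdist)

is_string_distance_below_three <- function(left, right) {
  stringdist(left, right) < 3
}

is_closer_than_three_years <- function(left, right) {
  abs(left - right) < 3
}

# Hypothetical sample rows standing in for the exercise tables
movie_titles <- data.frame(
  title = c("The Godfathcr", "Alicn"),
  year = c(1973, 1979),
  stringsAsFactors = FALSE
)
movie_db <- data.frame(
  title = c("The Godfather", "Alien", "Casablanca"),
  year = c(1972, 1979, 1942),
  stringsAsFactors = FALSE
)

# A row pair is kept only if BOTH conditions hold:
# the titles are close and the years are close.
matches <- fuzzy_left_join(
  movie_titles, movie_db,
  by = c("title", "year"),
  match_fun = list(
    title = is_string_distance_below_three,
    year = is_closer_than_three_years
  )
)

print(matches)
```

Because both tables share the column names, the result carries suffixed columns (title.x, year.x from the left table and title.y, year.y from the right), which makes it easy to compare the OCR version of each title with its match.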