Exercise

Pair blocking

Zagat and Fodor's are both companies that gather restaurant reviews. The zagat and fodors datasets each contain information about various restaurants, including addresses, phone numbers, and cuisine types. Some restaurants appear in both datasets, but their names or phone numbers are not necessarily written in exactly the same way. In this chapter, you'll work towards figuring out which restaurants appear in both datasets.

The first step towards this goal is to generate pairs of records so that you can compare them. In this exercise, you'll first generate all possible pairs, and then use your newly cleaned city column as a blocking variable, so that only records with the same city are paired.
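To make the difference between the two pairing strategies concrete, here is a minimal, language-agnostic sketch written in plain Python. It does not use reclin and is not how that package implements pairing. The toy records, the `id`, `name`, and `city` fields, and the helper names `all_pairs` and `blocked_pairs` are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy records; the real zagat and fodors datasets have more columns.
zagat = [
    {"id": 1, "name": "Cafe Bizou", "city": "los angeles"},
    {"id": 2, "name": "Carnegie Deli", "city": "new york"},
    {"id": 3, "name": "Spago", "city": "los angeles"},
]
fodors = [
    {"id": 1, "name": "Cafe Bizou Restaurant", "city": "los angeles"},
    {"id": 2, "name": "Carnegie Delicatessen", "city": "new york"},
]


def all_pairs(x, y):
    """Pair every record in x with every record in y (the full cross product)."""
    return [(a["id"], b["id"]) for a, b in product(x, y)]


def blocked_pairs(x, y, key):
    """Pair only records whose values for the blocking variable `key` are equal."""
    # Group the records of y by their blocking value so each lookup is cheap.
    index = defaultdict(list)
    for b in y:
        index[b[key]].append(b)
    return [(a["id"], b["id"]) for a in x for b in index.get(a[key], [])]


full = all_pairs(zagat, fodors)
blocked = blocked_pairs(zagat, fodors, "city")
print(len(full), len(blocked))  # prints "6 3"
```

With three and two records the full cross product has 3 × 2 = 6 pairs, while blocking on city keeps only the 3 pairs that could plausibly match. The saving grows quickly with dataset size, but blocking only works if the blocking variable is clean: a misspelled city would silently drop a true match.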

The zagat and fodors datasets are available in your workspace.

Instructions 1/2
  • 1
    • Load the reclin package.
    • Generate all possible pairs of records between the zagat and fodors datasets.
  • 2
    • Use pair blocking to generate only pairs that have matching values in the city column.