Pair blocking
Zagat and Fodor's are both companies that gather restaurant reviews. The zagat
and fodors
datasets both contain information about various restaurants, including addresses, phone numbers, and cuisine types. Some restaurants appear in both datasets, but don't necessarily have the same exact name or phone number written down. In this chapter, you'll work towards figuring out which restaurants appear in both datasets.
The first step towards this goal is to generate pairs of records so that you can compare them. In this exercise, you'll first generate all possible pairs, and then use your newly-cleaned city
column as a blocking variable.
zagat
and fodors
are available.
This exercise is part of the course
Cleaning Data in R
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load reclin
___
# Generate all possible pairs
___