Pairs of restaurants
In the last lesson, you cleaned the restaurants dataset to make it ready for building a restaurant recommendation engine. You now have a new DataFrame named restaurants_new, scraped from a different data source, containing more restaurants to train your model on.
You've already cleaned the cuisine_type and city columns using the techniques learned throughout the course. However, you noticed duplicates whose restaurant names contain typos; matching these requires record linkage rather than a simple join with restaurants.
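To see why a plain join falls short, here is a minimal pandas sketch. The toy rows and the name column are hypothetical, chosen only for illustration: an exact inner merge matches identical strings only, so a duplicate whose name contains a single typo is never found.

```python
import pandas as pd

# Hypothetical toy data: the same restaurant appears in both sources,
# but the second source spells its name with a typo (missing apostrophe)
restaurants = pd.DataFrame({
    "name": ["arnie morton's of chicago", "art's delicatessen"],
    "city": ["los angeles", "studio city"],
})
restaurants_new = pd.DataFrame({
    "name": ["arnie mortons of chicago", "campanile"],
    "city": ["los angeles", "los angeles"],
})

# An exact inner join on name only matches identical strings,
# so the typo'd duplicate is silently missed
exact_matches = restaurants.merge(restaurants_new, on="name", how="inner")
print(len(exact_matches))  # 0
```

Record linkage addresses this by comparing candidate pairs with similarity measures instead of requiring exact equality.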
In this exercise, you will perform the first step in record linkage: generating possible pairs of rows between restaurants and restaurants_new. Both DataFrames, as well as pandas and recordlinkage, are already loaded in your environment.
This exercise is part of the course Cleaning Data in Python.
Sample code for this exercise:
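# pandas and recordlinkage are already loaded in the exercise environment
import recordlinkage

# Create an indexer object to find possible pairs
indexer = recordlinkage.Index()
# Block pairing on cuisine_type: only rows sharing the same cuisine_type are paired
indexer.block('cuisine_type')
# Generate candidate pairs between restaurants and restaurants_new
pairs = indexer.index(restaurants, restaurants_new)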
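Blocking on cuisine_type restricts candidate pairs to rows that agree on that column, which shrinks the number of comparisons compared with pairing every row against every other row. The following pure-pandas sketch illustrates what that step conceptually produces. It is not recordlinkage's implementation, and the toy rows are hypothetical. The result is stored as a pandas MultiIndex of (left row, right row) labels, the same kind of object recordlinkage uses for candidate pairs.

```python
import pandas as pd

# Hypothetical toy data; only cuisine_type matters for blocking
restaurants = pd.DataFrame({
    "name": ["arnie morton's of chicago", "art's delicatessen", "campanile"],
    "cuisine_type": ["american", "american", "californian"],
})
restaurants_new = pd.DataFrame({
    "name": ["arnie mortons of chicago", "campanil", "fenix"],
    "cuisine_type": ["american", "californian", "french"],
})

# Without blocking, every row of one table is paired with every row of the other
n_full = len(restaurants) * len(restaurants_new)

# Blocking: keep only pairs whose cuisine_type agrees.
# Carry each table's row labels along, then inner-merge on the blocking key.
left = restaurants[["cuisine_type"]].reset_index().rename(columns={"index": "idx_left"})
right = restaurants_new[["cuisine_type"]].reset_index().rename(columns={"index": "idx_right"})
blocked = left.merge(right, on="cuisine_type", how="inner")

# Candidate pairs as a MultiIndex of (row in restaurants, row in restaurants_new)
pairs = pd.MultiIndex.from_arrays([blocked["idx_left"], blocked["idx_right"]])

print(n_full)      # 9
print(len(pairs))  # 3
```

Here blocking keeps 3 of the 9 possible pairs; the "french" row in the second table has no partner at all, so it is never compared.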