Get startedGet started for free

The cutoff point

In this exercise, and throughout this chapter, you'll be working with the restaurants DataFrame which has data on various restaurants. Your ultimate goal is to create a restaurant recommendation engine, but you need to first clean your data.

This version of restaurants has been collected from many sources, where the cuisine_type column is riddled with typos, and should contain only italian, american and asian cuisine types. There are so many unique categories that remapping them manually isn't scalable, and it's best to use string similarity instead.

Before doing so, you want to establish the cutoff point for the similarity score using the thefuzz's process.extract() function by finding the similarity score of the most distant typo of each category.

This exercise is part of the course

Cleaning Data in Python

View Course

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import process from thefuzz
____

# Store the unique values of cuisine_type in unique_types
unique_types = ____

# Calculate similarity of 'asian' to all values of unique_types
print(process.____('____', ____, limit = len(____)))

# Calculate similarity of 'american' to all values of unique_types
print(____('____', ____, ____))

# Calculate similarity of 'italian' to all values of unique_types
print(____)
Edit and Run Code