Fixing typos with string distance
In this chapter, one of the datasets you'll be working with, zagat, is a set of restaurants in New York, Los Angeles, Atlanta, San Francisco, and Las Vegas. The data is from Zagat, a company that collects restaurant reviews, and includes restaurant names, addresses, and phone numbers, as well as other restaurant information.
The city column contains the name of the city where each restaurant is located. However, the column contains a number of typos. Your task is to map each city to one of the five correctly spelled cities contained in the cities data frame.
dplyr and fuzzyjoin are loaded, and zagat and cities are available.
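The mapping described above can be sketched with a string-distance join. This is a minimal illustration under stated assumptions, not the exercise's official solution: zagat_toy and cities_toy are hypothetical stand-ins for the real data frames, the column name city_actual is an assumed name for the correctly spelled city column, and the Damerau-Levenshtein method with a maximum distance of 2 is just one reasonable choice.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical toy data standing in for zagat and cities;
# the real data frames and their column names may differ.
zagat_toy <- tibble(
  name = c("restaurant a", "restaurant b", "restaurant c"),
  city = c("los angles", "new yrok", "atlanta")
)
cities_toy <- tibble(
  city_actual = c("new york", "los angeles", "atlanta",
                  "san francisco", "las vegas")
)

# Join each (possibly misspelled) city to the closest correct spelling.
# method = "dl" uses Damerau-Levenshtein distance, which counts a swap of
# two adjacent characters as a single edit; max_dist caps how many edits
# are tolerated before a pair is no longer considered a match.
matched <- zagat_toy %>%
  stringdist_left_join(
    cities_toy,
    by = c("city" = "city_actual"),
    method = "dl",
    max_dist = 2,
    distance_col = "dist"
  ) %>%
  select(name, city, city_actual, dist)

print(matched)
```

A larger max_dist catches worse typos but raises the risk that one misspelling matches more than one city, so it is worth checking the joined result for duplicated rows or missing matches.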
This exercise is part of the course Cleaning Data in R.
Have a go at this exercise by completing this sample code.
# Count the number of each city variation
zagat %>%
  count(___)