
Finding number of reviews per user

You now have a manageable data set with just one type of cuisine. It's time to begin adapting the Yelp star reviews to see if you can make them more meaningful. In this course, you will look into just two of the almost infinite ways one can scale and manipulate reviews. The first method is to create a new review that gives more weight to those who have reviewed more restaurants of the same cuisine.

To do this, start by creating a new data frame that holds the number of reviews each reviewer has written for the Indian restaurants in the original data set.

The new data frame will be created using the select(), group_by() and summarise() functions of the dplyr package, chained together with the %>% pipe operator. The select() function determines the columns that will be included in the new data frame. Used together, group_by() and summarise() compute a separate summary for each unique value of the grouping variable.
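As a rough illustration of how these pieces fit together, here is a minimal sketch on a made-up data frame. The name toy_reviews and its values are assumptions for demonstration only and are not part of the course's indian data set.

```r
library(dplyr)

# Hypothetical toy data: one row per review (NOT the course's `indian` data)
toy_reviews <- data.frame(
  user_id   = c("u1", "u2", "u1", "u3", "u1", "u2"),
  user_name = c("Ana", "Ben", "Ana", "Cy", "Ana", "Ben"),
  stars     = c(4, 5, 3, 2, 5, 4)
)

toy_counts <- toy_reviews %>%
  select(user_id, user_name) %>%    # keep only the columns of interest
  group_by(user_id) %>%             # one group per reviewer
  summarise(total_reviews = n())    # n() counts the rows in each group

print(toy_counts)
```

In this sketch, toy_counts has one row per user_id with a total_reviews column. Columns that are neither grouping variables nor created inside summarise() (here, user_name) do not appear in the result.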

After making the data frame, explore it! Check the range of the number of reviews per user, as well as the average number of reviews per user.
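One possible way to explore such a data frame uses the base R functions table(), range() and mean(). The sketch below runs on a small hypothetical data frame, toy_counts, whose values are invented for illustration.

```r
# Hypothetical per-user review counts (values invented for illustration)
toy_counts <- data.frame(
  user_id       = c("u1", "u2", "u3", "u4"),
  total_reviews = c(3L, 2L, 1L, 1L)
)

# How many users wrote 1 review, 2 reviews, 3 reviews, ...
review_table <- table(toy_counts$total_reviews)

# Smallest and largest number of reviews by any single user
review_range <- range(toy_counts$total_reviews)

# Average number of reviews per user
review_mean <- mean(toy_counts$total_reviews)

print(review_table)
print(review_range)
print(review_mean)
```

With these made-up values, two users have a single review, the range is 1 to 3, and the mean is 1.75.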

This exercise is part of the course

R, Yelp and the Search for Good Indian Food


Exercise instructions

  • Create a new data frame, number_reviews_indian, by selecting the columns user_id and user_name, grouping by user_id, and using summarise() with n() to create a total_reviews column
  • Print the table of total_reviews
  • Show the average number of reviews per user by taking the mean of total_reviews

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# The package dplyr is available to use
# Generate a new data frame with the number of reviews by each reviewer
number_reviews_indian <- indian %>% 
  select(___, ___) %>%
  group_by(___) %>% 
  summarise(total_reviews = n())

# Print the table of total_reviews
table(number_reviews_indian$___)

# Print the average number of reviews per user
mean(number_reviews_indian$___)