Exercise

Generating average authentic reviews

You've now selected your authentic Indian users and can use their reviews to generate average authentic star reviews!

There are many ways to do this, but the dplyr package, as you have seen, has many great tools to quickly group data, add new columns and calculate new values. You will use the select(), group_by(), summarise() and mutate() functions to add new variables to the larger data set.

The select() function allows you to isolate the variables you wish to use to create the new values. Together, group_by(), the %>% pipe operator and summarise() allow separate calculations to be performed within each unique value of the variable or variables being grouped.
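
As a minimal sketch of how these pieces fit together, the snippet below computes one mean per group on a small made-up data frame. The data frame reviews, its values, and the column name mean_stars are assumptions for illustration only; they are not part of the exercise data.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
reviews <- data.frame(
  city  = c("Phoenix", "Phoenix", "Tempe"),
  stars = c(4, 2, 5),
  note  = c("a", "b", "c")
)

by_city <- reviews %>%
  select(city, stars) %>%               # keep only the columns needed
  group_by(city) %>%                    # one group per unique city
  summarise(mean_stars = mean(stars))   # one row per group
```

Without group_by(), summarise() would collapse the whole data frame into a single row; with it, the same calculation is repeated once per city.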

You should create a new star review column called new_stars and a column of the difference between the original average star reviews and the new star reviews. Assign the column of differences to diff.

Instructions
  • Generate a data frame avg_review_indian using tools from dplyr

    - group_by the variables city, business_name, and avg_stars

    - Use the n() function to tally the number of reviews for that restaurant

    - Create a new_stars column as the sum of the stars column divided by the number of reviews

    - Using the mutate() function, add a diff variable by subtracting the avg_stars column from the new_stars column
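
The steps above could be sketched roughly as follows. This is one possible approach, not the official solution: the input data frame name review_indian, the sample values, and the count column name are assumptions, and new_stars is computed as a per-restaurant mean (sum of stars divided by the number of reviews).

```r
library(dplyr)

# Hypothetical stand-in for the filtered review data; values are invented
review_indian <- data.frame(
  city          = c("Phoenix", "Phoenix", "Phoenix", "Tempe"),
  business_name = c("Curry House", "Curry House", "Curry House", "Tandoor"),
  avg_stars     = c(3.5, 3.5, 3.5, 4.0),
  stars         = c(5, 4, 3, 3)
)

avg_review_indian <- review_indian %>%
  select(city, business_name, avg_stars, stars) %>%
  group_by(city, business_name, avg_stars) %>%
  summarise(
    count     = n(),                  # number of reviews per restaurant
    new_stars = sum(stars) / count,   # average of the selected reviews
    .groups   = "drop"                # return an ungrouped result
  ) %>%
  mutate(diff = new_stars - avg_stars)  # positive: new average is higher
```

Grouping by avg_stars alongside city and business_name keeps the original average in the summarised output, so it is still available when mutate() computes diff.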