Exercise

Finding authentic users

You have successfully cleaned the list of native Indian names and you are ready to select just the reviews from the users that have a name that is part of this list.

The subset function makes this task simple. Filter the indian data set by supplying a logical condition as the subset argument of the subset function; only the rows for which the condition is TRUE are kept. Using the %in% operator, you can build that condition by testing whether each value in a column appears in a set of allowed values. In this case, the condition checks whether each value in the user_name column is one of the authentic Indian names.

Example Subset Code:

alpha = c("A","B","C","D","E","F")

subset(alpha, alpha %in% "A")

[1] "A"
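The same pattern carries over to a data frame, where the condition refers to a column by name. The sketch below uses a small made-up data frame (reviews) and a made-up name vector (names_clean) as stand-ins for the course's indian and indian_names_clean objects, and it assumes the names are stored as a plain character vector.

```r
# Hypothetical toy data standing in for the course's `indian` data set.
reviews <- data.frame(
  user_name = c("Priya", "John", "Arjun", "Emily"),
  city      = c("Phoenix", "Phoenix", "Las Vegas", "Las Vegas"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for `indian_names_clean` (assumed to be a character vector).
names_clean <- c("Priya", "Arjun", "Ravi")

# Keep only the rows whose user_name appears in the list of names.
authentic <- subset(reviews, user_name %in% names_clean)
print(authentic)

# A quick table of the remaining rows by city gives a sense of the subset's size.
print(table(authentic$city))
```

Because %in% returns one TRUE or FALSE per row, names in names_clean that never occur in the data (here "Ravi") are simply ignored.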

After successfully subsetting the data, generate a table of the authentic Indian users to get a sense of the size of the data.

Take a look at the number of users in each city. The select, group_by, summarise and n() functions of the dplyr package are great tools for quickly counting the users in each city.
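As a rough illustration of how these dplyr functions fit together, the sketch below counts rows per city in a small made-up data frame. The object and column names (authentic, city, users_per_city, users) are assumptions for the example and are not the code provided in the exercise.

```r
library(dplyr)

# Hypothetical toy data: one row per review by an authentic user.
authentic <- data.frame(
  user_name = c("Priya", "Arjun", "Ravi", "Anika"),
  city      = c("Phoenix", "Las Vegas", "Phoenix", "Phoenix"),
  stringsAsFactors = FALSE
)

users_per_city <- authentic %>%
  select(city, user_name) %>%   # keep only the columns needed
  group_by(city) %>%            # one group per city
  summarise(users = n())        # n() counts the rows in each group

print(users_per_city)
```

Note that n() counts rows, so a user with several reviews in the same city would be counted once per review unless duplicates are removed first.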

Instructions

  • Subset the indian data set to just the users with native Indian names. Use the %in% operator to match the user_name column against the indian_names_clean data set.
  • Use the code provided to generate the number_authentic_city data frame using select, group_by, summarise and n().
  • Print the resulting total number of authentic users in each city.