Exercise

Combining data into one

Before manipulating the Yelp star reviews, you first need to combine the three data sets that were just explored so you can understand and adapt the data more effectively.

The data sets from the previous exercise, reviews, users, and businesses, are data frames, R's way of representing a data set. Data frames can be combined in many ways, but for this exercise you don't want any missing data, such as a business without a review. So you will use the inner_join() function from the dplyr package to combine the three data sets. By default, inner_join() matches two data sets on the columns whose names appear in both, and keeps only the rows whose values in those columns are found in both data sets.
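As a minimal sketch of that behavior, the example below joins two small, made-up data frames. The names reviews_toy and users_toy, and the user_id column, are hypothetical stand-ins and not the exercise's actual data. The by argument is spelled out here for clarity; leaving it out makes dplyr join on every column name the two data frames share.

```r
library(dplyr)

# Hypothetical toy data, NOT the Yelp data from the exercise
reviews_toy <- data.frame(
  user_id = c("u1", "u2", "u4"),
  stars   = c(5, 3, 4),
  stringsAsFactors = FALSE
)
users_toy <- data.frame(
  user_id   = c("u1", "u2", "u3"),
  user_name = c("Ana", "Ben", "Cy"),
  stringsAsFactors = FALSE
)

# Keep only rows whose user_id appears in BOTH data frames.
# "u4" (no matching user) and "u3" (no matching review) are dropped.
joined_toy <- inner_join(reviews_toy, users_toy, by = "user_id")
joined_toy
```

Only the rows for u1 and u2 survive the join, which is why an inner join leaves no gaps from unmatched records.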

Let's see how it works!

The three data frames are already loaded into your R environment. Apply inner_join() to the reviews and users data sets first, and don't forget to give the newly combined data set a name. Next, apply inner_join() again, this time to the newly created data set and the last data set from the previous exercise, businesses.

Once the data sets have been combined, it helps to explore the result. Using summary() you can get a better feel for the variables in your data and the types of data you have to work with.
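The two-step join and the inspection can be sketched on made-up data as follows. All data frames and column names here (reviews_toy, user_id, business_id, and so on) are assumptions for illustration; the real reviews, users, and businesses data frames may use different key columns.

```r
library(dplyr)

# Hypothetical toy data, NOT the Yelp data from the exercise
reviews_toy <- data.frame(
  review_id   = c("r1", "r2", "r3"),
  user_id     = c("u1", "u2", "u1"),
  business_id = c("b1", "b1", "b9"),
  stars       = c(5L, 3L, 4L),
  stringsAsFactors = FALSE
)
users_toy <- data.frame(
  user_id   = c("u1", "u2"),
  user_name = c("Ana", "Ben"),
  stringsAsFactors = FALSE
)
businesses_toy <- data.frame(
  business_id   = c("b1", "b2"),
  business_name = c("Cafe", "Diner"),
  stringsAsFactors = FALSE
)

# Step 1: reviews + users, matched on the assumed key user_id
ru_toy <- inner_join(reviews_toy, users_toy, by = "user_id")

# Step 2: the result + businesses, matched on the assumed key business_id.
# Review "r3" points at business "b9", which has no match, so it is dropped.
rub_toy <- inner_join(ru_toy, businesses_toy, by = "business_id")

# Inspect the variables and their types
summary(rub_toy)
str(rub_toy)
```

summary() reports per-column statistics, while str() gives a compact view of each column's type, so the two together cover both "which variables" and "what kind of data".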

Instructions
100 XP
  • The code provided uses library() to load dplyr into the environment.
  • Use inner_join() to combine the reviews and users data sets and assign the result to ru.
  • Use inner_join() to combine the newly created ru and the businesses data sets and assign the new data frame to rub.
  • Inspect the new data frame rub with summary(), taking note of the variables and their data types.