Exercise

Add zeros

Many recommendation engines use implicit ratings. These datasets often don't include behavior counts for items that a user has never purchased, so you'll need to add those user-product pairs yourself with a count of zero. The dataframe Z is provided for you. It contains the columns userId, productId, and num_purchases, which is the number of times a user has purchased a specific product.

Instructions

  • Take a look at the dataframe Z using the .show() method.
  • Extract the distinct userIds and productIds from Z using the .distinct() method. Call the results users and products respectively.
  • Perform a .crossJoin() on the users and products dataframes. Call the result cj.
  • Perform a "left" join of cj with the original ratings dataframe Z on ["userId", "productId"]. Call the .fillna(0) method on the result to replace the missing num_purchases values with zeros. Call the result Z_expanded.