Add zeros

Many recommendation engines use implicit ratings. Datasets of this kind often don't include behavior counts for items that a user has never purchased, so you'll need to add those user/product pairs and fill them in with zeros. The dataframe Z is provided for you. It contains the columns userId, productId, and num_purchases, where num_purchases is the number of times a user has purchased a specific product.

This exercise is part of the course

Building Recommendation Engines with PySpark

Instructions

  • Take a look at the dataframe Z using the .show() method.
  • Extract the distinct userIds and productIds from Z using the .distinct() method. Call the results users and products respectively.
  • Perform a .crossJoin() on the users and products dataframes. Call the result cj.
  • Perform a "left" join of cj to the original ratings dataframe Z on ["userId", "productId"], then call the .fillna(0) method on the result to replace the missing values with zeros. Call the result Z_expanded.
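
To see what these steps compute before writing the PySpark version, here is a minimal plain-Python sketch of the same logic. It is only an analogy, not the exercise solution: the sample rows and the helper name expand_with_zeros are hypothetical, and a dictionary lookup stands in for the left join and .fillna(0).

```python
from itertools import product

# Hypothetical sample data: (userId, productId, num_purchases)
Z_rows = [
    (1, "A", 3),
    (1, "B", 1),
    (2, "A", 2),
]

def expand_with_zeros(rows):
    """Return one row per (userId, productId) pair, using 0 where no purchase was recorded."""
    # Distinct users and products (the role of .distinct() in the exercise)
    users = sorted({user for user, _, _ in rows})
    products = sorted({product_id for _, product_id, _ in rows})

    # Observed purchase counts, keyed by (userId, productId)
    observed = {(user, product_id): n for user, product_id, n in rows}

    # Every user/product combination (the cross join); pairs with no
    # observed count get 0 (the left join followed by fillna(0))
    return [
        (user, product_id, observed.get((user, product_id), 0))
        for user, product_id in product(users, products)
    ]

Z_expanded = expand_with_zeros(Z_rows)
for row in Z_expanded:
    print(row)
```

In this sample, user 2 never bought product "B", so the expanded result gains a (2, "B", 0) row while the three observed rows keep their original counts.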

Hands-on interactive exercise

Try this exercise by completing this sample code.

# View the data
Z.____()

# Extract distinct userIds and productIds
users = ____.select("____").____()
products = Z.____("productId").____()

# Cross join users and products
____ = users.____(products)

# Join cj and Z
____ = cj.join(Z, ["____", "____"], "left").____(0)

# View Z_expanded
____.show()