Add zeros
Many recommendation engines use implicit ratings, such as purchase counts. These datasets often don't include behavior counts for items that a user has never purchased. In those cases, you'll need to add the missing user-product pairs yourself and give them a count of zero. The dataframe Z is provided for you. It contains the columns userId, productId and num_purchases, where num_purchases is the number of times a user has purchased a specific product.
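To make the idea concrete before touching Spark, here is a minimal plain-Python sketch (not PySpark) of what "adding zeros" means. The user and product labels and the counts are made up for illustration; they are not taken from Z.

```python
from itertools import product

# Made-up purchase counts: only pairs that were actually bought appear.
purchases = {
    ("u1", "p1"): 3,
    ("u1", "p2"): 1,
    ("u2", "p2"): 5,
}

# Distinct users and products seen anywhere in the data.
users = sorted({user for user, _ in purchases})
products = sorted({item for _, item in purchases})

# Every user-product combination, with 0 where no purchase was recorded.
expanded = {
    (user, item): purchases.get((user, item), 0)
    for user, item in product(users, products)
}

for (user, item), count in expanded.items():
    print(user, item, count)
```

The pair ("u2", "p1") is absent from the input but shows up in the expanded result with a count of 0, which is the effect the exercise aims for.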
This exercise is part of the course
Building Recommendation Engines with PySpark
Instructions
- Take a look at the dataframe Z using the .show() method.
- Extract the distinct userIds and productIds from Z using the .distinct() method. Call the results users and products respectively.
- Perform a .crossJoin() on the users and products dataframes. Call the result cj.
- "left" join cj to the original ratings dataframe Z on ["userId", "productId"]. Call the .fillna(0) method on the result to fill in the blanks with zeros. Call the result Z_expanded.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# View the data
Z.____()
# Extract distinct userIds and productIds
users = ____.select("____").____()
products = Z.____("productId").____()
# Cross join users and products
____ = users.____(products)
# Join cj and Z
____ = cj.join(Z, ["____", "____"], "left").____(0)
# View Z_expanded
____.show()