Add zeros
Many recommendation engines use implicit ratings. These datasets often leave out behavior counts for items a user has never purchased. When that happens, you need to add the missing user/product pairs yourself and give them a count of zero. The dataframe Z is provided for you. It contains the columns userId, productId, and num_purchases, where num_purchases is the number of times a user has purchased a specific product.
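One quick way to see how many rows are missing is to compare the number of observed (userId, productId) pairs with the size of the full user-by-product grid. The sketch below is a minimal illustration in plain Python rather than PySpark. The rows are hypothetical, and it assumes each user/product pair appears at most once.

```python
# Hypothetical implicit-feedback rows: (userId, productId, num_purchases).
# Only pairs that were actually purchased appear; assumes no duplicate pairs.
rows = [
    (1, "a", 3),
    (1, "b", 1),
    (2, "a", 2),
    (3, "c", 5),
]

# Distinct users and products present in the data.
users = {u for u, _, _ in rows}
products = {p for _, p, _ in rows}

# Observed pairs vs. every possible user/product combination.
observed = len(rows)
possible = len(users) * len(products)

print(f"{observed} of {possible} user/product pairs are present")
print(f"missing pairs that need an explicit zero: {possible - observed}")
```

Every pair counted as missing here is a row that the expansion steps in this exercise have to create with num_purchases set to 0.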
This exercise is part of the course
Building Recommendation Engines with PySpark
Instructions
- Take a look at the dataframe Z using the .show() method.
- Extract the distinct userIds and productIds from Z using the .distinct() method. Call the results users and products, respectively.
- Perform a .crossJoin() on the users and products dataframes. Call the result cj.
- Left-join cj to the original ratings dataframe Z on ["userId", "productId"] (join type "left"). Call the .fillna(0) method on the result to replace the resulting nulls with zeros. Call the result Z_expanded.
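The four steps above can be sketched in plain Python to show what each PySpark call does to the data. This is an illustration, not the course's solution. The rows in Z are hypothetical, itertools.product stands in for .crossJoin(), and a dictionary lookup stands in for the left join. The lookup assumes at most one row per (userId, productId) pair.

```python
from itertools import product

# Hypothetical stand-in for Z: (userId, productId, num_purchases) rows.
Z = [
    (1, "a", 3),
    (1, "b", 1),
    (2, "a", 2),
    (3, "c", 5),
]

# Step 1: distinct userIds and productIds (like .select(...).distinct()).
users = sorted({u for u, _, _ in Z})
products = sorted({p for _, p, _ in Z})

# Step 2: pair every user with every product (like users.crossJoin(products)).
cj = list(product(users, products))

# Step 3: left join cj to Z on (userId, productId). Pairs with no match
# get None, the plain-Python analogue of a null after a Spark left join.
counts = {(u, p): n for u, p, n in Z}
joined = [(u, p, counts.get((u, p))) for u, p in cj]

# Step 4: replace the missing values with 0 (like .fillna(0)).
Z_expanded = [(u, p, 0 if n is None else n) for u, p, n in joined]

for row in Z_expanded:
    print(row)
```

The expanded result has one row per user/product combination (3 × 3 = 9 here), and the original purchase counts are unchanged. If Z contained duplicate pairs, a real left join would keep every matching row, while this dictionary keeps only the last one.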
# View the data
Z.show()
# Extract distinct userIds and productIds
users = Z.select("userId").distinct()
products = Z.select("productId").distinct()
# Cross join users and products
cj = users.crossJoin(products)
# Left-join cj to Z, then replace the resulting nulls with zeros
Z_expanded = cj.join(Z, ["userId", "productId"], "left").fillna(0)
# View Z_expanded
Z_expanded.show()