
Assigning integer IDs to movies

Let's do the same thing for the movies: give each distinct movie its own integer ID. Then let's join the new user IDs and movie IDs to the ratings so that everything sits in one dataframe.

This exercise is part of the course

Building Recommendation Engines with PySpark


Exercise instructions

  • Use the .select() and .distinct() methods to extract all unique values of the Movie column from the ratings dataframe.
  • Reduce the movies dataframe to a single partition using .coalesce().
  • Complete the partial code provided to assign unique integer IDs to each movie. Name the new column movieId and call the .persist() method on the resulting dataframe.
  • Join the ratings dataframe to the users dataframe and subsequently to the movies dataframe. Call the result movie_ratings.
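
The reason for coalescing to one partition before assigning IDs can be sketched in plain Python. According to the Spark documentation, the current implementation of monotonically_increasing_id() puts the partition ID in the upper 31 bits and the record number within the partition in the lower 33 bits. The function below is a hypothetical model of that layout, not Spark code; simulated_monotonic_ids and its argument are names invented for this illustration.

```python
def simulated_monotonic_ids(rows_per_partition):
    """Model the documented ID layout: partition index in the upper bits,
    row position within the partition in the lower 33 bits."""
    ids = []
    for partition_index, n_rows in enumerate(rows_per_partition):
        for row_index in range(n_rows):
            ids.append((partition_index << 33) + row_index)
    return ids


# Four rows spread over two partitions: the IDs jump at the partition boundary.
spread = simulated_monotonic_ids([2, 2])
# The same four rows in a single partition, as after coalesce(1): consecutive IDs.
single = simulated_monotonic_ids([4])

print(spread)  # [0, 1, 8589934592, 8589934593]
print(single)  # [0, 1, 2, 3]
```

With a single partition the IDs are unique and consecutive starting at 0; with several partitions they are still unique and increasing, but they contain large gaps.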

Hands-on interactive exercise

Try this exercise by completing this sample code.

from pyspark.sql.functions import monotonically_increasing_id

# Extract the distinct movies
movies = ratings.select("____").distinct()

# Repartition the data to have only one partition.
movies = movies.coalesce(____) 

# Create a new column of movieId integers. 
movies = movies.withColumn("____", monotonically_increasing_id()).____() 

# Join the ratings, users and movies dataframes
movie_ratings = ratings.join(____, "User", "left").join(____, "Movie", "left")
movie_ratings.show()