
From ratings to recommendations

1. From ratings to recommendations

Welcome to the second part of this chapter. In the previous exercises, you've seen how to extract and transform data from the rating table. It's time to make our recommendations.

2. The recommendations table

The goal is to end up with triplets of data, where the first column is a user id, the second column is a course id, and the final column is a rating prediction. By rating prediction, we mean an estimate of the rating the user would give the course before they have even taken it. For each unique user id in the rating table, the triplets hold that user's top three recommended courses. This format is useful because any application with access to the recommendations table can query it for a specific user and instantly get three courses to recommend.
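As a rough illustration of why this format is convenient, here is a minimal pandas sketch of such a lookup. The data is made up, and the column name `rating_prediction` and the helper `recommend_for` are assumptions for this example, not names from the course.

```python
import pandas as pd

# Hypothetical recommendations table: one row per (user, course, predicted rating).
recommendations = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "course_id": [7, 9, 4, 9, 3, 7],
    "rating_prediction": [4.8, 4.6, 4.5, 4.9, 4.7, 4.2],
})


def recommend_for(user_id, table):
    """Return the course ids recommended to one user, best prediction first."""
    rows = table[table["user_id"] == user_id]
    ordered = rows.sort_values("rating_prediction", ascending=False)
    return ordered["course_id"].tolist()


courses_for_user_1 = recommend_for(1, recommendations)
```

An application only needs to filter on `user_id`; all the heavy lifting happened when the table was built.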

3. Recommendation techniques

We can use several techniques to transform the rating table into recommendations. Many established methods for this are based on matrix factorization. Going into detail on these is outside the scope of this course, but there's an excellent course on 'Building Recommendation Engines with PySpark' in our course catalog at DataCamp.

4. Common sense transformation

For this course, let's try to come up with a transformation derived from common sense. The ultimate goal is to take the rating table and extrapolate three courses that would be nice to recommend.

5. Average course ratings

In the previous exercises, you've managed to derive average course ratings for each course id. That aggregate will be useful, as we'll want to recommend highly rated courses.
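As a reminder of what that aggregate looks like, here is a small pandas sketch. The sample rows and the column names `rating` and `avg_rating` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rating table: which user gave which course what rating.
rating = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "course_id": [12, 52, 12, 30, 30],
    "rating": [4, 5, 5, 3, 4],
})

# One row per course id, holding the mean of all ratings it received.
avg_course_rating = (
    rating.groupby("course_id", as_index=False)["rating"]
    .mean()
    .rename(columns={"rating": "avg_rating"})
)
```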

6. Use the right programming language

Second, we'll want to recommend courses in the programming language that interests the user. We have data on the programming language of each course, and we have data on which users rated which courses. If a user has rated four courses and two of them are SQL courses, it makes sense to recommend a SQL course next. We have all the data to do this in the two tables you already saw.
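One way to find each user's preferred language is to join the two tables and count rated courses per language. The sketch below uses made-up data; the column name `programming_language` and the alphabetical tie-break are assumptions for this example.

```python
import pandas as pd

# Hypothetical inputs: ratings per user and the language of each course.
rating = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2],
    "course_id": [12, 52, 30, 41, 30],
    "rating": [4, 5, 3, 4, 5],
})
courses = pd.DataFrame({
    "course_id": [12, 52, 30, 41, 60],
    "programming_language": ["sql", "sql", "r", "python", "sql"],
})

# Attach the language to every rating, then count ratings per user and language.
rated = rating.merge(courses, on="course_id")
language_counts = (
    rated.groupby(["user_id", "programming_language"])
    .size()
    .reset_index(name="n_rated")
)

# Keep the most-rated language per user. Ties are broken alphabetically,
# which is an arbitrary choice made for this sketch.
top_language = (
    language_counts.sort_values(
        ["user_id", "n_rated", "programming_language"],
        ascending=[True, False, True],
    )
    .drop_duplicates("user_id")[["user_id", "programming_language"]]
    .reset_index(drop=True)
)
```

Here user 1 rated four courses, two of them SQL, so SQL comes out as their preferred language.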

7. Recommend new courses

Finally, we only want to recommend courses that the user hasn't rated yet. That means that the `user_id` and `course_id` combination in the recommendation should not be in the rating table.
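Excluding already-rated combinations amounts to an anti-join. Below is a minimal pandas sketch using a left merge with `indicator=True`; the `candidates` table and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical data: user 1 has already rated courses 12 and 52.
rating = pd.DataFrame({
    "user_id": [1, 1],
    "course_id": [12, 52],
    "rating": [4, 5],
})
# Hypothetical candidate recommendations before filtering.
candidates = pd.DataFrame({
    "user_id": [1, 1, 1, 1],
    "course_id": [12, 52, 60, 61],
})

# The left merge marks each candidate as 'both' (already rated) or 'left_only'.
merged = candidates.merge(
    rating[["user_id", "course_id"]],
    on=["user_id", "course_id"],
    how="left",
    indicator=True,
)

# Keep only the combinations that do not appear in the rating table.
unrated = (
    merged[merged["_merge"] == "left_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)
```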

8. Our recommendation transformation

Combining these three ideas, we can come up with a common-sense recommendation strategy. The first rule is that we'll recommend courses in the technologies for which the user has rated the most courses. The second rule is that we won't recommend courses the user has already rated. The final rule is that, of the courses that remain, we'll recommend the three with the highest average rating.
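The three rules can be sketched as a single pandas function. This is only one possible reading of the strategy, not necessarily the transformation built in the exercises: it assumes columns named `rating` and `programming_language`, uses the average course rating as the rating prediction, keeps every language that ties for most-rated, and breaks rating ties by course id. The function name `recommend` and all sample data are hypothetical.

```python
import pandas as pd


def recommend(rating, courses, n=3):
    """Return up to n (user_id, course_id, rating_prediction) rows per user."""
    # Average rating per course, used here as the rating prediction (assumption).
    avg = (
        rating.groupby("course_id", as_index=False)["rating"]
        .mean()
        .rename(columns={"rating": "rating_prediction"})
    )
    scored = courses.merge(avg, on="course_id")

    # Rule 1: find the language(s) each user has rated most often.
    rated = rating.merge(courses, on="course_id")
    counts = (
        rated.groupby(["user_id", "programming_language"])
        .size()
        .reset_index(name="n_rated")
    )
    is_top = counts["n_rated"] == counts.groupby("user_id")["n_rated"].transform("max")
    top = counts.loc[is_top, ["user_id", "programming_language"]]
    candidates = top.merge(scored, on="programming_language")

    # Rule 2: drop (user_id, course_id) pairs already in the rating table.
    candidates = candidates.merge(
        rating[["user_id", "course_id"]],
        on=["user_id", "course_id"],
        how="left",
        indicator=True,
    )
    candidates = candidates[candidates["_merge"] == "left_only"]

    # Rule 3: keep the n highest-rated remaining courses per user.
    ranked = candidates.sort_values(
        ["user_id", "rating_prediction", "course_id"],
        ascending=[True, False, True],
    )
    top_n = ranked.groupby("user_id").head(n)
    return top_n[["user_id", "course_id", "rating_prediction"]].reset_index(drop=True)


# Hypothetical sample data.
courses = pd.DataFrame({
    "course_id": [12, 52, 30, 60, 61, 62, 63, 70],
    "programming_language": ["sql", "sql", "r", "sql", "sql", "sql", "sql", "r"],
})
rating = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 2, 2, 3, 3],
    "course_id": [12, 52, 30, 60, 61, 62, 63, 70, 60, 61],
    "rating": [4, 5, 3, 5, 4, 3, 2, 5, 4, 4],
})

recommendations = recommend(rating, courses)
```

The result has the triplet shape of the recommendations table, with at most three rows per user. A user gets fewer than three rows when not enough unrated courses remain in their preferred language.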

9. An example

Let's look at an example. A user has rated three courses: two SQL courses and an R course. We'll recommend only SQL courses. We won't recommend courses with id 12 or 52 since the user already rated them. Finally, we'll recommend the three top-rated courses among the remaining SQL courses.

10. Let's practice!

In the exercises, we'll write this strategy into a transformation.