Average rating per course
A great way to recommend courses is to recommend top-rated courses, as DataCamp students often like courses that are highly rated by their peers.
In this exercise, you'll complete a transformation function transform_avg_rating() that aggregates the rating data using the pandas DataFrame's .groupby() method. The goal is to get a DataFrame with two columns, a course id and its average rating:
| course_id | avg_rating |
|---|---|
| 123 | 4.72 |
| 111 | 4.62 |
| … | … |
In this exercise, you'll complete this transformation function, and apply it on raw rating data extracted via the helper function extract_rating_data() which extracts course ratings from the rating table.
This exercise is part of the course
Introduction to Data Engineering
Exercise instructions
- Complete the
transform_avg_rating()function by grouping by thecourse_idcolumn, and taking the mean of theratingcolumn. - Use
extract_rating_data()to extract raw ratings data. It takes in as argument the database enginedb_engines. - Use
transform_avg_rating()on the raw rating data you've extracted.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Complete the transformation function
def transform_avg_rating(rating_data):
# Group by course_id and extract average rating per course
avg_rating = rating_data.____('____').rating.____()
# Return sorted average ratings per course
sort_rating = avg_rating.sort_values(ascending=False).reset_index()
return sort_rating
# Extract the rating data into a DataFrame
rating_data = extract_rating_data(____)
# Use transform_avg_rating on the extracted data and print results
avg_rating_data = transform_avg_rating(____)
print(avg_rating_data)