Get startedGet started for free

Bringing it all together

1. Final example

Let us now bring everything that you've learned in this course together. The goal is always to help management make informed and data driven decisions.

2. Business Case

MovieNow considers investing money in new movies. It is more expensive to add recent movies than it is to add older ones. So the first step in our data analysis is to investigate the preference of customers for movies depending on the year the movie was released. We will produce a table to see if customers give better ratings to recent movies, and if there is a difference across countries.

3. 1. Join data

The first step is to join the data we need for the analysis. We get information about movie ratings from the table renting, the country of the customer from the table customers and the year of release of the movies from the table movies.

4. 2. Select relevant records

Next we select only the rows required for the analysis. We restrict the table to the movies that have at least 4 ratings. Therefore, we use a nested query in the WHERE clause for movie_id. The nested query selects the movie IDs from movies with a rating at least equal to four from the table renting. We have another restriction: the date of the movie rental has to be April 1st 2018 or later.

5. 3. Aggregation

The final step is the aggregation of the data. To get a good overview we choose three different types of aggregation. We count the number of movie rentals, the number of different movies and the average rating. We're using hierarchical level of aggregation: we go from a big overview to more detailed observations. We will calculate the total aggregation, the aggregation for movies by year of release, and finally the aggregation for movies from each year of release depending on the country of the customers.

6. 3. Aggregation

In the SELECT clause we have the three different levels of aggregation: the number of movie rentals, the number of distinct movies and the average rating. In the last but one row we GROUP BY ROLLUP first on the year of release, and then on the country to obtain the required levels of aggregation. In the final row we order the results by country and year of release.

7. Resulting table

Here are the last few lines of the resulting table. In the last line we can see the total aggregation with 333 movie rentals, 50 different movies and an average rating of 7.9. The remaining rows are aggregations for the year of release ordered by increasing year. A statistical analysis could be performed to investigate possible trends, but intuitively we can already see that there is no clear pattern of better ratings for movies with later year of release.

8. Let's practice!

It is necessary to further investigate in which kind of movies the new budget should be invested. Let's find out the answer!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.