1. Grouping movies
In this lesson, we will use queries with the GROUP BY clause.
Using the GROUP BY clause allows us to apply aggregations to groups in a table.
2. GROUP BY Applications
To evaluate the success and the potential of a company, it is often desirable to look at groups of customers or groups of products jointly.
We could be interested in grouping customers – for example by country or gender -
or in grouping movies by genre or year of release.
We will now look at an example where we are interested in the average renting price by genre.
3. Table: movies_selected
Here we extracted a subset of the 'movies' table for illustration. We call this table 'movies_selected'.
In this table, we have the movie's name, its genre and rental price. Now, we are interested in the average price of renting a movie for each genre.
We will do this step using the GROUP BY clause.
4. GROUP BY
This is the basic SQL query with GROUP BY. We select the genre from the table movies_selected and group it by genre.
By itself, the result is not different than what you get with the SELECT DISTINCT statement, but with the GROUP BY clause, we can add aggregation functions to the SELECT component, such as average, sum, minimum or maximum.
5. Average renting price
We are interested in the average price per genre, so we add the average of the prices in the SELECT component and give it the alias avg_price.
We obtain a table with the genres and the average price per genre.
To better understand what this query does, let’s go back to the table movies_selected.
6. movies_selected table
We can think of a GROUP BY statement as a split into subgroups. The colors of the rows in the table illustrate the different genres. The GROUP BY genre clause creates groups based on the genre.
7. movies_selected table split
as you can see here - every group is in a separate table. We have groups for the genres Fantasy, Drama, and so on. Then we can apply any aggregation function to each of the sub-tables. Besides the average, which we used before, we can additionally use COUNT to get the number of movies in each sub-table, for example 3 for Fantasy and 4 for Drama.
8. Average rental price and number of movies
This is the query extended by the number of movies in each category.
We have used COUNT with an asterisk before to count all rows in a query. Now we use it to count the number of movies for each genre.
9. HAVING
The HAVING clause can be added to a GROUP BY statement to retrieve the values of only those groups which satisfy specified conditions. It is only used in combination with a GROUP BY clause and can include aggregate functions such as sum, average or count.
In this example, only those rows of the table are selected where the number of columns is larger than 2 or with other words where the number of movies is greater than 2.
10. Let's practice!
And now it's your turn to group and aggregate data from the customer and renting tables.