Get startedGet started for free

Planning the query

1. Planning the query

Welcome to chapter 2. In this chapter, we will build out queries for more complex reports.

2. Chapter goal

At the end of this chapter, you will build the second visualization in your dashboard, shown here. The base report for this visualization is a bit more complex than our previous one, and in complex cases, it’s important to take time to plan out the query.

3. Questions to ask

Although there is no one-size-fits-all approach to planning, there are a few key questions worth asking. What tables do we need to pull from? How should we combine those tables? What fields do we need to create? What filters should be included? Do we need to order or limit our query? Answering these questions will guide our query setup, and should make the coding process more efficient.

4. Scenario

To show this process in action, we will work through building this report, which shows gold medals broken up by demographic group and season, specifically for countries in the Western European region. Let's go through each step, one-by-one.

5. 1 - What tables do we need to pull from?

First, what tables do we need to pull from? As before, we can open up our E:R diagram and highlight relevant fields. In this case, we need to reference age and gender to get the demographic group, and we need to pull in the field gold to count total gold medals. There is also a filter related to region. Since region isn't referenced directly in the SELECT statement, let’s highlight it a different color.

6. 2 - How should we combine the tables?

Next, let’s identify how to combine the tables. There are several ways to combine tables, with the most common being JOINs and UNIONs. Often, there is more than 1 way to combine tables. For example, one option is to run two separate JOINs first

7. 2 - How should we combine the tables?

and then UNION them together.

8. 2 - How should we combine the tables?

Alternatively, we can UNION the summer and winter games first

9. 2 - How should we combine the tables?

and then JOIN the dataset to athletes. Both approaches should give us the same result, but our strategy may change depending on which option we use.

10. 3 - What fields do we need to create?

Next question: what fields do we need to create and how will we create them? There are three columns in the report: season, demographic_group, and gold medals. For the field season, we only care to output one static value for each of summer_games and winter_games, so a static string should work. Demographic group buckets together different values based on set conditions, so we need to use some type of conditional. Lastly, we know that gold medals should be an aggregation, and it's clear we should use a SUM. We will go in more detail on how to build these fields later in the chapter.

11. 4 - What filters need to be included?

Next up, let’s talk filters. The report clearly states to filter for countries in the Western European region. Filters come in two forms - WHERE and HAVING. WHERE is used when filtering on a dimension, such as gender, while HAVING is used when filtering on an aggregation, such as SUM of revenue. In our case, we need to filter on country, which is a dimension, so we'll use a WHERE statement.

12. 5 - Any ordering or limiting needed?

Lastly, do we need any ORDER BY or LIMIT statement? In this case, it looks like no limiting is needed, but we need to order by golds in descending order.

13. Let's practice!

Now that we have a general understanding of how to build our query, we can get started building it. Before we do that, it's time for you to practice answering some of these questions.