Get startedGet started for free

Combining tables

1. Combining tables

So far, we've planned out our query. Now it's time to build it. We will start off by combining relevant tables.

2. Goal report

As a refresher, here's the report we are looking to create, which shows gold medals by demographic group and season for Western European countries.

3. Relevant tables

Now we've already identified relevant fields. Blue represents a field that will be referenced in our SELECT statement, while orange represents a field related to filtering. According to this, we need to pull in three tables:

4. Relevant tables

winter_games, summer_games, and athletes. We'll hold off on pulling in countries since we can answer that when we address filters. But how exactly are we going to combine these tables? As mentioned previously, there are multiple approaches you can take here.

5. Option A: JOIN first, UNION second

First, we can JOIN each of summer and winter games to athletes, and then UNION the resulting datasets.

6. Option A: JOIN first, UNION second

In this approach, step one is to set up the top half of the query. Since we need a SELECT statement to test our code, let's pull in a few relevant fields: athlete_id, gender, age, and gold. Don't worry about doing any aggregations yet; we just want to ensure the code works. Run the code once this step is complete to ensure there are no errors.

7. Option A: JOIN first, UNION second

Once the join is validated, step two to is to replicate the code and update it to reference winter games. Lastly, we combine the two with a UNION. I tend to use UNION ALL, as UNION may cause a loss in data as it removes any common rows found in both tables, but both UNION and UNION ALL have their place. So that's option A. If there is any confusion, I suggest pausing the video and making sure you fully understand the query.

8. Option B: UNION first, JOIN second

The other option is to start with a UNION and then run a JOIN on the unioned dataset.

9. Option B: UNION first, JOIN second

First, we must create the initial UNION. As before, we are only going to pull in relevant fields.

10. Option B: UNION first, JOIN second

Step two is to add the join. To do this, you will need to turn the UNIONed query into a subquery and alias it. Here, we have our query from step 1 aliased as "g", which represents games. From here, we can add the JOIN, but notice the additional SELECT statement at the top. This is because we are now pulling from the subquery g, which acts as a separate table. Again, make sure you take the time to understand this query, so feel free to pause the video here.

11. Comparison

Both approaches give us the same output, but there are clear differences between the two. You can see a comparison here. When deciding which to use, there are two factors to take into account: readability and performance. In our example, both queries have decent readability. As for performance, both queries should run relatively quickly since our tables are small. However, if you ever notice a query taking a while to run, it's worth trying out another approach. In the end, it's your preference which option to use. We'll use option A for the remainder of this chapter.

12. Key takeaways

The key takeaway from this lesson is that are often several ways to set up the same report. This is especially true as you create even more complex reports. As complexity increases, taking a step-by-step approach becomes more valuable, as it allows you to tackle challenges on a smaller scale and makes it easier to troubleshoot errors.

13. Query time!

So that's it! Let's see this in action and practice these concepts.

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.