Get startedGet started for free

The Olympics dataset

1. The Olympics dataset

Now that you understand the case study, it’s time to learn about our dataset. Before we dig into the data, I’d like to introduce the concept of an E:R Diagram.

2. Entity-Relationship (E:R) diagram

An E:R or entity-relationship diagram is a visual representation of a database's structure. E:R diagrams show all tables and fields, and connects objects visually to show relationships. An E:R diagram is a great resource to quickly understand the structure of a database.

3. Olympics dataset E:R diagram

Here’s the E:R diagram of our Olympics dataset. It consists of 5 tables: summer_games, winter_games, athletes, countries, and country_stats. Pay attention to the fields found in each table. You’ll notice several “id” fields. An id field represents a unique object and allows for joins between two tables.

4. Olympics dataset E:R diagram

For example, each “id” field in the countries table represents a unique country.

5. Olympics dataset E:R diagram

This id relates to the “country_id” field in the country_stats table. It should be no surprise, then, that when we join the two tables, we join on these two highlighted fields.

6. Tables and fields

Let’s dig a bit deeper into each of the tables. The athletes table appears straightforward; each row represents an athlete, including their name, gender, age, height, and weight.

7. Tables and fields

The summer_games and winter_games tables have the exact same structure. Each row represents the results of a given athlete within an Olympic event. There are three fields that appear to represent the medals in the event: bronze, silver, and gold. We will need to dig into these three fields to understand the format.

8. Tables and fields

Countries is a straightforward table as well, as it simply includes the country and its corresponding region.

9. Tables and fields

Lastly, the country_stats table tracks several metrics related to countries, including gdp, population, and nobel_prize_winners. When dealing with a novel dataset, I recommend taking the time to get a high-level understanding of the tables and relationships. There may be crucial fields that you are not aware of, but taking time to understand the tables and fields can help prevent that situation.

10. Reference for queries

Outside of giving a high-level overview of the tables and fields, E:R diagrams can also be a good reference document when planning out large queries. If you know the required fields for your report, you can identify what tables need to be pulled

11. Reference for queries

by highlighting the relevant fields. Then, you can visually understand what types of JOINs, UNIONs, and other logic may be needed when building your query.

12. Reference for queries

Here’s an example. Let’s say you are trying to build a report that shows total population by region. By looking at the E:R diagram, we can quickly identify which fields are required, as highlighted. We now have an idea what tables need to be pulled and should have an easier time setting up the query. According to the E:R diagram, we can pull from the countries table and then join to the country_stats table ON the id and country_id fields, as indicated by the connector. The full query is shown here, which joins countries and country_stats to pull in region and pop_in_millions. Notice how the ON statement joins the two tables on the id and country_id fields, respectively.

13. Practice time!

Next up, let's practice using E:R diagrams when setting up queries.

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.