Get startedGet started for free

SQL relationships

1. SQL relationships

Tables can be related to one another via columns that act as a bridge between the tables.

2. Relationships

We use relationships to avoid duplicating data. For example, an employee table might be related to a location table so that we can know which location an employee works at without the need to copy the same location data in every employee's record. Relationships allow us to change the data in one place. Back to our employee to location example, if that location moves to a new building, we'd be able to update the address once in the location table, and every employee related to that location would show the new address. Another way that we might use relationships is to store additional details that we don't need to use as often. These relationships might be predefined in the table.

3. Relationships

In our census data, we have a census table and a state_fact table that are related by the state name which is found in the state column of the census table and name column of the state_fact table. Let's use this predefined relationship to get the state abbreviation from the state_fact table instead of the name and the population in 2008 for that same record from the census table. This is called a join!

4. Automatic joins

We build our statement with the column from each table that we desire. Then we can execute the query and print the results. Those results now show each record with the state abbreviation instead of the state name. SQLAlchemy automatically adds the right join clause because it is predefined in the database.

5. Join

We can use a join clause to add a relationship that isn't necessarily predefined in a query. The join clause takes a related table and an expression that details the relationship. If the relation is predefined in the table, we don't need that expression. The join clause should be placed right after the select statement and prior to any where, order_by or group_by clauses. When we want to build queries that do not select a column from each table but use both tables in other clauses, we have to tell SQLAlchemy what to tables to use in the query.

6. select_from()

The select_from method of the select statement allows us to do that, and a join clause is passed as the argument to select_from.

7. select_from() example

In this example, we want to determine the total population in 2000 that was within the 10th Circuit Court jurisdiction. We use our select statement to sum the pop2000 column from the census table, then we append the select_from method to include the census table joined with the state_fact table. Next, we use a where clause to find only the records where the circuit_court column from the state_fact table is 10. After executing the statement, we can print our result.

8. Joining tables without predefined relationship

So far, we have been using the join statement with a relationship already existing in the database. However, often as a data scientist, we will get tables that have related data, but are not setup with a relationship. To join tables we can give the join clause a Boolean expression that explains how the tables are related. This is the same type of Boolean expression we would use in a where clause. This will only join rows from each table that can be related between the two columns. It also doesn't work if the columns are different types.

9. select_from() example

Imagine that we want to determine the total population in 2008 that belongs to the East South Central division of the census; the population and location live in different tables; however, this time I have removed the defined relationship between the census and state_fact tables so we can practice working with tables in that manner. We begin by selecting the sum of the pop2000 column from the census table. Next we append a select_from to include the join clause. This time in the join clause we specify the table and a condition that matches rows based on the state column of the census table and the name column of the state_fact table. Finally, we add a where clause to find the records where the census_division_name is East South Central in the state_fact table. Then we execute and print the results.

10. Let's practice!

I know that this was a lot to take in so it's time for you to learn by doing! It's your turn to practice building SQL queries with joins. Enjoy!