1. Joins in BigQuery
Joining multiple datasets together is an essential tool for creating analytical reports and datasets in BigQuery. In this section, we will review the join types available to you in BigQuery.
2. Joining data in BigQuery
BigQuery supports the same basic join types - inner, left, right, full, and cross - as most relational databases. This familiarity allows users to transition their existing knowledge to BigQuery easily. The core syntax for joins in BigQuery is similar to other SQL-based databases. We have already seen some basic joins in this course, so now we are just introducing new join types.
3. Real life examples of joins
Here, we can imagine some scenarios for different joins with two tables. The first is a table with information about our customers and the second is a table about their orders. Both tables have a key that can be used for joins. Inner joins allow us to only return matching results from both tables. A left join will return all customers, even if they did not order anything. Right joins would return all orders, even orders with no customer ID. A full join will return all customers and orders, even if some have not interacted. Finally, a cross join will join all rows to each other regardless of any matching conditions, better known as a cartesian join.
4. INNER JOIN
An inner join in BigQuery combines data from two tables based on a shared column. It only includes rows where matching values exist in both tables. This makes it ideal for scenarios where you must find correlated data points across different datasets. In this example, one dataset contains customer information and another about order names - the "sales-data" table. This query joins the datasets based on the "customer-id" and returns the only matching results in both datasets. Note that the INNER keyword is optional in BigQuery, and using JOIN alone will accomplish the same result.
5. LEFT JOIN
In this example, we are creating a left join, which will return all results from the left customer's table, even if there are no matches. You can see this in the results with the customer with "customer-id" number 3. The null result indicates that they have yet to place an order.
6. RIGHT JOIN
In this example, we are creating a right join, which will return all results from the right, or "sales-data" table, even if there are no matches. Here, we can see a "customer-id" of null with a product they have purchased.
7. OUTER JOIN
The best way to think of an outer join is as a "right-left" join, meaning that all results of a right and left join will be included. Here, we can see that the results from our left join, or the customer with "customer-id" number 3, appear as does the result from our right join of a guest customer ordering an external microphone. Outer joins are used for cases like this, where we want to capture matches that may not match the join condition but are relevant for an analysis.
8. SELF or CROSS JOIN
A cross join, also known as a cartesian join, joins every row from one table to every row in the other table, ordered by the rows in the left table. For example, if you have two tables with ten rows each, the result of the join will be 100 rows (10 times 10). While you can use a WHERE statement with a cross join, the cross join takes no join condition, and you can simply "join" the tables by listing both after the FROM operator, separated by a comma.
9. Joins and UNNEST
We also use a cross join to join unnested data, such as this example, where we are unnesting an array containing payment methods for each customer.
10. Let's practice!
Let's use our sample data to test out a few join scenarios!