Presence and absence
1. Presence and absence
A common question that data scientists ask when analyzing SQL databases is whether the data in one table is present or absent in a related table. Fortunately, this is an easy question to answer using SQL. However, depending on the requirements there may be more than one method and some methods may be more efficient than others. In this lesson, we’ll discuss what we mean by, "Is the data present or absent in related tables?" and we’ll look at using INTERSECT and EXCEPT to help us determine this.2. Venn diagram - presence
First, what do we mean by data is present, or absent in related tables? We’ll use Venn diagrams to demonstrate this. Here, the overlapping circles represent the outputs of queries on two tables. Where the outputs intersect is where we will find the same data present in both tables.3. Venn diagram - absence
If we take the portion in the left output that does not intersect the right output, then we say that the data present in the table we query on the left, is absent in the table we query on the right.4. Customer Orders database
Let's apply this concept to our Customer Orders database. Any customers placing orders will have their customer information in the Customers table and their order information in the Orders table. We say these tables are related because they both contain CustomerID. Customers that have not placed any orders will have their customer information in the Customers table but not in the Orders table. If we are working on a sales project, we may ask "Who is ordering and who is not?"5. INTERSECT
We can find out which customers have placed orders by using the INTERSECT operator. We SELECT CustomerID from the Customers table, the left query,6. INTERSECT
add INTERSECT and then SELECT CustomerID from the Orders table, the right query.7. EXCEPT
To check for the absence of customers in the Orders table, that are present in the Customers table,8. EXCEPT
we substitute INTERSECT with EXCEPT.9. INTERSECT and EXCEPT
INTERSECT and EXCEPT are great for data interrogation. One feature they provide is removing duplicates meaning that the results will only contain unique values of data that is present or absent. One disadvantage of INTERSECT and EXCEPT is the number and order of columns in the SELECT statements must be the same between queries. We'll discuss this disadvantage in upcoming lessons.10. Let's practice!
Let's practice using INTERSECT and EXCEPT.Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.