Dealing with nulls
1. Dealing with nulls
Another potential cause of messy data comes in the form of null values.2. What does null really mean?
A null means no value exists. But conceptually, a null can represent multiple things. For example, let's say we have this table of orders and see null for one order's price per unit. What does this mean? Has the order started but has yet to go through? Could it mean this order was bought for free? Or perhaps it means this order was bought at a flat price instead of a by unit price. In order to truly understand what a null represents, you may need to explore the dataset further or gain context from a business perspective.3. Issues with nulls
Although some nulls are legitimate, they can cause issues with data. For example, consider this soccer dataset, which shows goals scored by game. In this case, null represents no goals. If we want to see total goals per game, simply add the home and away fields, right?4. Issues with nulls
Not quite. SQL is unable to add an integer with a null, so the last two games output null values.5. Issues with nulls
Another issue arises when nulls appear as a result of our query. For example, let's pull total athletes by region using this query. The results include athletes for the region null. This occurs because some athletes act independently, not as part of a country, and therefore do not count as part of a region. However, this is unclear on the report, so we should label the null value intuitively.6. Fix 1: Filtering nulls
How exactly can we fix these issues? One approach is to completely filter out the nulls. This does not work in all situations, as it may filter out important rows, but if nulls are throwing off your query, you can filter out null values by using a WHERE IS NOT NULL statement.7. Fix 1: Filtering nulls
This example filters out the null row.8. Fix 2: COALESCE()
A second approach is to use the COALESCE function. This function allows you to change any null values to the output of your choice. The second argument in this function states what to replace null values with. For example, if we coalesce the region field with the string independent athletes, it relabels the null region. This approach makes it more intuitive for others to understand the report.9. Fix 2: COALESCE()
You can also use COALESCE on numerical values, which is useful when nulls should represent a value of zero. To go back to our first issue, we can add COALESCE to both the home and away fields, replacing null values with zeros.10. Fix 2: COALESCE()
Now, since SQL reads all nulls as 0, we can add the two fields without any issues.11. Nulls as a result of a query
Note that nulls may appear as a result of your query. This can happen for a variety of reasons, such as when a LEFT or RIGHT join does not map to all values or when a CASE statement conditional is not satisfied. There are other causes as well.12. Measuring the impact of nulls
If you notice nulls, it's worth understanding how widespread the nulls are. To identify how much of your report consists of nulls, you can create a ratio that compares null values to total values. The top approach shown here outputs percent of rows that are null, while the bottom approach outputs percent of revenue that is null. You can use this approach directly on a table, or on your final report by layering this line outside of your query.13. Practice time!
So that's a summary of why nulls can be an issue as well as how to deal with them. Let's get some practice fixing these null issues.Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.