Creating categories with conditional logic

1. Creating categories with conditional logic

Sometimes, the most interesting insights come from categorizing data in new ways.

2. The CASE for conditional logic

Data rarely comes in neat categories or buckets. We have exact durations like 123 minutes and 151 minutes. Analysis is often clearer with labels such as "Short," "Standard," or "Epic", which make patterns easier to see and share. Filtering by duration, such as with conditional logic like "duration greater than 90 minutes," only gives us a yes-or-no answer. Either a film qualified or it didn't. We can take conditional logic further, transforming these numbers into meaningful groups that are easier to read and communicate.

3. Poor prompting for CASE statements

For example, when we prompt "Categorize films as Short if the duration is under 90 minutes, Standard if under 150, and Epic otherwise," the AI builds logic that evaluates each film using CASE statements. It checks conditions in sequence, assigning the first matching label. While this appears to work, this prompt puts us at risk for errors.

4. Poor prompting for CASE statements

This query uses the same prompt, but the generated conditions have been flipped. It might still appear correct, but CASE stops at the first condition that matches, meaning a 75-minute film might be incorrectly categorized as "Standard" because it's also under 150 minutes.

5. Better prompting for CASE statements

A clearer prompt, such as "Categorize films as Short if under 90 minutes, Standard if between 90 and 149 minutes, Epic if 150 minutes or more, and unknown otherwise," helps to avoid these mistakes by being specific about ranges.

6. Verify the logic

We can verify CASE logic by including the fields we categorize, as we have done with duration. This lets us check films near the boundaries, such as 151 minutes, are labeled as expected. Including the original field also makes it easy to see if missing values or unusual entries have been placed into the right group.

7. Filtering NULLs

Missing values or unexpected entries need special attention. In previous our example, these entries were put into the Unknown category. If we don't want these in an Unknown category, we can improve our prompt further or iterate with a second prompt to include a filter for these values. Since we already generated the previous query, we can prompt again to "Remove any NULL durations." Always remember to check the results and adjust if needed.

8. The benefits of categories

Categories help clarify our results. Instead of wrestling with 147 different duration values, we can work with three clear groups. These meaningful labels make it easier to communicate our findings. Saying "epic films generate higher revenue" is more impactful than referencing specific minute ranges. As you progress your skills, you'll find that some visualizations become much cleaner and more compelling when built around these logical categories.

9. Let's practice!

It's time to head to the exercises. After that, we'll look at combining all of the techniques we've learned into one query and prompt.

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.