
Choosing the right quality metric

A junior engineer at Global Retail Analytics runs the cleaning pipeline on a fresh online_retail upload but accidentally comments out the na.drop(subset=["CustomerID"]) line. The remaining steps (na.fill(), dropDuplicates(), and the invalid-record filter) all run without errors.

Which quality check metric would immediately reveal the mistake?
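
One way to reason about the question is to run both versions of the pipeline on a tiny dataset and compare the candidate metrics side by side. The sketch below is a plain-Python stand-in for the Spark DataFrame steps, not the course's actual code. The sample rows, the helper names clean and metrics, the choice to fill only Description, and the Quantity > 0 validity rule are all assumptions made for illustration.

```python
# Toy stand-in for the online_retail data; every row here is invented.
rows = [
    {"InvoiceNo": "536365", "CustomerID": 17850, "Description": "LANTERN", "Quantity": 6},
    {"InvoiceNo": "536365", "CustomerID": 17850, "Description": "LANTERN", "Quantity": 6},  # exact duplicate
    {"InvoiceNo": "536366", "CustomerID": None, "Description": None, "Quantity": 2},  # missing customer
    {"InvoiceNo": "536367", "CustomerID": 13047, "Description": "MUG", "Quantity": -1},  # invalid quantity
]


def clean(rows, drop_missing_customer=True):
    """Mimic the four cleaning steps; the flag models the commented-out line."""
    if drop_missing_customer:
        # Stand-in for na.drop(subset=["CustomerID"])
        rows = [r for r in rows if r["CustomerID"] is not None]
    # Stand-in for na.fill(); assumed to fill Description only
    rows = [{**r, "Description": r["Description"] or "Unknown"} for r in rows]
    # Stand-in for dropDuplicates(): keep the first copy of each identical row
    seen, unique = set(), []
    for r in rows:
        key = tuple(r.items())
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Stand-in for the invalid-record filter; assumed rule is Quantity > 0
    return [r for r in unique if r["Quantity"] > 0]


def metrics(rows):
    """Compute the candidate quality checks on a cleaned dataset."""
    return {
        "row_count": len(rows),
        "null_customer_ids": sum(r["CustomerID"] is None for r in rows),
        "duplicate_rows": len(rows) - len({tuple(r.items()) for r in rows}),
        "invalid_quantities": sum(r["Quantity"] <= 0 for r in rows),
    }


correct = metrics(clean(rows, drop_missing_customer=True))
buggy = metrics(clean(rows, drop_missing_customer=False))

for name in correct:
    print(f"{name}: correct={correct[name]} buggy={buggy[name]}")
```

On this toy data the duplicate and invalid-record checks report the same value for both runs, because those steps still executed. The row count does change, but it only signals a problem if you already know the expected count. The null count on CustomerID is the check that flags the skipped step without needing a baseline.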

This exercise is part of the course

Data Transformation with Spark SQL in Databricks
