Choosing the right quality metric
A junior engineer at Global Retail Analytics runs the cleaning pipeline on a fresh online_retail upload but accidentally comments out the na.drop(subset=["CustomerID"]) line. The remaining steps - na.fill(), dropDuplicates(), and the invalid-record filter - all run without errors.
Which quality check metric would immediately reveal the mistake?
This exercise is part of the course
Data Transformation with Spark SQL in Databricks
Hands-on interactive exercise
Turn theory into action with one of our interactive exercises
Start Exercise