Choosing the right quality metric
A junior engineer at Global Retail Analytics runs the cleaning pipeline on a fresh online_retail upload but accidentally comments out the na.drop(subset=["CustomerID"]) line. The remaining steps - na.fill(), dropDuplicates(), and the invalid-record filter - all run without errors.
Which quality check metric would immediately reveal the mistake?
Este exercício faz parte do curso
Data Transformation with Spark SQL in Databricks
Exercício interativo prático
Transforme a teoria em ação com um de nossos exercícios interativos
Começar o exercício