Performing data validation
Now that you've defined the schema, it's time to perform data validation. In this exercise, you'll create validation rules to ensure data quality and check for common issues like duplicates and null values.
The table_schema from the previous exercise is preloaded for you, along with the ts DataFrame and the pointblank library (available as pb).
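To make the two issues named above concrete, here is a minimal plain-Python sketch of what a duplicate check and a null check look for. It does not use pointblank; the rows, the helper names find_duplicates and find_nulls, and all values are hypothetical stand-ins for the ts DataFrame.

```python
from collections import Counter

# Hypothetical rows standing in for the ts DataFrame (values are made up).
rows = [
    {"period": "2024-01-01 00:00", "respondent": "US48", "type": "D", "value": 410000},
    {"period": "2024-01-01 01:00", "respondent": "US48", "type": "D", "value": None},
    {"period": "2024-01-01 00:00", "respondent": "US48", "type": "D", "value": 410000},
]


def find_duplicates(records):
    """Return each distinct row that occurs more than once."""
    counts = Counter(tuple(sorted(r.items())) for r in records)
    return [dict(key) for key, n in counts.items() if n > 1]


def find_nulls(records, columns):
    """Return indices of rows with a missing value in any of the given columns."""
    return [i for i, r in enumerate(records) if any(r.get(c) is None for c in columns)]


duplicates = find_duplicates(rows)
null_rows = find_nulls(rows, ["period", "value"])
print(len(duplicates), null_rows)
```

In the exercise itself, the validation library runs the equivalent checks against the whole table and reports how many rows pass or fail each one.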
This exercise is part of the course
Designing Forecasting Pipelines for Production
Exercise instructions
- Define the validation using the right method, passing the ts DataFrame.
- Set up the validation rules with the table_schema and check for duplicates.
- Print the validation report.
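The idea behind matching a table against a schema can be sketched in plain Python as follows. This is only an illustration: expected_schema, its column types, and the helper schema_mismatches are assumptions made for the example, and the real table_schema from the previous exercise may look different.

```python
# Hypothetical expected schema: column name -> Python type.
# The real table_schema comes from the previous exercise and may differ.
expected_schema = {"period": str, "respondent": str, "type": str, "value": int}


def schema_mismatches(record, schema):
    """List the ways a single record deviates from the expected schema."""
    problems = []
    for column, expected_type in schema.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            actual = type(record[column]).__name__
            problems.append(f"{column}: expected {expected_type.__name__}, got {actual}")
    for column in record:
        if column not in schema:
            problems.append(f"unexpected column: {column}")
    return problems


good = {"period": "2024-01-01 00:00", "respondent": "US48", "type": "D", "value": 410000}
bad = {"period": "2024-01-01 00:00", "respondent": "US48", "value": "410000", "extra": 1}

print(schema_mismatches(good, expected_schema))
print(schema_mismatches(bad, expected_schema))
```

A schema check of this kind catches structural drift (renamed, missing, or retyped columns) before any value-level rule runs.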
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Define the validation
validation = (
    pb.____(
        data=____,
        tbl_name="US48 Data Validation",
        label="Data Refresh",
        thresholds=pb.Thresholds(warning=0.2, error=0, critical=0.1),
    )
    # Set up the validation rules
    .col_schema_match(schema=____)
    .col_vals_gt(columns="value", value=0)
    .col_vals_in_set(columns="respondent", set=["US48"])
    .col_vals_in_set(columns="type", set=["D"])
    .col_vals_not_null(columns=["period", "value"])
    .____()
    .interrogate()
)

# Print the validation report
print(validation.____())
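As a rough illustration of how row-level rules relate to a threshold, the sketch below computes the share of failing rows per rule in plain Python and compares it with a warning level of 0.2. The rows, the helper failing_fraction, and the "greater than or equal" comparison are assumptions made for this example; pointblank's exact threshold semantics should be checked in its documentation.

```python
# Hypothetical rows; the second one violates the "value > 0" rule.
rows = [
    {"respondent": "US48", "type": "D", "value": 410000},
    {"respondent": "US48", "type": "D", "value": -5},
    {"respondent": "US48", "type": "D", "value": 398000},
    {"respondent": "US48", "type": "D", "value": 402000},
]

# Plain-Python counterparts of the value and set-membership rules in the exercise.
rules = {
    "value > 0": lambda r: r["value"] > 0,
    "respondent in {US48}": lambda r: r["respondent"] in {"US48"},
    "type in {D}": lambda r: r["type"] in {"D"},
}

WARNING_LEVEL = 0.2  # same number as the warning threshold in the exercise


def failing_fraction(records, rule):
    """Share of rows for which the rule returns False."""
    failures = sum(1 for r in records if not rule(r))
    return failures / len(records)


# Assumption: a level counts as reached when the failing share is >= the level.
report = {
    name: (failing_fraction(rows, rule), failing_fraction(rows, rule) >= WARNING_LEVEL)
    for name, rule in rules.items()
}

for name, (fraction, flagged) in report.items():
    print(f"{name}: failing share {fraction:.2f}, warning reached: {flagged}")
```

Here only the value rule reaches the warning level, because one of four rows fails it; the two set-membership rules pass for every row.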