
Performing data validation

Now that you've defined the schema, it's time to perform data validation. In this exercise, you'll create validation rules to ensure data quality and check for common issues like duplicates and null values.

The table_schema from the previous exercise is preloaded for you, along with the ts DataFrame and the pointblank library (available as pb).
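To make the two checks named above concrete, here is a minimal, library-free sketch of what "checking for duplicates" and "checking for null values" amounts to. It is only an illustration and not how pointblank implements its rules: the sample rows and the helper names count_duplicate_rows and count_nulls are hypothetical, and only the column names are taken from the exercise.

```python
from collections import Counter

# Hypothetical sample rows; only the column names mirror the exercise.
rows = [
    {"period": "2024-01-01 00:00", "respondent": "US48", "type": "D", "value": 410000},
    {"period": "2024-01-01 01:00", "respondent": "US48", "type": "D", "value": 405000},
    # Exact repeat of the previous row (a duplicate).
    {"period": "2024-01-01 01:00", "respondent": "US48", "type": "D", "value": 405000},
    # Missing observation (a null value).
    {"period": "2024-01-01 02:00", "respondent": "US48", "type": "D", "value": None},
]


def count_duplicate_rows(rows):
    """Count rows that exactly repeat an earlier row."""
    # Turn each row into a hashable, order-independent key.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in counts.values())


def count_nulls(rows, columns):
    """Count None entries in each of the given columns."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


duplicates = count_duplicate_rows(rows)
nulls = count_nulls(rows, ["period", "value"])
print(duplicates, nulls)
```

A validation library bundles checks like these into declarative rules and reports how many rows fail each one, so the pipeline can compare failure rates against thresholds instead of relying on ad hoc counts.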

This exercise is part of the course

Designing Forecasting Pipelines for Production


Exercise instructions

  • Define the validation using the appropriate method, passing the ts DataFrame.
  • Set up validation rules with the table_schema and check for duplicates.
  • Print the validation report.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Define the validation
validation = (
    pb.____(
        data=____,
        tbl_name="US48 Data Validation",
        label="Data Refresh",
        thresholds=pb.Thresholds(warning=0.2, error=0, critical=0.1),
    )
    # Set up the validation rules
    .col_schema_match(schema=____)
    .col_vals_gt(columns="value", value=0)
    .col_vals_in_set(columns="respondent", set=["US48"])
    .col_vals_in_set(columns="type", set=["D"])
    .col_vals_not_null(columns=["period", "value"])
    .____()
    .interrogate()
)

# Print the validation report
print(validation.____())