Data quality checks
As you learned in the previous video, missing values can cause a loss of valuable information and lead to incorrect interpretations. Similarly, unseen values, meaning categorical values that appear in the analysis data but never occurred in the reference data, can affect your model's confidence.
In this exercise, your goal is to explore whether the hotel booking dataset contains missing values and identify any unseen values. The reference and analysis datasets are already loaded, along with the nannyml library.
A quick reminder: if you can't recall the column types, you can explore the data using the .head() method.
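To make the two checks concrete, here is a minimal plain-Python sketch that does not use nannyml. The rows and the helper names `missing_rate` and `unseen_values` are hypothetical, made up for illustration; only the column names echo ones used in this exercise.

```python
# Hypothetical toy rows, not taken from the hotel booking dataset.
reference_rows = [
    {"country": "PRT", "lead_time": 12},
    {"country": "GBR", "lead_time": 40},
    {"country": "PRT", "lead_time": 7},
]
analysis_rows = [
    {"country": "PRT", "lead_time": None},
    {"country": "FRA", "lead_time": 3},
    {"country": None, "lead_time": 25},
    {"country": "GBR", "lead_time": None},
]


def missing_rate(rows, column):
    """Fraction of rows in which `column` is None."""
    missing = sum(1 for row in rows if row[column] is None)
    return missing / len(rows)


def unseen_values(reference, analysis, column):
    """Non-missing values in `analysis` that never occur in `reference`."""
    seen = {row[column] for row in reference if row[column] is not None}
    observed = {row[column] for row in analysis if row[column] is not None}
    return observed - seen


print(missing_rate(analysis_rows, "lead_time"))
print(unseen_values(reference_rows, analysis_rows, "country"))
```

In this toy data, half of the analysis rows lack a lead_time, and "FRA" is a country the reference rows never contained.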
This exercise is part of the course
Monitoring Machine Learning in Python
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Define analyzed columns
selected_columns = ['country', 'lead_time', 'parking_spaces', 'hotel']
# Initialize missing values calculator
ms_calc = ____.____(
____=____,
____=____,
timestamp_column_name='timestamp'
)
# Fit, calculate and plot the results
ms_calc.fit(reference)
ms_results = ms_calc.calculate(analysis)
ms_results.plot().show()
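The fit-then-calculate pattern in the template can be pictured with a toy stand-in. This is not nannyml's implementation and does not fill in the blanks of the exercise: the class name `ToyMissingValuesCheck`, its `tolerance` parameter, and the alert rule are all assumptions made for illustration.

```python
class ToyMissingValuesCheck:
    """Hypothetical stand-in: learn a baseline on reference, compare analysis to it."""

    def __init__(self, column_names, tolerance=0.05):
        self.column_names = column_names
        self.tolerance = tolerance  # assumed alert margin, not a nannyml setting
        self.baseline = {}

    @staticmethod
    def _rate(rows, column):
        # Share of rows where the column is None.
        return sum(row[column] is None for row in rows) / len(rows)

    def fit(self, reference):
        # Store the missing-value rate observed in the reference data.
        self.baseline = {c: self._rate(reference, c) for c in self.column_names}
        return self

    def calculate(self, analysis):
        # Compare the analysis rate with the stored baseline plus the margin.
        results = {}
        for c in self.column_names:
            rate = self._rate(analysis, c)
            results[c] = {
                "rate": rate,
                "alert": rate > self.baseline[c] + self.tolerance,
            }
        return results


# Hypothetical toy data.
toy_reference = [{"lead_time": 10}, {"lead_time": 3}, {"lead_time": None}, {"lead_time": 8}]
toy_analysis = [{"lead_time": None}, {"lead_time": None}, {"lead_time": 5}, {"lead_time": 1}]

check = ToyMissingValuesCheck(["lead_time"]).fit(toy_reference)
toy_results = check.calculate(toy_analysis)
print(toy_results)
```

The point of the split is that the reference data defines what "normal" looks like, so the analysis data is judged against that baseline rather than against a fixed number.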