Data quality checks
As you learned in the previous video, missing values can result in a loss of valuable information and potentially lead to incorrect interpretations. Similarly, the presence of unseen values can also affect your model's confidence.
In this exercise, your goal is to explore whether the hotel booking dataset contains missing values and identify any unseen values. The reference and analysis datasets are already loaded, along with the nannyml
library.
A quick reminder, if you can't recall the column types, you can easily explore the data using the .head()
method.
This exercise is part of the course
Monitoring Machine Learning in Python
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Define analyzed columns
selected_columns = ['country', 'lead_time', 'parking_spaces', 'hotel']
# Intialize missing values calculator
ms_calc = ____.____(
____=____,
____=____,
timestamp_column_name='timestamp'
)
# Fit, calculate and plot the results
ms_calc.fit(reference)
ms_results = ms_calc.calculate(analysis)
ms_results.plot().show()