Data quality checks
As you learned in the previous video, missing values can cause a loss of valuable information and lead to incorrect interpretations. Similarly, unseen values, meaning categories that never appeared in the reference data, can undermine the reliability of your model's predictions, because the model never learned how to handle them.
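A missing-value rate is simply the share of rows that have no value in a given column. The following is a minimal plain-Python sketch of that idea. The booking rows are hypothetical, not the course dataset, and `None` stands in for a missing entry:

```python
# Hypothetical booking rows; None marks a missing entry.
rows = [
    {"country": "PRT", "lead_time": 34, "hotel": "City"},
    {"country": None, "lead_time": 120, "hotel": "Resort"},
    {"country": None, "lead_time": None, "hotel": "City"},
    {"country": "ESP", "lead_time": 7, "hotel": "City"},
]

def missing_rate(records, column):
    """Fraction of records with no value (None) for `column`."""
    missing = sum(1 for r in records if r.get(column) is None)
    return missing / len(records)

rates = {col: missing_rate(rows, col) for col in ["country", "lead_time", "hotel"]}
print(rates)  # {'country': 0.5, 'lead_time': 0.25, 'hotel': 0.0}
```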
In this exercise, your goal is to check whether the hotel booking dataset contains missing values and to identify any unseen values. The reference and analysis datasets are already loaded, along with the nannyml library.
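An unseen value is a category that occurs in the analysis data but never occurred in the reference data, for example a booking from a country that was absent during training. Here is a minimal plain-Python sketch using hypothetical country codes. It illustrates the concept only and is not how nannyml computes it internally:

```python
# Hypothetical category values standing in for the 'country' column.
reference_countries = ["PRT", "GBR", "ESP", "PRT", "FRA"]
analysis_countries = ["PRT", "DEU", "ESP", "CHN", "DEU"]

def unseen_values(reference_values, analysis_values):
    """Return the analysis categories never observed in the reference data."""
    seen = set(reference_values)
    return sorted({v for v in analysis_values if v not in seen})

def unseen_rate(reference_values, analysis_values):
    """Share of analysis rows whose category is unseen in the reference data."""
    seen = set(reference_values)
    return sum(v not in seen for v in analysis_values) / len(analysis_values)

print(unseen_values(reference_countries, analysis_countries))  # ['CHN', 'DEU']
print(unseen_rate(reference_countries, analysis_countries))    # 0.6
```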
A quick reminder: if you can't recall the column types, you can inspect the data with the .head() method.
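Besides .head(), pandas' `dtypes` attribute and `select_dtypes()` method give a quick first split into categorical and continuous columns. The DataFrame below is a hypothetical stand-in for the preloaded `reference` data. Treat the split as a hint only: a numeric column such as `parking_spaces` may still be better handled as categorical.

```python
import pandas as pd

# Hypothetical stand-in for the preloaded `reference` DataFrame.
reference = pd.DataFrame({
    "country": ["PRT", "GBR", None],
    "lead_time": [34, 120, 7],
    "parking_spaces": [0, 1, 0],
    "hotel": ["City", "Resort", "City"],
})

print(reference.head())   # first rows (up to 5 by default)
print(reference.dtypes)   # column types

# Non-numeric columns are likely categorical; numeric ones likely continuous.
categorical = reference.select_dtypes(exclude="number").columns.tolist()
continuous = reference.select_dtypes(include="number").columns.tolist()
print(categorical, continuous)
```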
This exercise is part of the course Monitoring Machine Learning in Python.
Try this exercise by completing the sample code below.
# Define analyzed columns
selected_columns = ['country', 'lead_time', 'parking_spaces', 'hotel']
# Initialize the missing values calculator
ms_calc = ____.____(
____=____,
____=____,
timestamp_column_name='timestamp'
)
# Fit, calculate and plot the results
ms_calc.fit(reference)
ms_results = ms_calc.calculate(analysis)
ms_results.plot().show()
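The fit/calculate split above follows a common monitoring pattern: statistics are learned from the reference period, and the analysis period is then measured chunk by chunk and compared against them. The following plain-Python sketch is a simplified stand-in for that idea, not nannyml's implementation. It assumes monthly chunks, ISO date strings, and a hypothetical alert bound of mean + k·std over the reference chunk rates (k=1 here only so the toy data triggers an alert). nannyml's own chunking and threshold options may differ.

```python
from collections import defaultdict
from statistics import mean, pstdev

def monthly_missing_rate(records, column, timestamp_key="timestamp"):
    """Missing-value rate of `column` per calendar month ('YYYY-MM')."""
    totals, missing = defaultdict(int), defaultdict(int)
    for r in records:
        month = r[timestamp_key][:7]  # assumes ISO 'YYYY-MM-DD' strings
        totals[month] += 1
        missing[month] += r.get(column) is None  # True counts as 1
    return {m: missing[m] / totals[m] for m in sorted(totals)}

def fit_upper_bound(reference_rates, k=3.0):
    """Hypothetical alert bound: mean + k * std of the reference chunk rates."""
    values = list(reference_rates.values())
    return mean(values) + k * pstdev(values)

def make_rows(month, countries):
    """Build hypothetical rows for one month; None marks a missing country."""
    return [{"timestamp": f"{month}-01", "country": c} for c in countries]

reference = make_rows("2023-01", ["PRT", None]) + make_rows("2023-02", ["GBR", "ESP"])
analysis = (make_rows("2023-03", [None, None, None, "PRT"])
            + make_rows("2023-04", ["PRT", "ESP", None, "GBR"]))

# "Fit" on reference chunks, then "calculate" on analysis chunks.
upper = fit_upper_bound(monthly_missing_rate(reference, "country"), k=1.0)
results = monthly_missing_rate(analysis, "country")
alerts = {m: rate > upper for m, rate in results.items()}
print(results)  # {'2023-03': 0.75, '2023-04': 0.25}
print(alerts)   # {'2023-03': True, '2023-04': False}
```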