Drift in hotel booking dataset
In the previous chapter, you calculated the business value and ROC AUC performance for a model that predicts booking cancellations. You noticed a few alerts in the resulting plots, which is why you need to investigate the presence of drift in the analysis data.
In this exercise, you will initialize the multivariate drift detection method and compare its results with the performance results calculated in the previous chapter.
StandardDeviationThreshold is already imported. The business value and ROC AUC results are stored in the perf_results variable, and feature_column_names is already defined.
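To make the idea behind multivariate drift detection by data reconstruction more concrete, here is a minimal NumPy sketch. It assumes the approach is PCA-based: learn a compressed representation on reference data, reconstruct new data from it, and treat a rising reconstruction error as a sign of drift. The helper names fit_pca and reconstruction_error and the synthetic data are hypothetical and only for illustration; this is not NannyML's actual implementation.

```python
import numpy as np

def fit_pca(reference, n_components):
    """Learn scaling statistics and principal components from reference data."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0)
    scaled = (reference - mean) / std
    # Rows of vt are the principal directions of the scaled reference data
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    return mean, std, vt[:n_components]

def reconstruction_error(data, mean, std, components):
    """Mean Euclidean distance between scaled data and its PCA reconstruction."""
    scaled = (data - mean) / std
    compressed = scaled @ components.T   # project onto the kept components
    restored = compressed @ components   # map back to the original space
    return float(np.mean(np.linalg.norm(scaled - restored, axis=1)))

rng = np.random.default_rng(0)

# Synthetic reference data: two strongly correlated features
x = rng.normal(size=500)
reference = np.column_stack([x, x + 0.1 * rng.normal(size=500)])

# Synthetic drifted data: similar ranges, but the correlation is gone
drifted = np.column_stack([rng.normal(size=500), rng.normal(size=500)])

mean, std, components = fit_pca(reference, n_components=1)
ref_error = reconstruction_error(reference, mean, std, components)
drift_error = reconstruction_error(drifted, mean, std, components)

print(f"reference error: {ref_error:.3f}, drifted error: {drift_error:.3f}")
```

In this toy setup the drifted data reconstructs noticeably worse than the reference data, even though each feature on its own still looks similar. That is the kind of change in the relationship between features that a multivariate method is meant to catch.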
This exercise is part of the course
Monitoring Machine Learning in Python
Exercise instructions
- Initialize the StandardDeviationThreshold method and set the std_lower_multiplier parameter to 2 and the std_upper_multiplier parameter to 1.
- Add the following feature names: country, lead_time, parking_spaces, and hotel. Retain their order.
- Pass the previously defined thresholds and feature names to the DataReconstructionDriftCalculator.
- Show the comparison plot featuring both the multivariate drift detection results (mv_results) and the performance results (perf_results).
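As a rough illustration of what the two multipliers in the first step control, the sketch below computes alert bounds as the mean of a set of reference values minus or plus a multiple of their standard deviation. It assumes the threshold works this way and uses the population standard deviation. The function std_threshold_bounds and the sample values are hypothetical, not part of NannyML.

```python
import numpy as np

def std_threshold_bounds(values, std_lower_multiplier, std_upper_multiplier):
    """Return (lower, upper) alert bounds around the mean of `values`."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std()  # population standard deviation (ddof=0), an assumption
    lower = mean - std_lower_multiplier * std
    upper = mean + std_upper_multiplier * std
    return lower, upper

# Hypothetical per-chunk metric values from a reference period
reference_values = [1.0, 2.0, 3.0, 4.0, 5.0]

lower, upper = std_threshold_bounds(
    reference_values, std_lower_multiplier=2, std_upper_multiplier=1
)
print(f"lower bound: {lower:.3f}, upper bound: {upper:.3f}")
```

With a lower multiplier of 2 and an upper multiplier of 1, the band is asymmetric: values have to fall further below the mean than above it before they would count as an alert.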
Interactive exercise
Try this exercise by completing the sample code.
# Create standard deviation thresholds
stdt = StandardDeviationThreshold(____=____, ____=____)
# Define feature columns
feature_column_names = [____, ____, ____, ____]
# Initialize, fit, and show results of multivariate drift calculator
mv_calc = nannyml.DataReconstructionDriftCalculator(
    column_names=____,
    threshold=____,
    timestamp_column_name='timestamp',
    chunk_period='m')
mv_calc.fit(reference)
mv_results = mv_calc.calculate(analysis)
mv_results.filter(period='analysis').____(____).plot().show()