
Drift in hotel booking dataset

In the previous chapter, you calculated the business value and ROC AUC performance for a model that predicts booking cancellations. You noticed a few alerts in the resulting plots, which is why you need to investigate the presence of drift in the analysis data.

In this exercise, you will initialize a multivariate drift detection method, the DataReconstructionDriftCalculator, and compare its results with the performance results you calculated in the previous chapter.
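DataReconstructionDriftCalculator detects multivariate drift by, roughly, compressing the data with PCA fitted on the reference period, decompressing it again, and measuring how far each reconstructed row lands from the original. The sketch below is a simplified, NumPy-only illustration of that idea, not NannyML's implementation. It assumes purely numeric, complete columns (NannyML also handles categorical features such as country and hotel, plus missing values), keeps enough components to explain 65% of the variance as an assumed setting, and uses hypothetical helper names fit_reconstructor and reconstruction_error.

```python
import numpy as np


def fit_reconstructor(reference: np.ndarray, variance_to_keep: float = 0.65):
    """Fit standardization plus PCA (via SVD) on reference data (simplified sketch)."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    z = (reference - mean) / std
    # The principal axes are the right singular vectors of the centred data
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Smallest number of components whose cumulative explained variance reaches the target
    n_components = int(np.searchsorted(np.cumsum(explained), variance_to_keep) + 1)
    return mean, std, vt[:n_components]


def reconstruction_error(chunk: np.ndarray, mean, std, components) -> float:
    """Mean Euclidean distance between standardized rows and their PCA reconstruction."""
    z = (chunk - mean) / std
    latent = z @ components.T  # compress onto the principal axes
    z_hat = latent @ components  # decompress back to the original space
    return float(np.mean(np.linalg.norm(z - z_hat, axis=1)))


# Toy data: two strongly correlated numeric features in the reference period
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
reference = np.column_stack([x, x + 0.1 * rng.normal(size=2000)])
mean, std, components = fit_reconstructor(reference)

x_new = rng.normal(size=500)
# Same relationship between the columns as in the reference data
stable_chunk = np.column_stack([x_new, x_new + 0.1 * rng.normal(size=500)])
# Similar spread per column, but the correlation between the columns is gone
drifted_chunk = np.column_stack([x_new, rng.normal(size=500) * np.sqrt(1.01)])

print(reconstruction_error(stable_chunk, mean, std, components))
print(reconstruction_error(drifted_chunk, mean, std, components))
```

In the drifted chunk each column on its own looks much like the reference data, so univariate checks could miss the change, but the rows no longer lie near the principal axis learned on the reference period and the reconstruction error rises.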

StandardDeviationThreshold is already imported, the business value and ROC AUC results are stored in the perf_results variable, and feature_column_names is already defined.

This exercise is part of the course Monitoring Machine Learning in Python.

Instructions

  • Initialize StandardDeviationThreshold with the std_lower_multiplier parameter set to 2 and the std_upper_multiplier parameter set to 1.
  • Add the following feature names, keeping this order: country, lead_time, parking_spaces, and hotel.
  • Pass the previously defined threshold and feature names to the DataReconstructionDriftCalculator.
  • Show the comparison plot featuring both the multivariate drift detection results (mv_results) and the performance results (perf_results).
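StandardDeviationThreshold derives alert bounds from the metric values observed in the reference period: a lower bound std_lower_multiplier standard deviations below an offset (the mean, in this sketch) and an upper bound std_upper_multiplier standard deviations above it. The plain NumPy sketch below mirrors that arithmetic for the multipliers used in this exercise. The function names std_threshold and flag_alerts are hypothetical, the error values are made up for illustration, and using the population standard deviation is an assumption of this sketch.

```python
import numpy as np


def std_threshold(reference_values, std_lower_multiplier=2.0, std_upper_multiplier=1.0):
    """Return (lower, upper): mean minus/plus a multiple of the standard deviation."""
    values = np.asarray(reference_values, dtype=float)
    offset = values.mean()
    std = values.std()  # population standard deviation (ddof=0), an assumption here
    return offset - std_lower_multiplier * std, offset + std_upper_multiplier * std


def flag_alerts(values, lower, upper):
    """True for every chunk whose value falls outside [lower, upper]."""
    values = np.asarray(values, dtype=float)
    return (values < lower) | (values > upper)


# Hypothetical per-chunk reconstruction errors
reference_errors = [1.0, 1.2, 0.8, 1.0, 1.0]
analysis_errors = [1.05, 1.3, 0.7]

lower, upper = std_threshold(reference_errors)
print(flag_alerts(analysis_errors, lower, upper).tolist())  # [False, True, True]
```

With a multiplier of 2 below and 1 above, the band is asymmetric: a rise in reconstruction error, the direction that usually signals drift, triggers an alert sooner than an equally large drop.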

Hands-on interactive exercise

Try this exercise by completing this sample code.

import nannyml
from nannyml.thresholds import StandardDeviationThreshold

# Create standard deviation thresholds
stdt = StandardDeviationThreshold(____=____, ____=____)

# Define feature columns
feature_column_names = [____, ____, ____, ____]

# Initialize the multivariate drift calculator with monthly chunks
mv_calc = nannyml.DataReconstructionDriftCalculator(
    column_names=____,
    threshold=____,
    timestamp_column_name='timestamp',
    chunk_period='m')

# Fit on the reference data, then calculate drift on the analysis data
mv_calc.fit(reference)
mv_results = mv_calc.calculate(analysis)

# Keep only the analysis period and plot it alongside the performance results
mv_results.filter(period='analysis').____(____).plot().show()
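Setting chunk_period='m' asks the calculator to split the data into one chunk per calendar month, so each point in the drift plot summarizes one month of bookings. The pandas sketch below shows what monthly chunking means on a toy table. monthly_chunks is a hypothetical helper and the bookings are made up; NannyML's own chunker may differ in details, so treat this as an illustration rather than its implementation.

```python
import pandas as pd


def monthly_chunks(df: pd.DataFrame, timestamp_column: str = "timestamp") -> dict:
    """Split rows into calendar-month chunks keyed by 'YYYY-MM'."""
    months = pd.to_datetime(df[timestamp_column]).dt.to_period("M")
    return {str(month): chunk for month, chunk in df.groupby(months)}


# Hypothetical bookings spread over three months
bookings = pd.DataFrame({
    "timestamp": ["2017-01-03", "2017-01-28", "2017-02-10", "2017-03-01", "2017-03-31"],
    "lead_time": [10, 45, 3, 120, 7],
})

for month, chunk in monthly_chunks(bookings).items():
    print(month, len(chunk))
```

A drift calculator would then compute one metric value, such as the reconstruction error, per chunk and compare it against the threshold bounds.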