Drift in hotel booking dataset

In the previous chapter, you calculated the business value and ROC AUC performance for a model that predicts booking cancellations. You noticed a few alerts in the resulting plots, which is why you need to investigate the presence of drift in the analysis data.

In this exercise, you will initialize the multivariate drift detection method and compare its results with the performance results calculated in the previous chapter.
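To build intuition for what the multivariate method measures, here is a minimal sketch of reconstruction error on two made-up numeric features. It is a toy under stated assumptions, not NannyML's implementation: the helpers fit_first_component and mean_reconstruction_error and the synthetic data are hypothetical, and the real calculator works on all selected features with additional preprocessing. The idea it illustrates is that a compression learned on reference data reconstructs similar data well, and reconstructs data whose feature relationship has changed poorly.

```python
import math
import random


def fit_first_component(points):
    """Return the mean and the unit direction of largest variance for 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form angle of the principal axis of a 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return mx, my, math.cos(theta), math.sin(theta)


def mean_reconstruction_error(points, model):
    """Compress each point to one dimension, reconstruct it, and average the distance."""
    mx, my, ux, uy = model
    total = 0.0
    for x, y in points:
        dx, dy = x - mx, y - my
        t = dx * ux + dy * uy        # compress: projection onto the component
        rx, ry = t * ux, t * uy      # reconstruct from the single coordinate
        total += math.hypot(dx - rx, dy - ry)
    return total / len(points)


rng = random.Random(0)  # fixed seed so the toy data is reproducible

# Synthetic reference data: the second feature rises with the first
reference_points = [(x, x + rng.gauss(0, 0.1))
                    for x in (rng.gauss(0, 1) for _ in range(500))]
# Synthetic drifted data: similar spread per feature, but the relationship is flipped
drifted_points = [(x, -x + rng.gauss(0, 0.1))
                  for x in (rng.gauss(0, 1) for _ in range(500))]

model = fit_first_component(reference_points)  # fitted on reference only
ref_error = mean_reconstruction_error(reference_points, model)
drift_error = mean_reconstruction_error(drifted_points, model)
print(f"reference error: {ref_error:.3f}, drifted error: {drift_error:.3f}")
```

In this toy data, each feature on its own has a similar spread in both sets, yet the reconstruction error rises sharply because the relationship between the features changed. That is the kind of change a multivariate method is meant to surface.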

StandardDeviationThreshold is already imported, the business value and ROC AUC results are stored in the perf_results variable, and feature_column_names is already defined.

This exercise is part of the course

Monitoring Machine Learning in Python

Exercise instructions

Initialize StandardDeviationThreshold, setting the std_lower_multiplier parameter to 2 and the std_upper_multiplier parameter to 1.
Add the following feature names, in this order: country, lead_time, parking_spaces, and hotel.
Pass the previously defined threshold and feature names to the DataReconstructionDriftCalculator.
Show the comparison plot featuring both the multivariate drift detection results (mv_results) and the performance results (perf_results).
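The threshold step can be pictured with a small sketch of how a standard-deviation band is built. It assumes the band runs from the reference mean minus std_lower_multiplier standard deviations to the reference mean plus std_upper_multiplier standard deviations, using the population standard deviation. The function std_threshold_bounds and all numbers are made up for illustration and are not the library's code.

```python
from statistics import mean, pstdev


def std_threshold_bounds(reference_values, std_lower_multiplier, std_upper_multiplier):
    """Return (lower, upper) bounds built around the mean of the reference values."""
    center = mean(reference_values)
    spread = pstdev(reference_values)  # assumption: population standard deviation
    lower = center - std_lower_multiplier * spread
    upper = center + std_upper_multiplier * spread
    return lower, upper


# Invented metric values per reference chunk (not results from the course data)
reference_values = [1.0, 1.2, 0.8, 1.1, 0.9]
lower, upper = std_threshold_bounds(reference_values,
                                    std_lower_multiplier=2,
                                    std_upper_multiplier=1)

# Invented metric values per analysis chunk; flag anything outside the band
analysis_values = [1.0, 1.15, 1.3]
alerts = [value < lower or value > upper for value in analysis_values]
print(f"lower={lower:.3f}, upper={upper:.3f}, alerts={alerts}")
```

With a lower multiplier of 2 and an upper multiplier of 1, the band is asymmetric: increases in the metric are flagged sooner than decreases of the same size.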

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create standard deviation thresholds
stdt = StandardDeviationThreshold(____=____, ____=____)

# Define feature columns
feature_column_names = [____, ____, ____, ____]

# Initialize, fit, and show results of multivariate drift calculator
mv_calc = nannyml.DataReconstructionDriftCalculator(
    column_names=____,
    threshold=____,
    timestamp_column_name='timestamp',
    chunk_period='m')
mv_calc.fit(reference)
mv_results = mv_calc.calculate(analysis)
mv_results.filter(period='analysis').____(____).plot().show()
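
When reading a comparison plot, a useful habit is to check, chunk by chunk, whether drift alerts and performance alerts coincide. The sketch below shows that reasoning on invented alert flags. The chunk labels, the flags, and the classify_chunks helper are hypothetical and do not come from the hotel booking results.

```python
# Hypothetical per-chunk alert flags; the values are invented for illustration
chunks = ["2017-01", "2017-02", "2017-03", "2017-04"]
drift_alerts = [False, True, True, False]
performance_alerts = [False, False, True, True]


def classify_chunks(chunks, drift_alerts, performance_alerts):
    """Label each chunk by which of the two alert types fired."""
    labels = {}
    for chunk, drift, perf in zip(chunks, drift_alerts, performance_alerts):
        if drift and perf:
            labels[chunk] = "drift and performance alert"
        elif drift:
            labels[chunk] = "drift only"
        elif perf:
            labels[chunk] = "performance only"
        else:
            labels[chunk] = "no alert"
    return labels


labels = classify_chunks(chunks, drift_alerts, performance_alerts)
for chunk, label in labels.items():
    print(chunk, label)
```

Chunks where both alerts fire are the natural starting point for investigating whether drift explains the performance drop. Chunks with only one kind of alert suggest either drift that did not hurt the model or a performance issue with another cause.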
