Load the dataset

NannyML ships with several built-in datasets that make it easy to demo use cases and test different algorithms. To load the US Census Employment dataset, call the nannyml.load_us_census_ma_employment_data() function.

The function returns three pandas DataFrame objects: the reference set (the model's test set), the analysis set (unseen production data), and the ground truth for the analysis set. By convention, name these data frames reference, analysis, and analysis_gt.
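To make the relationship between the three frames concrete, here is a minimal pandas sketch that uses small hypothetical stand-in frames instead of the real dataset. The column names (id, age, prediction, employed) and the idea of joining on a shared id column are assumptions for illustration, not the dataset's actual schema. The sketch shows that the analysis frame arrives without labels, and that analysis_gt can be merged back onto it once the labels are available.

```python
import pandas as pd

# Hypothetical stand-ins for the three frames. Column names ("id", "age",
# "prediction", "employed") are illustrative assumptions, not the real schema.
reference = pd.DataFrame({
    "id": [0, 1, 2],
    "age": [34, 51, 27],
    "prediction": [1, 0, 1],
    "employed": [1, 0, 1],  # the reference (test) set already has its labels
})
analysis = pd.DataFrame({
    "id": [10, 11, 12],
    "age": [45, 23, 38],
    "prediction": [1, 1, 0],  # production data: no label column yet
})
analysis_gt = pd.DataFrame({
    "id": [12, 10, 11],  # ground truth may arrive in a different row order
    "employed": [0, 1, 0],
})

# Attach the late-arriving labels to the analysis rows via the shared identifier.
# A left merge keeps every analysis row and preserves its original order.
analysis_with_labels = analysis.merge(analysis_gt, on="id", how="left")
print(analysis_with_labels)
```

Keeping the ground truth in a separate frame mirrors production, where predictions are made immediately but the true labels often arrive later.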

In this exercise, you will load the US Census Employment dataset and print the data frames to understand what they look like.

This exercise is part of the course

Monitoring Machine Learning in Python

Exercise instructions

Import the nannyml library.
Load the US Census Employment dataset from the nannyml library.
Print the head of the reference data.
Print the head of the analysis data.

Sample code for this exercise:

# Import nannyml
import nannyml

# Load US Census Employment dataset
reference, analysis, analysis_gt = nannyml.load_us_census_ma_employment_data()

# Print head of the reference data
print(reference.head())

# Print head of the analysis data
print(analysis.head())


In this chapter, you will be introduced to the NannyML library and its fundamental functions. Initially, you will learn the process of preparing raw data to create reference and analysis sets ready for production monitoring. As a practical example, you will investigate predicting the tip amount for taxi rides in New York. Toward the end of the chapter, you will also discover how to estimate the performance of the tip prediction model using NannyML.


In this chapter, you will be introduced to realized performance calculators used when ground truth becomes available. You will learn about the more advanced methods for handling results, including filtering, plotting, converting them to data frames, chunking, and establishing custom thresholds. Lastly, you'll apply this knowledge to calculate the business value of a model trained on the hotel booking dataset.


Having detected the performance degradation in the hotel booking model, you will now learn how to identify the underlying issue causing it. In this chapter, you will be introduced to multivariate and univariate drift detection methods. You will also learn how to identify data quality issues and how to address the underlying problems you detect.
