1. Performance estimation
In this video, I will guide you through the process of evaluating your model's performance in the absence of ground truth.
2. The algorithms
You might recall the two performance estimation algorithms, also called estimators, from the previous course on monitoring machine learning models in production: CBPE, or confidence-based performance estimation, and DLE, or direct loss estimation. If not, that's okay; we will do a quick recap, this time with code implementations.
3. Direct loss estimation
Direct Loss Estimation, also known as DLE, is used for regression tasks. The idea behind DLE is to train an extra ML model to estimate the value of the loss function, which measures the difference between the model's predictions and the actual target values. Under the hood, NannyML uses the LightGBM algorithm as this "extra" ML model. It is trained on the reference set and then applied to the analysis set in production, producing estimated performance.
NannyML supports six regression metrics: MAE, MAPE, MSE, RMSE, MSLE, and RMSLE.
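The core idea can be sketched in a few lines. This is a minimal illustration, not NannyML's implementation: NannyML uses LightGBM as the extra model, while here a simple linear fit stands in, and all data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reference set: features, the monitored model's predictions, and known targets.
x_ref = rng.normal(size=500)
y_ref = 3 * x_ref + rng.normal(scale=0.5, size=500)
pred_ref = 3 * x_ref                     # monitored model's predictions
loss_ref = np.abs(y_ref - pred_ref)      # observed absolute error on reference

# "Extra" model: a linear fit of the observed loss on the feature.
# (NannyML uses LightGBM here; the stand-in only illustrates the idea.)
coeffs = np.polyfit(x_ref, loss_ref, deg=1)

# Analysis set: production data where ground truth is not available.
x_ana = rng.normal(size=500)
est_loss = np.polyval(coeffs, x_ana)

estimated_mae = est_loss.mean()          # estimated MAE without any labels
```

Averaging the predicted per-row losses over a chunk gives that chunk's estimated metric, which is exactly what appears in the monitoring results.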
4. DLE - code implementation
When implementing DLE in practice, there are several important parameters to set:
y-underscore-true: the column containing the actual ground truth values.
y-underscore-pred: Here we pass the predictions generated by the monitored model.
metrics: a list of the performance metrics we wish to estimate.
timestamp-underscore-column: Specify the column that contains timestamps.
chunk-underscore-period: a chunk is a single data point in the monitoring results. Setting it to 'd' aggregates the performance estimates per day.
Additionally, we need to provide a list of column names representing the features used by the model.
There's an optional parameter called tune-underscore-hyperparameters, which, when set to True, triggers hyperparameter tuning for the extra model. By default, it's set to False due to its higher computational demands.
NannyML's estimators follow an approach similar to scikit-learn's: we "fit" the estimator on the reference set and then estimate our metrics on the analysis set. The results are stored in a NannyML results object, which can be converted to a pandas DataFrame or plotted.
5. Confidence-based performance estimation
Now, let's take a look at CBPE, which is a method used for both binary and multiclass classification problems. It works by using the confidence scores of the model's predictions. With these scores, CBPE estimates all the elements of the confusion matrix. This allows us to estimate various classification performance metrics such as accuracy, ROC AUC, F1 score, or precision.
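The estimation step can be illustrated with a small hand-worked example. Assuming the probabilities are well calibrated, each prediction contributes its expected share to the confusion matrix: a positive prediction with probability p counts as p of a true positive and 1 - p of a false positive, and analogously for negative predictions. The numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical calibrated probabilities and the resulting predicted classes.
proba = np.array([0.9, 0.8, 0.3, 0.2, 0.6, 0.1])
y_pred = (proba >= 0.5).astype(int)

# Expected confusion-matrix elements, without any ground truth labels.
tp = proba[y_pred == 1].sum()        # expected true positives
fp = (1 - proba[y_pred == 1]).sum()  # expected false positives
tn = (1 - proba[y_pred == 0]).sum()  # expected true negatives
fn = proba[y_pred == 0].sum()        # expected false negatives

estimated_accuracy = (tp + tn) / len(proba)
estimated_precision = tp / (tp + fp)
```

From these expected counts, any confusion-matrix-based metric can be estimated the same way.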
6. CBPE - code implementation
Now, let's set up the confidence-based performance estimation algorithm. Many parameters are the same as in the DLE algorithm, with a few differences.
First, we introduce y-underscore-pred-underscore-proba, which holds the predicted probabilities, while y-underscore-pred contains the predicted classes.
Lastly, we specify problem-underscore-type, indicating whether it's a binary or multiclass classification problem.
As with the DLE algorithm, we first fit the estimator on the reference set to establish a baseline and then estimate the results for the analysis set.
7. Results
In our example, we visualize the results by using the "plot" and "show" methods, which work the same for DLE and CBPE algorithms.
The resulting graph consists of several important components:
The X-axis represents a timeline with daily values for each chunk.
The Y-axis displays the values of the technical metrics we're monitoring.
In the graph, a purple dashed step line represents the estimated performance, and the blue area surrounding it represents the sampling error. It tells us how much the actual performance might differ from the estimated performance due to sampling effects.
Lastly, the red dotted lines serve as thresholds. They alert us when the estimated performance crosses them.
8. Let's practice!
Now, practice what you've learned!