Default thresholding
You would like to confirm that the DecisionTreeClassifier() uses the same default classification threshold as mentioned in the previous lesson, namely 0.5. It seems strange to you that all classifiers should use the same threshold. Let's check! A fitted decision tree classifier clf has been preloaded for you, as have the training and test data with their usual names: X_train, X_test, y_train and y_test. You will have to extract probability scores from the classifier using the .predict_proba() method.
This exercise is part of the course
Designing Machine Learning Workflows in Python
Instructions
- Produce scores for the test examples, using the preloaded classifier clf.
- Now extract labels from the scores. Remember that you have a pair of scores for each example, not a single score, and the second element is the probability of the positive class.
- Now label the test data using the standard .predict() method.
- Finally, compare with the predictions you got before. Are they identical?
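To make the "pair of scores" concrete, here is a minimal sketch of what .predict_proba() returns. It does not use the preloaded clf or the course data: the synthetic dataset from make_classification and the names X_demo, y_demo, demo_clf and demo_scores are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the course data (an assumption)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# A shallow tree, so that leaves can hold a mix of classes
demo_clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_demo, y_demo)

demo_scores = demo_clf.predict_proba(X_demo)

# One row per example, one column per class
print(demo_scores.shape)

# Column order follows classes_, so the second column belongs to label 1 here
print(demo_clf.classes_)

# Each pair of scores sums to 1
print(np.allclose(demo_scores.sum(axis=1), 1.0))
```

With labels 0 and 1, the second column is the score of the positive class, which is why the second element of each pair is the one to threshold.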
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Score the test data using the given classifier
scores = clf.____(____)
# Get labels from the scores using the default threshold
preds = [s[____] > ____ for s in scores]
# Use the predict method to label the test data again
preds_default = clf.____(____)
# Compare the two sets of predictions
____(preds == preds_default)
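The same kind of check can be sketched outside the exercise environment. The example below is an illustration under stated assumptions, not the exercise solution: it trains its own tree on synthetic data, and the names X, y, X_tr, X_te, tree, pos_scores, manual_labels, default_labels and labels_match are all hypothetical. It assumes binary labels 0 and 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the preloaded course data (an assumption)
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# A shallow tree, so that some scores can fall strictly between 0 and 1
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_tr, y_tr)

# Score of the positive class is the second column of predict_proba
pos_scores = tree.predict_proba(X_te)[:, 1]

# Label an example positive when its score exceeds 0.5
manual_labels = (pos_scores > 0.5).astype(int)

# Labels from the standard predict method
default_labels = tree.predict(X_te)

# A single True/False answer for the whole test set
labels_match = np.array_equal(manual_labels, default_labels)
print(labels_match)
```

Comparing a Python list with a NumPy array using == gives one result per element, so a reduction such as all(...) or np.array_equal is needed to get a single answer. In this sketch, ties at exactly 0.5 would also agree, since a strict > 0.5 assigns them to class 0 and .predict() picks the first class when both scores are equal.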