Baseline
Evaluating a classifier against an appropriate baseline is important. This is especially true for imbalanced datasets such as ad click-through data, where a model can reach high accuracy simply by always predicting the majority class. In this exercise, you will simulate a baseline classifier that always predicts the majority class (non-click) and inspect its confusion matrix, precision, and recall.
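To make the accuracy trap concrete, here is a small sketch on hypothetical labels (95 non-clicks and 5 clicks, i.e. an assumed 5% CTR; not data from the course). It counts the confusion-matrix cells by hand with numpy, treating "click" (1) as the positive class, and shows that the majority-class baseline scores 95% accuracy while finding none of the clicks.

```python
import numpy as np

# Hypothetical labels: 95 non-clicks (0) and 5 clicks (1), an assumed 5% CTR
y_true = np.array([0] * 95 + [1] * 5)

# Majority-class baseline: predict "non-click" for every impression
y_pred = np.zeros_like(y_true)

# Confusion-matrix cells, treating "click" (1) as the positive class
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))

accuracy = (tp + tn) / len(y_true)
# Precision is 0/0 when nothing is predicted positive; report it as 0.0 here
precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
recall = tp / (tp + fn)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

With these labels the baseline gets TN=95, FN=5 and no positive predictions at all, so accuracy is 0.95 while precision and recall for the click class are both 0.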
X_train, y_train, X_test, and y_test are available in your workspace, as are pandas (as pd), numpy (as np), and sklearn.
This exercise is part of the course Predicting CTR with Machine Learning in Python.
Exercise instructions
- Create y_pred, an array of zeros with the same length as X_test, using np.asarray().
- Print the resulting confusion matrix.
- Get the precision and recall scores.
Sample code:
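import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Set up baseline predictions: always predict the majority class (0 = non-click)
y_pred = np.asarray([0 for x in range(len(X_test))])

# Look at confusion matrix
print("Confusion matrix: ")
print(confusion_matrix(y_test, y_pred))

# Check precision and recall (support-weighted averages over both classes).
# Precision for the never-predicted click class is 0/0; sklearn warns and treats it as 0.
prec = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
print("Precision: %s, Recall: %s" % (prec, recall))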
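As a cross-check, scikit-learn's DummyClassifier with strategy="most_frequent" builds the same majority-class baseline without constructing the array by hand. The sketch below uses hypothetical toy data (100 rows, 95 non-clicks and 5 clicks; the all-zero feature matrix is an assumption, since the dummy model ignores features). It also illustrates what average='weighted' does here: weighted recall always equals accuracy, and for this baseline weighted precision is the majority share squared (0.95 x 0.95 = 0.9025), because the click class contributes 0.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical toy data: features are ignored by the dummy model,
# labels are 95 non-clicks (0) and 5 clicks (1)
X = np.zeros((100, 3))
y = np.array([0] * 95 + [1] * 5)

# strategy="most_frequent" always predicts the most common label seen in fit()
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)
y_pred = baseline.predict(X)

acc = accuracy_score(y, y_pred)
# zero_division=0 sets the ill-defined precision of the never-predicted class to 0 without a warning
prec = precision_score(y, y_pred, average="weighted", zero_division=0)
recall = recall_score(y, y_pred, average="weighted")

print("Accuracy: %s, Precision: %s, Recall: %s" % (acc, prec, recall))
```

With the workspace data, the same pattern would be baseline.fit(X_train, y_train) followed by baseline.predict(X_test). Because weighted recall collapses to accuracy, it hides the fact that the click class is never found; passing average=None (or pos_label=1 with the default binary average) exposes the per-class scores instead.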