Real-world cost analysis

You will keep working with the credit dataset in this exercise. Recall that a "positive" in this dataset means "bad credit", i.e., a customer who defaulted on their loan, and a "negative" means a customer who continued to pay without problems. The bank manager has informed you that the bank makes 10K profit on average from each "good risk" customer, but loses 150K on each "bad risk" customer. Your algorithm will be used to screen applicants: those labeled "negative" will be given a loan, and those labeled "positive" will be turned down. What is the total cost of your classifier? The data is available as X_train, X_test, y_train, and y_test. The functions confusion_matrix(), f1_score(), and precision_score() and the RandomForestClassifier() class are available.

This exercise is part of the course

Designing Machine Learning Workflows in Python

Exercise instructions

  • Fit a random forest classifier to the training data.
  • Use it to label the test data.
  • Extract the false negatives and false positives from confusion_matrix(). You will have to flatten the matrix.
  • Compute the total cost. Falsely classifying a "good" customer as "bad" (a false positive) means that the bank loses the chance to make 10K profit. Falsely classifying a "bad" customer as "good" (a false negative) means that the bank loses 150K when the customer defaults on their loan.
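
The cost arithmetic described in the last instruction can be sketched in isolation. This is a minimal illustration, not the exercise solution: the function name total_cost and the error counts below are hypothetical, and the amounts assume that "10K" and "150K" mean 10,000 and 150,000 in whatever currency the bank uses.

```python
# Costs taken from the bank manager's figures in the exercise text.
COST_FALSE_POSITIVE = 10_000    # good customer turned down: missed profit
COST_FALSE_NEGATIVE = 150_000   # bad customer given a loan: loss on default


def total_cost(fp: int, fn: int) -> int:
    """Return the total cost of the classifier's mistakes (hypothetical helper)."""
    return fp * COST_FALSE_POSITIVE + fn * COST_FALSE_NEGATIVE


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_cost = total_cost(fp=30, fn=20)
print(example_cost)  # 30*10_000 + 20*150_000 = 3_300_000
```

Because a false negative costs fifteen times as much as a false positive, a classifier with fewer errors overall can still be the more expensive one if more of its errors are false negatives.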

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Fit a random forest classifier to the training data
clf = ____(random_state=2).fit(____, ____)

# Label the test data
preds = clf.____(____)

# Get false positives/negatives from the confusion matrix
tn, ____, ____, tp = confusion_matrix(y_test, preds).____()

# Now compute the cost using the manager's advice
cost = fp*____ + fn*____
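
For reference, here is one way the template above might be completed, as a self-contained sketch. The credit dataset is not included here, so the example assumes a synthetic stand-in built with make_classification in which label 1 plays the role of "bad credit". The sample sizes, class weights, and split are arbitrary assumptions, and ravel() is used for the flattening step (flatten() would also work).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit data (assumption: label 1 = "bad credit").
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.7, 0.3], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit a random forest classifier to the training data
clf = RandomForestClassifier(random_state=2).fit(X_train, y_train)

# Label the test data
preds = clf.predict(X_test)

# For binary labels, the flattened matrix is ordered tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_test, preds, labels=[0, 1]).ravel()

# Each false positive forgoes 10K profit; each false negative loses 150K
cost = fp * 10_000 + fn * 150_000
print(f"false positives: {fp}, false negatives: {fn}, cost: {cost}")
```

The numbers printed here describe the synthetic data only and say nothing about the cost on the actual credit dataset.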