Intuition check again! You have now seen how undersampling the training set can improve default prediction. You undersampled the training data set X_train, which had a positive impact on the new model's AUC score and its recall for defaults. The training data had class imbalance, which is normal for most credit loan data.
You did not undersample the test data X_test. Why not undersample the test set as well?
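
As a reminder of the setup the question refers to, here is a minimal sketch of undersampling applied to the training rows only. The data is synthetic, and the column names loan_amnt and loan_status (1 for default), the 10% default rate, and the 25% test split are all assumptions made for illustration; they are not taken from the exercise's actual data.

```python
import numpy as np
import pandas as pd

# Synthetic, illustrative data: roughly 10% defaults (loan_status == 1).
rng = np.random.default_rng(0)
n = 1000
data = pd.DataFrame({
    "loan_amnt": rng.integers(1000, 35000, size=n),
    "loan_status": (rng.random(n) < 0.10).astype(int),
})

# Hold out the test rows first, so they keep the original class mix.
test = data.sample(frac=0.25, random_state=0)
train = data.drop(test.index)

# Undersample the training rows only: keep every default and draw an
# equally sized random sample of non-defaults, then shuffle.
defaults = train[train["loan_status"] == 1]
nondefaults = train[train["loan_status"] == 0].sample(
    n=len(defaults), random_state=0
)
train_under = pd.concat([defaults, nondefaults]).sample(frac=1, random_state=0)

X_train = train_under.drop(columns="loan_status")
y_train = train_under["loan_status"]
X_test = test.drop(columns="loan_status")
y_test = test["loan_status"]

# Compare the class proportions of the two sets.
print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))
```

Comparing the two printed class proportions is a useful starting point for thinking about the question.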