Limits to cross-validation testing
You can specify very large numbers for both nfold
and num_boost_round
if you want to perform an extreme amount of cross-validation. The data frame cv_results_big
has already been loaded in the workspace and was created with the following code:
cv_results_big = xgb.cv(params, DTrain, num_boost_round=600, nfold=10,
                        shuffle=True)
Here, xgb.cv()
performed 600 boosting iterations of 10-fold cross-validation! The parameter shuffle
tells the function to shuffle the records before splitting them into folds.
Have a look at this data to see what the AUC values are, and check whether they reach 1.0
during cross-validation. You should also plot the test AUC score to see its progression.
This exercise is part of the course
Credit Risk Modeling in Python
Exercise instructions
- Print the first five rows of the CV results data frame.
- Print the average of the test set AUC from the CV results data frame, rounded to two decimal places.
- Plot a line plot of the test set AUC over the course of each iteration.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Print the first five rows of the CV results data frame
print(____.____())
# Calculate the mean of the test AUC scores
print(np.____(____[____]).round(2))
# Plot the test AUC scores for each iteration
plt.____(____[____])
plt.title('Test AUC Score Over 600 Iterations')
plt.xlabel('Iteration Number')
plt.ylabel('Test AUC Score')
plt.____()
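One possible completion of the blanks, as a sketch: it assumes cv_results_big holds xgboost's CV output with a 'test-auc-mean' column, and uses a small synthetic frame in place of the preloaded data so the code runs on its own:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the plot renders headlessly
import matplotlib.pyplot as plt

# Synthetic stand-in for the preloaded cv_results_big data frame
cv_results_big = pd.DataFrame(
    {'test-auc-mean': 0.93 - 0.2 * np.exp(-np.arange(600) / 100)})

# Print the first five rows of the CV results data frame
print(cv_results_big.head())

# Calculate the mean of the test AUC scores
print(np.mean(cv_results_big['test-auc-mean']).round(2))

# Plot the test AUC scores for each iteration
plt.plot(cv_results_big['test-auc-mean'])
plt.title('Test AUC Score Over 600 Iterations')
plt.xlabel('Iteration Number')
plt.ylabel('Test AUC Score')
plt.show()
```

With the real cv_results_big, only the three print/plot steps are needed; the column name to index with is whatever xgb.cv produced for your chosen eval metric.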