Limits to cross-validation testing

You can specify very large numbers for both nfold and num_boost_round if you want to perform an extreme amount of cross-validation. The data frame cv_results_big has already been loaded in the workspace and was created with the following code:

import xgboost as xgb
cv_results_big = xgb.cv(params, DTrain, num_boost_round=600, nfold=10,
                        shuffle=True)

Here, xgb.cv() ran 600 boosting rounds and evaluated each round with 10-fold cross-validation. The shuffle parameter tells the function to shuffle the records before they are split into folds.
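To make the roles of nfold and shuffle concrete, here is a minimal pure-Python sketch of cutting shuffled records into folds. It only illustrates the idea and is not how xgboost implements it internally; the helper make_folds, its seed argument, and the record count of 25 are hypothetical names and values chosen for this example.

```python
import random

def make_folds(n_records, nfold, shuffle=True, seed=0):
    """Split record indices 0..n_records-1 into nfold roughly equal folds."""
    indices = list(range(n_records))
    if shuffle:
        # Shuffle once, before the folds are cut, so each fold is a random sample
        random.Random(seed).shuffle(indices)
    # Deal the indices round-robin so fold sizes differ by at most one
    return [indices[i::nfold] for i in range(nfold)]

n_records = 25  # hypothetical data set size
folds = make_folds(n_records, nfold=10)

# Each fold takes one turn as the test set; all other records form the training set
for k, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(n_records) if i not in held_out]
    print(f"fold {k}: {len(train_idx)} train records, {len(test_idx)} test records")
```

Every record lands in exactly one test fold, so each boosting round is scored on all of the data once, one fold at a time.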

Have a look at this data to see what the AUC scores are, and check whether they reach 1.0 under cross-validation. You should also plot the test AUC score to see how it progresses over the iterations.

This exercise is part of the course

Credit Risk Modeling in Python

Instructions

Print the first five rows of the CV results data frame.
Print the average of the test set AUC from the CV results data frame, rounded to two decimal places.
Create a line plot of the test set AUC across the iterations.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Print the first five rows of the CV results data frame
print(____.____())

# Calculate the mean of the test AUC scores
print(np.____(____[____]).round(2))

# Plot the test AUC scores for each iteration
plt.____(____[____])
plt.title('Test AUC Score Over 600 Iterations')
plt.xlabel('Iteration Number')
plt.ylabel('Test AUC Score')
plt.____()
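One possible way to fill in the blanks is sketched below. It assumes the cross-validation used the AUC metric, in which case xgb.cv names the relevant column 'test-auc-mean'. Because the real cv_results_big is only available in the course workspace, the sketch builds a stand-in data frame from synthetic placeholder values; those numbers are invented for illustration and are not the course's results.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Stand-in for cv_results_big: synthetic placeholder values, NOT real results.
# Column names assume xgb.cv was run with the AUC evaluation metric.
rounds = np.arange(600)
cv_results_big = pd.DataFrame({
    "train-auc-mean": 0.99 - 0.10 * np.exp(-rounds / 40),
    "train-auc-std": np.full(600, 0.002),
    "test-auc-mean": 0.93 - 0.08 * np.exp(-rounds / 40),
    "test-auc-std": np.full(600, 0.005),
})

# Print the first five rows of the CV results data frame
print(cv_results_big.head())

# Calculate the mean of the test AUC scores, rounded to two decimal places
mean_test_auc = np.mean(cv_results_big["test-auc-mean"]).round(2)
print(mean_test_auc)

# Plot the test AUC scores for each iteration
plt.plot(cv_results_big["test-auc-mean"])
plt.title("Test AUC Score Over 600 Iterations")
plt.xlabel("Iteration Number")
plt.ylabel("Test AUC Score")
plt.show()
```

In the course workspace, where cv_results_big, np, and plt are already loaded, only the lines under the three task comments are needed.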
