
Limits to cross-validation testing

You can specify very large numbers for both nfold and num_boost_round if you want to perform an extreme amount of cross-validation. The data frame cv_results_big has already been loaded in the workspace and was created with the following code:

cv_results_big = xgb.cv(params, DTrain, num_boost_round=600, nfold=10,
                        shuffle=True)

Here, xgb.cv() ran 600 boosting iterations, evaluating each one with 10-fold cross-validation. The shuffle parameter tells the function to shuffle the records before splitting them into folds.
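The params dictionary and the DMatrix DTrain are defined earlier in the course and are not shown here. A minimal, self-contained sketch of how they might look, assuming a binary default flag as the target and AUC as the evaluation metric (the random data below is a stand-in, not the course data):

import numpy as np
import xgboost as xgb

# Hypothetical stand-in data: 1,000 borrowers, 5 features, binary default flag
rng = np.random.default_rng(123)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 2, size=1000)

# DMatrix and parameters assumed for this exercise; using AUC as the
# evaluation metric gives xgb.cv() its train/test AUC columns
DTrain = xgb.DMatrix(X, label=y)
params = {'objective': 'binary:logistic', 'eval_metric': 'auc', 'seed': 123}

cv_results_big = xgb.cv(params, DTrain, num_boost_round=600, nfold=10,
                        shuffle=True)

With AUC as the metric, the resulting data frame should contain the columns train-auc-mean, train-auc-std, test-auc-mean, and test-auc-std, with one row per boosting iteration.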

Have a look at this data to see what the AUC scores are, and check whether they reach 1.0 under cross-validation. You should also plot the test AUC scores to see how they progress across iterations.


This exercise is part of the course Credit Risk Modeling in Python.

Exercise instructions

  • Print the first five rows of the CV results data frame.
  • Print the average of the test set AUC from the CV results data frame, rounded to two decimal places.
  • Plot a line plot of the test set AUC over the course of each iteration.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Print the first five rows of the CV results data frame
print(____.____())

# Calculate the mean of the test AUC scores
print(np.____(____[____]).round(2))

# Plot the test AUC scores for each iteration
plt.____(____[____])
plt.title('Test AUC Score Over 600 Iterations')
plt.xlabel('Iteration Number')
plt.ylabel('Test AUC Score')
plt.____()
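
For reference, here is one way the blanks might be filled in. This sketch assumes the test AUC column is named test-auc-mean, which is what xgb.cv() produces when the evaluation metric is AUC; check cv_results_big.columns if your setup differs.

import numpy as np
import matplotlib.pyplot as plt

# Print the first five rows of the CV results data frame
print(cv_results_big.head())

# Calculate the mean of the test AUC scores
print(np.mean(cv_results_big['test-auc-mean']).round(2))

# Plot the test AUC scores for each iteration
plt.plot(cv_results_big['test-auc-mean'])
plt.title('Test AUC Score Over 600 Iterations')
plt.xlabel('Iteration Number')
plt.ylabel('Test AUC Score')
plt.show()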