Session Ready
Exercise

Limits to cross-validation testing

You can specify very large numbers for both nfold and num_boost_round if you want to perform an extreme amount of cross-validation. The data frame cv_results_big has already been loaded in the workspace and was created with the following code:

cv = xgb.cv(params, DTrain, num_boost_round = 600, nfold=10,
            shuffle = True)

Here, cv() performed 600 iterations of cross-validation! The parameter shuffle tells the function to shuffle the records each time.

Have a look at this data to see what the AUC are, and check to see if they reach 1.0 using cross validation. You should also plot the test AUC score to see the progression.

The data frame cv_results_big has been loaded into the workspace.

Instructions
100 XP
  • Print the first five rows of the CV results data frame.
  • Print the average of the test set AUC from the CV results data frame rounded to two places.
  • Plot a line plot of the test set AUC over the course of each iteration.