Exercise

Tuning colsample_bytree

Now it's time to tune "colsample_bytree". You've already seen something similar if you've ever worked with scikit-learn's RandomForestClassifier or RandomForestRegressor, where the analogous parameter is called max_features. Whereas max_features limits the features considered at each split, xgboost's "colsample_bytree" specifies the fraction of features to randomly sample when building each tree. In xgboost, colsample_bytree must be specified as a float between 0 and 1.

Instructions

100 XP
  • Create a list called colsample_bytree_vals to store the values 0.1, 0.5, 0.8, and 1.
  • Systematically vary "colsample_bytree" and perform cross-validation, exactly as you did with max_depth and eta previously (see the sketch after these instructions).
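
Below is a minimal sketch of the tuning loop. It assumes the setup from earlier exercises: a DMatrix named housing_dmatrix and a base params dictionary; here synthetic data stands in for the course's housing dataset, so the data, objective, and cross-validation settings shown are assumptions, not the official solution.

import numpy as np
import pandas as pd
import xgboost as xgb

# Synthetic regression data standing in for the course's housing dataset (assumption)
rng = np.random.RandomState(123)
X = rng.rand(200, 10)
y = X @ rng.rand(10) + rng.normal(scale=0.1, size=200)
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# Base booster parameters, mirroring earlier exercises (assumed values)
params = {"objective": "reg:squarederror", "max_depth": 3}

# Candidate column-subsampling fractions to try
colsample_bytree_vals = [0.1, 0.5, 0.8, 1]
best_rmse = []

# Systematically vary colsample_bytree and cross-validate each setting
for curr_val in colsample_bytree_vals:
    params["colsample_bytree"] = curr_val
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params,
                        nfold=2, num_boost_round=10,
                        early_stopping_rounds=5, metrics="rmse",
                        as_pandas=True, seed=123)
    # Record the final boosting round's mean test RMSE for this setting
    best_rmse.append(cv_results["test-rmse-mean"].tail(1).values[0])

# Tabulate each colsample_bytree value against its best RMSE
print(pd.DataFrame(list(zip(colsample_bytree_vals, best_rmse)),
                   columns=["colsample_bytree", "best_rmse"]))

The loop reuses the same pattern as the max_depth and eta exercises: overwrite one key in params, rerun xgb.cv, and keep only the final test-RMSE so the values can be compared side by side.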