Using regularization in XGBoost
Having seen an example of L1 regularization in the video, you'll now vary the L2 regularization penalty - also known as "lambda" - and see its effect on overall model performance on the Ames housing dataset.
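For context, "lambda" scales the L2 term in the penalty XGBoost adds for each tree. In the standard XGBoost formulation (sketched here in LaTeX; T is the number of leaves, w_j the leaf weights, and gamma the per-leaf penalty):

% Per-tree regularization term in XGBoost's objective:
% gamma penalizes the number of leaves; lambda shrinks the leaf
% weights, so larger lambda yields a more conservative model.
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2

Larger lambda values shrink the leaf weights toward zero, which is why increasing it changes the cross-validated RMSE you'll measure below.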
This exercise is part of the course Extreme Gradient Boosting with XGBoost.
Exercise instructions
- Create your DMatrix from X and y as before.
- Create an initial parameter dictionary specifying an "objective" of "reg:squarederror" and a "max_depth" of 3.
- Use xgb.cv() inside of a for loop and systematically vary the "lambda" value by passing in the current L2 value (reg).
- Append the "test-rmse-mean" from the last boosting round for each cross-validated xgboost model (see the note after this list).
- Hit 'Submit Answer' to view the results. What do you notice?
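Why the last boosting round? With as_pandas=True, xgb.cv() returns a pandas DataFrame containing one row per boosting round, so the final row holds the metrics of the fully boosted model. After running the loop below, you could inspect this yourself (a sketch; with metrics="rmse" the columns are train-rmse-mean, train-rmse-std, test-rmse-mean, and test-rmse-std):

# One row per boosting round; the last row reflects the full ensemble
print(cv_results_rmse.columns.tolist())
print(cv_results_rmse["test-rmse-mean"].tail(1))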
Hands-on interactive exercise
Have a go at this exercise by working through the sample code below.
# Create the DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)
# Create the list of l2 regularization strengths to try: reg_params
reg_params = [1, 10, 100]
# Create the initial parameter dictionary for varying l2 strength: params
params = {"objective": "reg:squarederror", "max_depth": 3}
# Create an empty list for storing rmses as a function of l2 complexity
rmses_l2 = []
# Iterate over reg_params
for reg in reg_params:
    # Update l2 strength
    params["lambda"] = reg
    # Pass this updated param dictionary into cv
    cv_results_rmse = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=2, num_boost_round=5, metrics="rmse", as_pandas=True, seed=123)
    # Append best rmse (final round) to rmses_l2
    rmses_l2.append(cv_results_rmse["test-rmse-mean"].tail(1).values[0])
# Look at best rmse per l2 param
print("Best rmse as a function of l2:")
print(pd.DataFrame(list(zip(reg_params, rmses_l2)), columns=["l2", "rmse"]))
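Note that the exercise environment preloads xgb, pd, and the Ames features X and target y. To run the snippet on your own machine, you would need a setup roughly like the following (the file and column names here are placeholders, not necessarily the course's actual ones):

import pandas as pd
import xgboost as xgb

# Hypothetical local copy of the Ames housing data -- adjust the file
# name and target column to match your copy of the dataset.
housing = pd.read_csv("ames_housing.csv")
X = housing.drop("SalePrice", axis=1)  # assumed target column name
y = housing["SalePrice"]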