
Using regularization in XGBoost

Having seen an example of L1 regularization in the video, you'll now vary the L2 regularization penalty, also known as "lambda", and see its effect on overall model performance on the Ames housing dataset.
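In XGBoost, both penalties are set through the parameter dictionary: "alpha" controls the L1 penalty on leaf weights and "lambda" controls the L2 penalty. As a quick reference before the exercise, here is a minimal sketch of a single cross-validated run with an explicit "lambda" value; it assumes the feature matrix X and target y are already loaded, as they are in this exercise.

import xgboost as xgb

# X and y are assumed to be preloaded
dmatrix = xgb.DMatrix(data=X, label=y)

# "lambda" sets the L2 penalty on leaf weights; "alpha" would set the L1 penalty
params = {"objective": "reg:squarederror", "max_depth": 3, "lambda": 10}

cv_results = xgb.cv(dtrain=dmatrix, params=params, nfold=2,
                    num_boost_round=5, metrics="rmse",
                    as_pandas=True, seed=123)
print(cv_results["test-rmse-mean"].tail(1))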

This exercise is part of the course Extreme Gradient Boosting with XGBoost.


Exercise instructions

  • Create your DMatrix from X and y as before.
  • Create an initial parameter dictionary specifying an "objective" of "reg:squarederror" and a "max_depth" of 3.
  • Use xgb.cv() inside a for loop and systematically vary the "lambda" value by passing in the current L2 value (reg).
  • Append the "test-rmse-mean" from the last boosting round for each cross-validated XGBoost model.
  • Hit 'Submit Answer' to view the results. What do you notice?

Hands-on interactive exercise

Have a go at this exercise; a completed version of the sample code is shown below.

import xgboost as xgb
import pandas as pd

# X (feature matrix) and y (target) are assumed to be preloaded, as in the exercise

# Create the DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# L2 ("lambda") values to try
reg_params = [1, 10, 100]

# Create the initial parameter dictionary for varying l2 strength: params
params = {"objective": "reg:squarederror", "max_depth": 3}

# Create an empty list for storing rmses as a function of l2 complexity
rmses_l2 = []

# Iterate over reg_params
for reg in reg_params:

    # Update l2 strength
    params["lambda"] = reg
    
    # Pass this updated param dictionary into cv
    cv_results_rmse = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=2,
                             num_boost_round=5, metrics="rmse",
                             as_pandas=True, seed=123)
    
    # Append best rmse (final round) to rmses_l2
    rmses_l2.append(cv_results_rmse["test-rmse-mean"].tail(1).values[0])

# Look at best rmse per l2 param
print("Best rmse as a function of l2:")
print(pd.DataFrame(list(zip(reg_params, rmses_l2)), columns=["l2", "rmse"]))
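The printed table shows how the cross-validated test RMSE changes as "lambda" grows. If you want to run the analogous sweep for the L1 penalty seen in the video, the same loop works with the "alpha" key. Below is a sketch, reusing housing_dmatrix and reg_params from above:

# Sketch: the same sweep for the L1 penalty ("alpha")
params_l1 = {"objective": "reg:squarederror", "max_depth": 3}
rmses_l1 = []
for reg in reg_params:
    params_l1["alpha"] = reg
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params_l1, nfold=2,
                        num_boost_round=5, metrics="rmse",
                        as_pandas=True, seed=123)
    rmses_l1.append(cv_results["test-rmse-mean"].tail(1).values[0])

print("Best rmse as a function of l1:")
print(pd.DataFrame(list(zip(reg_params, rmses_l1)), columns=["l1", "rmse"]))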