Explore overfitting XGBoost

Having trained 3 XGBoost models with different maximum depths, you will now evaluate their quality. For this purpose, you will measure the quality of each model on both the train data and the test data. As you know by now, the train data is the data models have been trained on. The test data is the next month sales data that models have never seen before.

The goal of this exercise is to determine whether any of the models trained is overfitting. To measure the quality of the models you will use Mean Squared Error (MSE). It's available in sklearn.metrics as mean_squared_error() function that takes two arguments: true values and predicted values.

train and test DataFrames together with 3 models trained (xg_depth_2, xg_depth_8, xg_depth_15) are available in your workspace.

Cet exercice fait partie du cours

Winning a Kaggle Competition in Python

Afficher le cours

Instructions

Make predictions for each model on both the train and test data.
Calculate the MSE between the true values and your predictions for both the train and test data.

Exercice interactif pratique

Essayez cet exercice en complétant cet exemple de code.

from sklearn.metrics import mean_squared_error

dtrain = xgb.DMatrix(data=train[['store', 'item']])
dtest = xgb.DMatrix(data=test[['store', 'item']])

# For each of 3 trained models
for model in [xg_depth_2, xg_depth_8, xg_depth_15]:
    # Make predictions
    train_pred = model.____(dtrain)     
    test_pred = model.____(dtest)          
    
    # Calculate metrics
    mse_train = ____(train['sales'], train_pred)                  
    mse_test = ____(test['sales'], test_pred)
    print('MSE Train: {:.3f}. MSE Test: {:.3f}'.format(mse_train, mse_test))

Modifier et exécuter le code