Model blending

You will start creating model ensembles with a blending technique.

Your goal is to train two different models on the New York City Taxi competition data, make predictions on the test data with each, and then blend those predictions using a simple arithmetic mean.
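
As a minimal sketch of what "blend using a simple arithmetic mean" means, the snippet below averages two prediction vectors element by element. The numbers are made up for illustration and do not come from the competition data.

```python
import numpy as np

# Hypothetical fare predictions from two models for three rides (made-up values)
gb_pred = np.array([10.0, 7.5, 12.0])
rf_pred = np.array([12.0, 8.5, 11.0])

# Simple arithmetic mean: each model contributes with equal weight
blend = (gb_pred + rf_pred) / 2
print(blend)
```

Each blended value lies between the two individual predictions for that ride.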

The train and test DataFrames are already available in your workspace, along with features, a list of the columns to use for training. The target variable is "fare_amount".

This exercise is part of the course Winning a Kaggle Competition in Python.

Exercise instructions

Train a Gradient Boosting model on the train data, using the features list as inputs and the "fare_amount" column as the target variable.
Train a Random Forest model in the same manner.
Make predictions on the test data using both Gradient Boosting and Random Forest models.
Find the average of the two models' predictions.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Train a Gradient Boosting model
gb = GradientBoostingRegressor().____(____[features], ____.fare_amount)

# Train a Random Forest model
rf = RandomForestRegressor().____(____[features], ____.fare_amount)

# Make predictions on the test data
test['gb_pred'] = ____.____(test[features])
test['rf_pred'] = ____.____(test[features])

# Find mean of model predictions
test['blend'] = (____[____] + ____[____]) / 2
print(test[['gb_pred', 'rf_pred', 'blend']].head(3))
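
To try the same workflow outside the course workspace, here is one possible sketch that runs end to end on synthetic data. The column names distance_km and passenger_count, the make_frame helper, the fare formula, and the random_state and n_estimators settings are all assumptions made for illustration; the real workspace DataFrames and features list will differ.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Synthetic stand-in for the workspace data (assumption: real columns differ)
rng = np.random.default_rng(0)
features = ["distance_km", "passenger_count"]  # hypothetical feature names

def make_frame(n_rows):
    """Build a toy taxi DataFrame with a roughly linear fare (illustrative only)."""
    frame = pd.DataFrame({
        "distance_km": rng.uniform(0.5, 20.0, n_rows),
        "passenger_count": rng.integers(1, 5, n_rows),
    })
    frame["fare_amount"] = 2.5 + 1.8 * frame["distance_km"] + rng.normal(0, 1, n_rows)
    return frame

train = make_frame(200)
# A competition test set has no target column, so drop it here
test = make_frame(50).drop(columns="fare_amount")

# Train both models on the same features and target
gb = GradientBoostingRegressor(random_state=0).fit(train[features], train["fare_amount"])
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(train[features], train["fare_amount"])

# Predict with each model, then blend with an equal-weight mean
test["gb_pred"] = gb.predict(test[features])
test["rf_pred"] = rf.predict(test[features])
test["blend"] = (test["gb_pred"] + test["rf_pred"]) / 2
print(test[["gb_pred", "rf_pred", "blend"]].head(3))
```

Fixing random_state only makes the sketch reproducible; the exercise template uses the default constructors.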