
Model stacking II

Here's what you've done so far in the stacking implementation:

  1. Split the train data into two parts
  2. Train multiple models on Part 1
  3. Make predictions on Part 2
  4. Make predictions on the test data

Now, your goal is to create a second-level model that uses the predictions from steps 3 and 4 as features. This model is trained on the Part 2 data, and you can then make stacking predictions on the test data.

The part_2 and test DataFrames are already available in your workspace. The Gradient Boosting and Random Forest predictions are stored in these DataFrames as the "gb_pred" and "rf_pred" columns, respectively.
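For context, here is a minimal sketch of how those four steps could have produced the "gb_pred" and "rf_pred" columns. The train DataFrame, the features list of column names, and the hyperparameters are assumptions for illustration; they are not defined in this exercise.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Step 1: split the train data into two parts
# (train and features are assumed to exist in the workspace)
part_1, part_2 = train_test_split(train, test_size=0.5, random_state=123)

# Step 2: train multiple models on Part 1
gb = GradientBoostingRegressor().fit(part_1[features], part_1.fare_amount)
rf = RandomForestRegressor().fit(part_1[features], part_1.fare_amount)

# Step 3: make predictions on Part 2
part_2['gb_pred'] = gb.predict(part_2[features])
part_2['rf_pred'] = rf.predict(part_2[features])

# Step 4: make predictions on the test data
test['gb_pred'] = gb.predict(test[features])
test['rf_pred'] = rf.predict(test[features])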

This exercise is part of the course Winning a Kaggle Competition in Python.

Exercise instructions

  • Train a Linear Regression model on the Part 2 data, using the Gradient Boosting and Random Forest models' predictions as features.
  • Make predictions on the test data, using the Gradient Boosting and Random Forest models' predictions as features.

Hands-on interactive exercise

Have a go at this exercise using this sample code.

from sklearn.linear_model import LinearRegression

# Create linear regression model without the intercept
lr = LinearRegression(fit_intercept=False)

# Train 2nd level model on the Part 2 data
lr.fit(part_2[['gb_pred', 'rf_pred']], part_2.fare_amount)

# Make stacking predictions on the test data
test['stacking'] = lr.predict(test[['gb_pred', 'rf_pred']])

# Look at the model coefficients
print(lr.coef_)
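Since the intercept is omitted, the coefficients can be read as blending weights for the two base models: the stacking prediction is simply coef_[0] * gb_pred + coef_[1] * rf_pred. When both base models predict the same target reasonably well, these weights usually sum to roughly 1.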