ComenzarEmpieza gratis

Baseline based on the gradient boosting

Let's build a final baseline based on the Random Forest. You've seen a huge score improvement moving from the grouping baseline to the Gradient Boosting in the video. Now, you will use sklearn's Random Forest to further improve this score.

The goal of this exercise is to take numeric features and train a Random Forest model without any tuning. After that, you could make test predictions and validate the result on the Public Leaderboard. Note that you've already got an "hour" feature which could also be used as an input to the model.

Este ejercicio forma parte del curso

Winning a Kaggle Competition in Python

Ver curso

Instrucciones del ejercicio

  • Add the "hour" feature to the list of numeric features.
  • Fit the RandomForestRegressor on the train data with numeric features and "fare_amount" as a target.
  • Use the trained Random Forest model to make predictions on the test data.

Ejercicio interactivo práctico

Prueba este ejercicio y completa el código de muestra.

from sklearn.ensemble import RandomForestRegressor

# Select only numeric features
features = ['pickup_longitude', 'pickup_latitude', 'dropoff_longitude',
            'dropoff_latitude', 'passenger_count', ____]

# Train a Random Forest model
rf = RandomForestRegressor()
rf.____(train[____], train.fare_amount)

# Make predictions on the test data
test['fare_amount'] = ____.____(test[features])

# Write predictions
test[['id','fare_amount']].to_csv('rf_sub.csv', index=False)
Editar y ejecutar código