Exercise

# Flight duration model: More features!

Let's add more features to our model. This will not necessarily result in a better model: some features might improve it, while others might make it worse.

More features will *always* make the model more complicated and difficult to interpret.

These are the features you'll include in the next model:

- `km`
- `org` (origin airport, one-hot encoded, 8 levels)
- `depart` (departure time, binned in 3-hour intervals, one-hot encoded, 8 levels)
- `dow` (departure day of week, one-hot encoded, 7 levels)
- `mon` (departure month, one-hot encoded, 12 levels)

These have been assembled into the `features` column, which is a sparse representation of 32 columns (remember one-hot encoding produces a number of columns which is one fewer than the number of levels).
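
The count of 32 can be checked with a little arithmetic. The sketch below assumes `km` contributes a single numeric column and that each one-hot encoded feature contributes one column fewer than its number of levels; the `levels` dictionary is just an illustrative helper, not part of the exercise.

```python
# Number of levels for each one-hot encoded feature, as listed above.
levels = {"org": 8, "depart": 8, "dow": 7, "mon": 12}

# `km` is assumed to be one numeric column; each encoded feature
# drops one level, so it adds (levels - 1) columns.
n_columns = 1 + sum(n - 1 for n in levels.values())

print(n_columns)  # 32
```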

The data are available as `flights`, randomly split into `flights_train` and `flights_test`. The object `predictions` is also available.

Instructions

- Fit a linear regression model to the training data.
- Generate predictions for the testing data.
- Calculate the RMSE on the testing data.
- Look at the model coefficients. Are any of them zero?