
Exercise

Flight duration model: Regularization!

In the previous exercise you added more predictors to the flight duration model. The model performed well on testing data, but with so many coefficients it was difficult to interpret.

In this exercise you'll use Lasso regression (regularized with an L1 penalty) to create a more parsimonious model. Many of the coefficients in the resulting model will be set to zero, which means that only a subset of the predictors actually contributes to the model. Despite being simpler, the model still produces a good RMSE on the testing data.

You'll use a specific value for the regularization strength. Later you'll learn how to find the best value using cross-validation.

The data (the same as in the previous exercise) are available as flights, randomly split into flights_train and flights_test.

There are two parameters for this model, λ (regParam) and α (elasticNetParam): α determines the type of regularization (α = 0 gives Ridge regression, α = 1 gives Lasso regression) and λ gives the strength of regularization.
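To make the roles of λ and α concrete, here is a minimal sketch of the elastic net penalty in the form given in the Spark ML documentation, λ · (α · ‖w‖₁ + (1 − α)/2 · ‖w‖₂²). The function name elastic_net_penalty and the example weights are hypothetical, chosen only for illustration; Spark computes this internally during fitting.

```python
def elastic_net_penalty(coefficients, reg_param, elastic_net_param):
    """Penalty added to the loss: lambda * (alpha * L1 + (1 - alpha) / 2 * L2^2)."""
    l1 = sum(abs(w) for w in coefficients)       # L1 norm of the coefficients
    l2_squared = sum(w * w for w in coefficients)  # squared L2 norm
    return reg_param * (
        elastic_net_param * l1 + (1.0 - elastic_net_param) / 2.0 * l2_squared
    )


# Hypothetical coefficient vector, for illustration only.
weights = [3.0, 0.0, -4.0]

# alpha = 1: pure L1 (Lasso) penalty, here |3| + |0| + |-4| = 7.0
lasso_penalty = elastic_net_penalty(weights, reg_param=1.0, elastic_net_param=1.0)

# alpha = 0: pure L2 (Ridge) penalty, here (9 + 0 + 16) / 2 = 12.5
ridge_penalty = elastic_net_penalty(weights, reg_param=1.0, elastic_net_param=0.0)
```

With α = 1 only the absolute values of the coefficients are penalized, which is what allows Lasso to drive some of them exactly to zero.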

Instructions

  • Fit a linear regression model to the training data. Set the regularization strength to 1.
  • Calculate the RMSE on the testing data.
  • Look at the model coefficients.
  • How many of the coefficients are equal to zero?
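
The steps above could be sketched as follows. This is one possible approach, not the official solution: it assumes the label column is named duration and that the predictors have already been assembled into a column named features (the Spark ML default). The function names fit_lasso_and_report and count_zero_coefficients are hypothetical helpers introduced here.

```python
def count_zero_coefficients(coefficients):
    """Count the coefficients that are exactly zero."""
    return sum(1 for beta in coefficients if beta == 0)


def fit_lasso_and_report(flights_train, flights_test, label_col="duration"):
    """Fit a Lasso model, then return (rmse, coefficients, number of zeros).

    Assumes both DataFrames contain a 'features' vector column and a
    label column named by label_col (an assumption, not given in the text).
    """
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.regression import LinearRegression

    # elasticNetParam=1 selects the pure L1 (Lasso) penalty;
    # regParam=1 sets the regularization strength.
    regression = LinearRegression(labelCol=label_col, regParam=1, elasticNetParam=1)
    model = regression.fit(flights_train)

    # Evaluate RMSE on the held-out testing data.
    predictions = model.transform(flights_test)
    evaluator = RegressionEvaluator(labelCol=label_col, metricName="rmse")
    rmse = evaluator.evaluate(predictions)

    # Inspect the coefficients and count how many Lasso set to zero.
    coefficients = model.coefficients.toArray()
    return rmse, coefficients, count_zero_coefficients(coefficients)
```

Calling fit_lasso_and_report(flights_train, flights_test) with the DataFrames from this exercise returns the testing RMSE, the coefficient array, and the number of zero coefficients.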