Exercise

Regularization II: Ridge

Lasso is great for feature selection, but when building regression models, ridge regression should be your first choice.

Recall that lasso performs regularization by adding a penalty term to the loss function: the sum of the absolute values of the coefficients, multiplied by some alpha. This is also known as \(L1\) regularization because the regularization term is the \(L1\) norm of the coefficients (scaled by alpha). This is not the only way to regularize, however.
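
As a quick numeric check of the L1 penalty described above, the sketch below computes alpha times the sum of the absolute coefficient values and confirms it equals alpha times NumPy's L1 norm. It also computes the sum-of-squares penalty used by ridge regression for contrast. The coefficient vector `w` and the value of `alpha` are hypothetical numbers chosen only for illustration.

```python
import numpy as np

# Hypothetical coefficients and regularization strength (illustration only)
w = np.array([3.0, -4.0, 0.0, 1.0])
alpha = 0.1

# Lasso (L1) penalty: alpha * sum of absolute values of the coefficients
l1_penalty = alpha * np.sum(np.abs(w))
# Same quantity via NumPy's L1 norm
assert np.isclose(l1_penalty, alpha * np.linalg.norm(w, 1))

# Ridge penalty: alpha * sum of squared coefficients,
# i.e. alpha times the *squared* L2 norm
l2_penalty = alpha * np.sum(w ** 2)
assert np.isclose(l2_penalty, alpha * np.linalg.norm(w, 2) ** 2)

print(l1_penalty)  # 0.8
print(l2_penalty)  # 2.6
```

Note that the ridge penalty grows quadratically, so large coefficients are penalized much more heavily than small ones, while the L1 penalty grows linearly.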

If instead you take the sum of the squared values of the coefficients multiplied by some alpha, as in ridge regression, you are penalizing the squared \(L2\) norm of the coefficients; this is \(L2\) regularization. In this exercise, you will practice fitting ridge regression models over a range of different alphas and plot the cross-validated \(R^2\) score for each, using the function below that we have defined for you, which plots the mean \(R^2\) score along with its standard error for each alpha:

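import numpy as np
import matplotlib.pyplot as plt

def display_plot(cv_scores, cv_scores_std):
    # alpha_space must already be defined as the array of alphas that were scored
    cv_scores = np.asarray(cv_scores)
    cv_scores_std = np.asarray(cv_scores_std)

    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(alpha_space, cv_scores)

    # Standard error of the mean over 10 CV folds: std / sqrt(number of folds)
    std_error = cv_scores_std / np.sqrt(10)

    # Shade a band of +/- one standard error around the mean CV score
    ax.fill_between(alpha_space, cv_scores + std_error, cv_scores - std_error, alpha=0.2)
    ax.set_ylabel('CV Score +/- Std Error')
    ax.set_xlabel('Alpha')
    # Dashed horizontal line marks the best mean CV score
    ax.axhline(np.max(cv_scores), linestyle='--', color='.5')
    ax.set_xlim([alpha_space[0], alpha_space[-1]])
    ax.set_xscale('log')
    plt.show()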
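
The line `std_error = cv_scores_std / np.sqrt(10)` converts the spread of the fold scores into the standard error of their mean, which assumes exactly 10 folds. A small sketch of that arithmetic, using made-up fold scores purely for illustration:

```python
import numpy as np

# Hypothetical R^2 scores from 10 CV folds (made-up numbers for illustration)
fold_scores = np.array([0.82, 0.85, 0.80, 0.88, 0.84, 0.79, 0.86, 0.83, 0.81, 0.87])

mean_score = np.mean(fold_scores)
std_score = np.std(fold_scores)  # population std (ddof=0), NumPy's default
# Standard error of the mean: std divided by sqrt(number of folds)
std_error = std_score / np.sqrt(len(fold_scores))

print(f"mean={mean_score:.3f}, std={std_score:.4f}, std_error={std_error:.4f}")
```

If a different number of folds is used, the hard-coded 10 in `display_plot` would need to change to match.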

Don't worry about the specifics of how the above function works. The motivation behind this exercise is for you to see how the \(R^2\) score varies with different alphas, and to understand the importance of selecting the right value for alpha. You'll learn how to tune alpha in the next chapter.

Instructions

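  • Instantiate a Ridge regressor and specify normalize=True. (The normalize parameter was deprecated in scikit-learn 1.0 and removed in 1.2; in newer versions, scale the features separately, e.g. with StandardScaler in a pipeline.)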
  • Inside the for loop:
    • Specify the alpha value for the regressor to use.
    • Perform 10-fold cross-validation on the regressor with the specified alpha. The data is available in the arrays X and y.
    • Append the average and the standard deviation of the computed cross-validated scores. NumPy has been pre-imported for you as np.
  • Use the display_plot() function to visualize the scores and standard deviations.
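
The steps above can be sketched as follows. This is a minimal sketch written for current scikit-learn, where `normalize=True` no longer exists, so it swaps in a `StandardScaler` + `Ridge` pipeline; that scales features differently from the old `normalize` option, so the best alpha may differ from the exercise's. Since the exercise's `X` and `y` are not shown, synthetic data from `make_regression` stands in for them, and the `alpha_space` grid and the list names `ridge_scores` / `ridge_scores_std` are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the exercise's X and y (assumption: real data not shown)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)

# Hypothetical grid of alphas, log-spaced to match display_plot's log x-axis
alpha_space = np.logspace(-4, 0, 50)

ridge_scores = []
ridge_scores_std = []

# Standardize inside the pipeline so scaling is fit on each CV training fold only
ridge = make_pipeline(StandardScaler(), Ridge())

for alpha in alpha_space:
    # Set alpha on the Ridge step (make_pipeline names the step 'ridge')
    ridge.set_params(ridge__alpha=alpha)

    # 10-fold CV; for regressors the default score is R^2
    cv_scores = cross_val_score(ridge, X, y, cv=10)

    # Record the mean and standard deviation of the 10 fold scores
    ridge_scores.append(np.mean(cv_scores))
    ridge_scores_std.append(np.std(cv_scores))
```

The two lists can then be passed to `display_plot(ridge_scores, ridge_scores_std)`, with `alpha_space` defined as above.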