# Train/test split for regression

As you learned in Chapter 1, train and test sets are vital to ensure that your supervised learning model is able to generalize well to new data. This was true for classification models, and is equally true for linear regression models.

In this exercise, you will split the Gapminder dataset into training and testing sets, and then fit and predict a linear regression over **all** features. In addition to computing the \(R^2\) score, you will also compute the Root Mean Squared Error (RMSE), which is another commonly used metric to evaluate regression models. The feature array `X` and target variable array `y` have been pre-loaded for you from the DataFrame `df`.
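
For reference, the RMSE is the square root of the mean of the squared residuals over the \(n\) test samples:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \]

where \(y_i\) is the observed target and \(\hat{y}_i\) the model's prediction. Taking the square root puts the error back in the same units as the target variable.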

Instructions

- Import `LinearRegression` from `sklearn.linear_model`, `mean_squared_error` from `sklearn.metrics`, and `train_test_split` from `sklearn.model_selection`.
- Using `X` and `y`, create training and test sets such that 30% is used for testing and 70% for training. Use a random state of `42`.
- Create a linear regression regressor called `reg_all`, fit it to the training set, and evaluate it on the test set.
- Compute and print the \(R^2\) score using the `.score()` method on the test set.
- Compute and print the RMSE. To do this, first compute the Mean Squared Error using the `mean_squared_error()` function with the arguments `y_test` and `y_pred`, and then take its square root using `np.sqrt()`.
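
The steps above can be sketched as follows. Since the pre-loaded Gapminder arrays `X` and `y` are not available outside the exercise environment, this sketch stands in synthetic data of a similar shape; the feature count and sample size are assumptions, not the real dataset's dimensions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Stand-in for the pre-loaded Gapminder arrays (hypothetical shapes):
# 139 rows with 8 numeric features, and a target built from a linear
# signal plus noise so a linear regression has something to recover.
rng = np.random.default_rng(42)
X = rng.normal(size=(139, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=139)

# 70/30 train/test split with a fixed random state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create the regressor, fit it to the training set, predict on the test set
reg_all = LinearRegression()
reg_all.fit(X_train, y_train)
y_pred = reg_all.predict(X_test)

# R^2 via the .score() method on the test set
print("R^2: {}".format(reg_all.score(X_test, y_test)))

# RMSE: mean_squared_error(y_test, y_pred), then np.sqrt of the result
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("Root Mean Squared Error: {}".format(rmse))
```

With the real `X` and `y` pre-loaded, only the synthetic-data lines would be dropped; the split, fit, and scoring steps are unchanged.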