Exercise

Cross-validation with shuffling

As you'll recall, cross-validation is the process of splitting your data into training and test sets multiple times, choosing a different training and test set on each split. In this exercise, you'll perform a traditional ShuffleSplit cross-validation on the company value data from earlier. Later we'll cover what changes need to be made for time series data. The data is the same historical price data for several large companies.
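To make the idea of repeated random splits concrete, here is a minimal sketch using scikit-learn's ShuffleSplit on a toy array. The names X_demo, cv_demo, and splits_demo are made up for illustration and are not part of the exercise workspace; the test_size and random_state values are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Toy data (hypothetical): 10 samples with 2 features each
X_demo = np.arange(20).reshape(10, 2)

# Three random splits, each holding out 20% of the samples for testing
cv_demo = ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
splits_demo = list(cv_demo.split(X_demo))

for i, (train_idx, test_idx) in enumerate(splits_demo):
    # Each split is a pair of index arrays into the rows of X_demo
    print(f"split {i}: train={train_idx}, test={test_idx}")
```

Because the indices are drawn at random, the temporal order of the samples is ignored, which is why time series data needs different handling.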

An instance of the linear regression model object (model) is available in your workspace, along with the function r2_score() for scoring. The data is stored in the arrays X and y. We've also provided a helper function, visualize_predictions(), to help visualize the results.

Instructions
  • Initialize a ShuffleSplit cross-validation object with 10 splits.
  • Iterate through CV splits using this object. On each iteration:
    • Fit a model using the training indices.
    • Generate predictions using the test indices, score the model (\(R^2\)) using the predictions, and collect the results.
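The steps above could be sketched roughly as follows. This is one possible approach, not the reference solution: in the exercise, model, X, and y already exist in the workspace, so the synthetic data and the LinearRegression instance created here are stand-ins assumed purely so the example runs on its own. The random_state value and the results list are also illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import ShuffleSplit

# Stand-in data (assumption): the exercise provides X and y already
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Stand-in model (assumption): the exercise provides `model` already
model = LinearRegression()

# Initialize a ShuffleSplit cross-validation object with 10 splits
cv = ShuffleSplit(n_splits=10, random_state=1)

results = []
for train_idx, test_idx in cv.split(X):
    # Fit the model using the training indices
    model.fit(X[train_idx], y[train_idx])

    # Generate predictions using the test indices
    prediction = model.predict(X[test_idx])

    # Score the predictions; r2_score expects (y_true, y_pred)
    score = r2_score(y[test_idx], prediction)

    # Collect the predictions, score, and test indices for later inspection
    results.append((prediction, score, test_idx))

scores = [score for _, score, _ in results]
print(f"mean R^2 over {len(scores)} splits: {np.mean(scores):.3f}")
```

In the exercise itself, the collected results would then be passed to the provided visualize_predictions() helper; its exact arguments are not shown here, so that call is left out.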