CommencerCommencer gratuitement

Cross-validation with shuffling

As you'll recall, cross-validation is the process of splitting your data into training and test sets multiple times. Each time you do this, you choose a different training and test set. In this exercise, you'll perform a traditional ShuffleSplit cross-validation on the company value data from earlier. Later we'll cover what changes need to be made for time series data. The data we'll use is the same historical price data for several large companies.

An instance of the Linear regression object (model) is available in your workspace along with the function r2_score() for scoring. Also, the data is stored in arrays X and y. We've also provided a helper function (visualize_predictions()) to help visualize the results.

Cet exercice fait partie du cours

Machine Learning for Time Series Data in Python

Afficher le cours

Instructions

  • Initialize a ShuffleSplit cross-validation object with 10 splits.
  • Iterate through CV splits using this object. On each iteration:
    • Fit a model using the training indices.
    • Generate predictions using the test indices, score the model (\(R^2\)) using the predictions, and collect the results.

Exercice interactif pratique

Essayez cet exercice en complétant cet exemple de code.

# Import ShuffleSplit and create the cross-validation object
from sklearn.model_selection import ShuffleSplit
cv = ____(____, random_state=1)

# Iterate through CV splits
results = []
for tr, tt in cv.____(X, y):
    # Fit the model on training data
    ____(X[tr], y[tr])
    
    # Generate predictions on the test data, score the predictions, and collect
    prediction = ____(X[tt])
    score = r2_score(____, ____)
    results.append((prediction, score, tt))

# Custom function to quickly visualize predictions
visualize_predictions(results)
Modifier et exécuter le code