Create train and test features

Before we fit our linear model, we want to add a constant column to our features, so that the model includes an intercept term.

We also want to create train and test features. This is so we can fit our model to the train dataset and evaluate its performance on the test dataset. We always want to check performance on data the model has not seen to make sure we're not overfitting, that is, memorizing patterns in the training data too exactly to generalize to new data.

With a time series like this, we typically want to use the oldest data as our training set, and the newest data as our test set. This is so we can evaluate the performance of the model on the most recent data, which will more realistically simulate predictions on data we haven't seen yet.
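A chronological split of this kind could be sketched as follows. The data here is synthetic (a hypothetical 100-day series sorted oldest to newest), and `.iloc` is used to make the positional slicing explicit; this is an illustration under those assumptions, not the exercise solution.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series, already sorted from oldest to newest
dates = pd.date_range("2020-01-01", periods=100, freq="D")
features = pd.DataFrame({"feat_a": np.arange(100, dtype=float)}, index=dates)
targets = pd.Series(np.arange(100, dtype=float) * 2.0, index=dates)

# Use the first 85% of rows (the oldest data) for training
train_size = int(0.85 * features.shape[0])

# Positional slices: no shuffling, so time order is preserved
train_features = features.iloc[:train_size]
train_targets = targets.iloc[:train_size]
test_features = features.iloc[train_size:]
test_targets = targets.iloc[train_size:]

# Every training date comes before every test date
print(train_features.index.max(), test_features.index.min())
print(train_features.shape, test_features.shape)
```

Unlike a random split, this keeps all test rows strictly later than the training rows, so the model is never evaluated on dates older than the ones it was fit on.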

This exercise is part of the course

Machine Learning for Finance in Python

Exercise instructions

  • Import the statsmodels.api library with the alias sm.
  • Add a constant to the features variable using statsmodels' .add_constant() function.
  • Set train_size to 85% of the total number of datapoints (number of rows), converted to an integer, using the .shape[0] property of features or targets.
  • Break up linear_features and targets into train and test sets using train_size and Python indexing (e.g. [start:stop]).

Interactive exercise

Complete the sample code to successfully finish this exercise.

# Import the statsmodels.api library with the alias sm
___

# Add a constant to the features
linear_features = sm.____(features)

# Create a size for the training set that is 85% of the total number of samples
train_size = int(0.85 * ____)
train_features = linear_features[:train_size]
train_targets = targets[____]
test_features = linear_features[train_size:]
test_targets = targets[train_size:]
print(linear_features.shape, train_features.shape, test_features.shape)