
Create train and test features

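Before we fit our linear model, we want to add a constant to our features so that the model has an intercept term. statsmodels' OLS does not include an intercept automatically, so without this constant column the fitted line would be forced through the origin.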

We also want to split the data into train and test sets. We fit the model on the training set and evaluate its performance on the test set. Checking performance on data the model has not seen tells us whether it is overfitting, that is, fitting the noise and quirks of the training data so closely that it fails to generalize to new data.

With a time series like this, we typically use the oldest data as the training set and the newest data as the test set, without shuffling. Evaluating on the most recent data more realistically simulates predicting on data we haven't seen yet, and it keeps information from the future from leaking into training.
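A sketch of that split for a DataFrame indexed by date. The helper name `chronological_split` and the synthetic price series are hypothetical. Sorting by the index first guarantees that every training row predates every test row. A random shuffle-and-split would instead let the model train on days that come after some of the days it is tested on.

```python
import numpy as np
import pandas as pd

def chronological_split(df, train_fraction=0.85):
    """Split a time-indexed DataFrame into its oldest (train) and newest (test) rows."""
    df = df.sort_index()  # make sure rows run from oldest to newest
    train_size = int(train_fraction * df.shape[0])
    return df.iloc[:train_size], df.iloc[train_size:]

# Hypothetical daily prices, deliberately shuffled to show why sorting matters
dates = pd.date_range("2020-01-01", periods=100, freq="D")
prices = pd.DataFrame({"close": np.linspace(100.0, 120.0, 100)}, index=dates)
shuffled = prices.sample(frac=1.0, random_state=42)

train, test = chronological_split(shuffled)
print(train.index.max(), "<", test.index.min())
```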

This exercise is part of the course Machine Learning for Finance in Python.

Exercise instructions

  • Import the statsmodels.api library with the alias sm.
  • Add a constant to the features variable using statsmodels' .add_constant() function.
  • Set train_size to 85% of the total number of data points (rows), using the .shape[0] attribute of features or targets and converting the result to an integer with int().
  • Break up linear_features and targets into train and test sets using train_size and Python indexing (e.g. [start:stop]).
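The steps above can be checked on a small synthetic example. The 1,001-row `features` and `targets` below are hypothetical stand-ins for the course data. The sketch shows two details: `int()` truncates rather than rounds, and slicing features and targets with the same bounds keeps their rows aligned. It uses `.iloc` to make the positional slicing explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical features/targets with 1,001 rows sharing one date index
n = 1001
index = pd.date_range("2015-01-01", periods=n, freq="D")
features = pd.DataFrame({"f1": np.arange(n), "f2": np.arange(n) * 2.0}, index=index)
targets = pd.Series(np.arange(n) * 0.1, index=index)

# int() truncates: 0.85 * 1001 = 850.85, so 850 rows go to training
train_size = int(0.85 * features.shape[0])

# Slice both objects with the same bounds so feature rows and target rows stay aligned
train_features, test_features = features.iloc[:train_size], features.iloc[train_size:]
train_targets, test_targets = targets.iloc[:train_size], targets.iloc[train_size:]

print(train_size, len(test_features))  # 850 151
```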

Sample code for this exercise:

# Import the statsmodels.api library with the alias sm
import statsmodels.api as sm

# Add a constant to the features
# (features and targets are assumed to be defined in the earlier steps of the course)
linear_features = sm.add_constant(features)

# Create a size for the training set that is 85% of the total number of samples
train_size = int(0.85 * features.shape[0])

# Oldest 85% of rows for training, newest 15% for testing
train_features = linear_features[:train_size]
train_targets = targets[:train_size]
test_features = linear_features[train_size:]
test_targets = targets[train_size:]
print(linear_features.shape, train_features.shape, test_features.shape)