Create train and test features

Before we fit our linear model, we need to add a constant column to our features. statsmodels does not include an intercept by default, so without the constant the fitted line would be forced through the origin.

We also want to create train and test features, so we can fit our model to the train dataset and evaluate its performance on the test dataset. We always want to check performance on data the model has not seen to make sure we're not overfitting, meaning the model has memorized noise in the training data instead of learning patterns that generalize.

With a time series like this, we typically want to use the oldest data as our training set, and the newest data as our test set. This is so we can evaluate the performance of the model on the most recent data, which will more realistically simulate predictions on data we haven't seen yet.

Import the statsmodels.api library with the alias sm.
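Add a constant to the features variable using statsmodels' .add_constant() function, and store the result as linear_features.
Set train_size as 85% of the total number of datapoints (number of rows) using the .shape[0] property of features or targets. Convert the result to an integer so it can be used as an index.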
Break up linear_features and targets into train and test sets using train_size and Python indexing (e.g. [start:stop]).
