
Machine learning for MPT

1. Machine learning for MPT

Now we'll learn how to apply machine learning to the features and targets we just created. Remember our targets are ideal portfolio weights, and features are exponentially-weighted moving averages of prices.

2. Make train and test sets

We'll again make train and test sets, this time with 80% of the data in the training set. train_size is calculated from the number of rows in our features, which we get from the shape. Then we use Python's slicing to take the first chunk of data up to train_size as the training set, and the data from train_size to the end as the test set.
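As a rough sketch of that split, assuming the features and targets are NumPy arrays with one row per month: the random data below is only a hypothetical stand-in for the real features and targets, and the variable names are illustrative.

```python
import numpy as np

# Hypothetical stand-ins for the features and targets created earlier:
# 100 months, 4 moving-average features, ideal weights for 3 stocks.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 4))
targets = rng.dirichlet(np.ones(3), size=100)  # each row of weights sums to 1

# 80% of the rows go to training; int() truncates to a whole number of rows.
train_size = int(0.8 * features.shape[0])

# Slice in time order: the earliest rows train the model, the rest test it.
train_features = features[:train_size]
train_targets = targets[:train_size]
test_features = features[train_size:]
test_targets = targets[train_size:]

print(train_features.shape, test_features.shape)
```

Because the rows are in time order and we slice rather than shuffle, the test set only contains months that come after the training months.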

3. Fit the model

Our next step is to fit a model to the training data. Random forests usually perform well without much tuning, so we'll go with that. For now, we won't alter hyperparameters except n_estimators, the number of trees. Usually, performance flattens out in the hundreds of trees, so we set it to 300. Then we fit our model to the training data by giving the fit() method the training features and targets. After that, we check how well our model performed using the R-squared score.
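Here is a minimal sketch of the fitting step, assuming scikit-learn's RandomForestRegressor is the random forest being used. The random data, the variable names, and the random_state value are assumptions for illustration only, so the scores it prints say nothing about real stock data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data: 100 months, 4 features, weights for 3 stocks.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 4))
targets = rng.dirichlet(np.ones(3), size=100)

train_size = int(0.8 * features.shape[0])
train_features, test_features = features[:train_size], features[train_size:]
train_targets, test_targets = targets[:train_size], targets[train_size:]

# Only n_estimators (the number of trees) is changed from the defaults.
# random_state is fixed here just to make the sketch repeatable.
rfr = RandomForestRegressor(n_estimators=300, random_state=42)
rfr.fit(train_features, train_targets)

# score() returns the R-squared of the predictions against the targets.
train_r2 = rfr.score(train_features, train_targets)
test_r2 = rfr.score(test_features, test_targets)
print(train_r2, test_r2)
```

Comparing the train and test scores is a quick check for overfitting: a high train score with a low test score means the model has not learned anything that carries over to unseen months.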

4. Evaluate the model's performance

Now we'll evaluate the performance of the model in more detail. First, let's look at the monthly returns and see how they compare to investing in QQQ, an ETF that tracks the NASDAQ-100. We'll look at the test set to check performance on unseen data. To get predictions, we take our model and use predict() on our test features. Next, we calculate returns by multiplying the monthly returns by the test predictions, which are the predicted portfolio weights. Notice we're using iloc indexing on our DataFrame: this gives us the returns from train_size to the end of the data, and multiplying them by the predictions gives the individually-weighted returns for each of the stocks. We then use numpy's sum() to add up each row into one number for each month. Notice we're supplying the argument axis=1 so the values are summed across the columns within each row, not down each column. Finally, we can plot these returns on our test set, and see how they compare to investing in the QQQ index fund. A basic investing strategy is to buy a fund that tracks a large index like the NASDAQ-100, which QQQ tracks, or the S&P 500.
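The weighting and summing can be sketched with a tiny made-up example. The numbers and variable names below are hypothetical, and plain NumPy arrays stand in for the DataFrame slice and the model's predictions.

```python
import numpy as np

# Hypothetical test-set data: 3 months (rows) by 2 stocks (columns).
test_returns = np.array([[0.02, -0.01],
                         [0.00, 0.03],
                         [-0.04, 0.01]])

# Hypothetical predicted portfolio weights for the same months and stocks.
test_predictions = np.array([[0.5, 0.5],
                             [0.2, 0.8],
                             [1.0, 0.0]])

# Element-wise product: each stock's return scaled by its predicted weight.
weighted_returns = test_returns * test_predictions

# axis=1 sums across the stocks within each row: one number per month.
portfolio_returns = np.sum(weighted_returns, axis=1)
print(portfolio_returns)
```

With axis=0 instead, we would get one total per stock across all months, which is not what we want here.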

5. Model's monthly returns

The results are kind of a mixed bag -- some months are better than QQQ, some worse. We need a better way to see how the model did overall.

6. Calculate hypothetical portfolio

We can see how our model is doing on the test set in a different way by calculating the returns on a hypothetical investment. Let's say we're starting with 1000 dollars. We can create a list containing our cash balance at each month, called algo_cash. The first entry is our starting amount. Then we loop through each of the monthly returns from our test set predictions. For each monthly return, we multiply cash by 1 + the return and set cash equal to this new value. The "asterisk equals" operator is a shortcut for this. Then we append the cash value to our list. We end up with a list, called algo_cash, of the cash balances our model's predictions would have produced over the test set months. Next, we do the same thing for QQQ, and create a list of cash balances from the true returns of the QQQ index. Finally, we can look at the overall returns, which have been compounded through time. Our model isn't doing much better than QQQ, so it would be a good idea to go back and add some more features to our model.
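The loop can be sketched as follows. The compound() helper, the qqq_cash name, and the return values are hypothetical and only illustrate the arithmetic; algo_cash follows the name used above.

```python
def compound(monthly_returns, starting_cash=1000.0):
    """Return the cash balance at the start and after each month."""
    cash = starting_cash
    balances = [cash]          # first entry is the starting amount
    for r in monthly_returns:
        cash *= 1 + r          # shortcut for: cash = cash * (1 + r)
        balances.append(cash)
    return balances


# Made-up monthly returns for the model's portfolio and for QQQ.
algo_returns = [0.10, -0.05, 0.02]
qqq_returns = [0.03, 0.01, 0.02]

algo_cash = compound(algo_returns)
qqq_cash = compound(qqq_returns)

# Overall return: change in cash relative to the starting amount.
algo_total = (algo_cash[-1] - algo_cash[0]) / algo_cash[0]
qqq_total = (qqq_cash[-1] - qqq_cash[0]) / qqq_cash[0]
print(algo_total, qqq_total)
```

Each list has one more entry than there are months, because it includes the starting balance.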

7. Plot the results

Lastly, we can plot the results to see how our model is doing over time. It looks like we had some big losses to start out, then rocketed above QQQ, before ending up in about the same place. It's a good start but we need better features!

8. Train your model!

Now it's time for you to train a model and see how it does!