1. Data transforms, features, and targets
Now that we're familiar with our data, we need to get it prepared for machine learning.
2. Making features and targets
To prepare our data, we'll need features and targets. Our features are the inputs we use to predict future price changes: here, the 10-day price change and volume. Our targets are the future price changes we want to predict.
Generally, we can use pandas DataFrames and Series in our machine learning algorithms.
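To make the idea of a feature and a target concrete, here's a minimal sketch in pandas. The prices are synthetic, and the column names 10d_close_pct and 10d_future_close_pct are assumptions chosen for illustration, not names fixed by any library.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices standing in for real stock data (an assumption).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=60)
close = pd.Series(100 + rng.normal(0, 1, size=60).cumsum(), index=dates, name="close")
df = close.to_frame()

# Feature: percent change over the previous 10 trading days.
df["10d_close_pct"] = df["close"].pct_change(10)

# Target: percent change over the next 10 trading days.
# shift(-10) places the price from 10 days ahead on today's row.
df["10d_future_close_pct"] = df["close"].shift(-10) / df["close"] - 1

# The first 10 rows have no feature and the last 10 have no target, so drop them.
df = df.dropna()
```

Each remaining row now pairs what we knew on that day (the past change) with what we'd like to predict (the future change).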
3. Moving averages
It's useful to incorporate historical data as features; for example, the price changes in the last 200 days. Instead of including all previous 200 days' price changes, we can concentrate past data into a single point using technical indicators like the moving average shown here.
4. Moving averages
A moving average is the average of a value over the past n days. Classic moving average periods for stocks are 14, 50, and 200 days.
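As a small illustration of that definition, here's a sketch of a 3-day simple moving average using pandas' rolling window. The prices are made up, and n is set to 3 only to keep the numbers easy to check by hand.

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# 3-day simple moving average: the mean of each day's price and the 2 days before it.
sma3 = prices.rolling(window=3).mean()

# The first 2 entries are NaN because a full 3-day window isn't available yet.
# sma3 -> NaN, NaN, 11.0, 12.0, 13.0, 14.0
```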
5. Simple moving averages
Here's a plot showing the AMD price and a 200-day simple moving average, or SMA. You can see moving averages smooth the data.
6. Relative Strength Index
The other indicator we'll use is relative strength index, or RSI. This oscillates between 0 and 100. When it's close to 0, this may mean the price is due to rebound from recent lows. When RSI is close to 100, this may mean the price of the stock is due to decline.
7. RSI equation
The equation for RSI is RSI = 100 - 100 / (1 + RS), where RS is the relative strength. Relative strength is the average gain on price increases divided by the average loss on price decreases during the time period, n. Both RSI and moving averages can be calculated with the TA-lib package. This Python library is a wrapper for C code, meaning we can run C code using Python.
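To see the equation at work, here's a rough sketch that computes RSI with plain averages of gains and losses. The function simple_rsi is a hypothetical helper written for this example. TA-lib smooths the averages (Wilder's method), so its values will generally differ from this simplified version.

```python
import numpy as np

def simple_rsi(prices, n):
    """RSI over the last n price changes, using plain averages (illustrative only)."""
    changes = np.diff(np.asarray(prices, dtype=float))[-n:]
    # Average gain and average loss, each taken over all n changes.
    avg_gain = np.where(changes > 0, changes, 0.0).mean()
    avg_loss = np.where(changes < 0, -changes, 0.0).mean()
    if avg_loss == 0:
        return 100.0  # no losses in the window, so RSI is at its maximum
    rs = avg_gain / avg_loss  # relative strength
    return 100.0 - 100.0 / (1.0 + rs)

# Price changes are +1, +1, -1, +1, +1: average gain 0.8, average loss 0.2, RS = 4.
prices = [10, 11, 12, 11, 12, 13]
rsi = simple_rsi(prices, n=5)  # 100 - 100 / (1 + 4) = 80
```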
8. Calculating SMA and RSI
To use TA-lib's functions for RSI and moving averages, we provide a numpy array of prices and the argument timeperiod. This is the value of n mentioned in the previous slides. We're using 200 for timeperiod, and adding the new features to our DataFrame as ma200 for the 200-day moving average, and rsi200 for the 200-day RSI.
For TA-lib functions, we must provide numpy arrays, not pandas objects. The .values attribute of pandas Series and DataFrames yields numpy arrays.
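Here's roughly what that looks like in code. This is a sketch under a few assumptions: the prices are synthetic, the price column is called close, and TA-lib may or may not be installed, so the example falls back to pandas for the moving average when the import fails.

```python
import numpy as np
import pandas as pd

try:
    import talib  # Python wrapper; requires the underlying TA-Lib C library
except ImportError:
    talib = None

# Synthetic prices (an assumption); real data would come from your own DataFrame.
rng = np.random.default_rng(1)
df = pd.DataFrame({"close": 100 + rng.normal(0, 1, size=300).cumsum()})

n = 200
prices = df["close"].values  # numpy array of floats

if talib is not None:
    df["ma200"] = talib.SMA(prices, timeperiod=n)
    df["rsi200"] = talib.RSI(prices, timeperiod=n)
else:
    # Fallback for the moving average only, so the sketch still runs without TA-lib.
    df["ma200"] = df["close"].rolling(n).mean()
```

Either way, the first 199 rows of ma200 are NaN, because a 200-day average needs 200 days of prices.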
9. Finally, our features
We can now make our features and targets. We choose the 10-day close percent change and the 200-day moving average and RSI from our DataFrame. These feature names go in a list, which selects the columns from the DataFrame. Then we select the 10-day future close percent change as our target. Finally, we create a DataFrame with both the features and targets so we can check for correlations.
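The selection steps might look like the sketch below. The DataFrame is filled with random placeholder numbers, and the column names are assumptions based on the features described above.

```python
import numpy as np
import pandas as pd

# Placeholder data; the column names are assumptions for illustration.
rng = np.random.default_rng(2)
columns = ["10d_close_pct", "ma200", "rsi200", "10d_future_close_pct"]
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=columns)

# A list of column names selects those columns as a DataFrame.
feature_names = ["10d_close_pct", "ma200", "rsi200"]
features = df[feature_names]

# A single column name selects a Series.
targets = df["10d_future_close_pct"]

# Target first, then features, in one DataFrame for checking correlations.
feature_and_target_cols = ["10d_future_close_pct"] + feature_names
feat_targ_df = df[feature_and_target_cols]
```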
10. Check correlations
Before we do any machine learning, it's good to check features and targets for correlations. We use the pandas DataFrame method corr() to calculate Pearson correlations, and the seaborn library has a handy heatmap function for plotting the correlations. The annot option shows the numeric value of each correlation in the plot.
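A minimal sketch of the correlation check follows. The data and the column names target, feature_a, and feature_b are made up, and plot_correlations is a hypothetical helper that wraps the seaborn call so the correlation matrix can be computed without any plotting libraries.

```python
import numpy as np
import pandas as pd

# Made-up data: feature_b is built from feature_a, so the two correlate strongly.
rng = np.random.default_rng(3)
x = rng.normal(size=200)
feat_targ_df = pd.DataFrame({
    "target": rng.normal(size=200),
    "feature_a": x,
    "feature_b": 2 * x + rng.normal(scale=0.1, size=200),
})

# corr() computes pairwise Pearson correlations by default.
corr = feat_targ_df.corr()

def plot_correlations(corr):
    """Draw an annotated heatmap of a correlation matrix (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr, annot=True)  # annot=True writes each value in its square
    plt.tight_layout()
    plt.show()
```

Calling plot_correlations(corr) draws the heatmap.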
11. Correlation plot
The plot looks like this, with colors ranging from black for negative correlations to white for positive correlations. The numeric values are also shown in each square. To examine a correlation between two variables, we look for the intersection of the two variables in the plot. For example, the RSI and 10-day future price percent change intersect in the bottom left corner. These variables have little to no correlation since the value is close to 0. Usually, an absolute value of 0.2 or greater means there is some linear correlation present.
The diagonal line with all ones shows the correlation of each variable with itself, which is 1. This means each variable is perfectly linearly correlated with itself, as we expect.
12. Let's create features and targets!
Ok, I think you're ready to create your own features and targets!