Correlations
Correlations are worth checking before building machine learning models, because they show which features correlate most strongly with the target. Pearson's correlation coefficient is the usual choice, though it only detects linear relationships. Its use commonly assumes the data is normally distributed, which we can "eyeball" from histograms. Highly correlated variables have a Pearson correlation coefficient near 1 (positively correlated) or -1 (negatively correlated). A value near 0 means the two variables are not linearly correlated.
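To make the "linear only" point concrete, here is a minimal sketch using NumPy's np.corrcoef on made-up data (the arrays are illustrative and not part of the exercise). It compares two perfectly linear relationships with a purely quadratic one.

```python
import numpy as np

# Evenly spaced, symmetric sample points (illustrative data only).
x = np.linspace(-1.0, 1.0, 201)

# Perfect linear relationships: coefficients of (approximately) +1 and -1.
r_pos = np.corrcoef(x, 2.0 * x + 1.0)[0, 1]
r_neg = np.corrcoef(x, -3.0 * x)[0, 1]

# A purely quadratic relationship on a symmetric range: y depends entirely
# on x, yet the Pearson coefficient is approximately 0.
r_quad = np.corrcoef(x, x ** 2)[0, 1]

print(r_pos, r_neg, r_quad)
```

A coefficient near 0 therefore rules out a linear relationship only; a scatterplot is still worth a look.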
If we compare previous and future price changes over the same time period, the sign of their correlation suggests whether the stock price is mean-reverting (bounces around, so the correlation is negative) or trend-following (goes up if it has been going up recently, so the correlation is positive).
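The idea can be sketched on two synthetic price series. Everything here is an assumption made for illustration: the helper past_future_corr, the toy price processes, and the seed are hypothetical and do not come from the course data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 2000

def past_future_corr(prices: pd.Series, days: int = 5) -> float:
    """Correlation between the past and the future `days`-day % change."""
    past = prices.pct_change(days)
    # The % change ending `days` rows ahead, moved back to the current row.
    future = prices.pct_change(days).shift(-days)
    return past.corr(future)

# Mean-reverting toy price: deviations from 100 decay by half each day.
noise = rng.normal(0.0, 1.0, n)
dev = np.zeros(n)
for t in range(1, n):
    dev[t] = 0.5 * dev[t - 1] + noise[t]
mean_reverting = pd.Series(100.0 + dev)

# Trending toy price: each daily log return carries over 90% of the previous one.
shocks = rng.normal(0.0, 0.001, n)
ret = np.zeros(n)
for t in range(1, n):
    ret[t] = 0.9 * ret[t - 1] + shocks[t]
trending = pd.Series(100.0 * np.exp(np.cumsum(ret)))

print(past_future_corr(mean_reverting))  # expected to be negative
print(past_future_corr(trending))        # expected to be positive
```

Real stock data is much noisier than these toy series, so the correlation there is usually far closer to 0.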
This exercise is part of the course
Machine Learning for Finance in Python
Exercise instructions
Using the lng_df DataFrame and its Adj_Close column:
- Create the 5-day future price (as 5d_future_close) with pandas' .shift(-5).
- Use pct_change(5) on 5d_future_close and Adj_Close to create the future 5-day % price change (5d_close_future_pct), and the current 5-day % price change (5d_close_pct).
- Examine correlations between the two 5-day percent price change columns with .corr() on lng_df.
- Using plt.scatter(), make a scatterplot of 5d_close_pct vs 5d_close_future_pct.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create 5-day % changes of Adj_Close for the current day, and 5 days in the future
lng_df['5d_future_close'] = lng_df['Adj_Close'].shift(____)
lng_df['5d_close_future_pct'] = lng_df['5d_future_close'].pct_change(5)
lng_df['5d_close_pct'] = lng_df['Adj_Close'].pct_change(____)
# Calculate the correlation matrix between the 5d close percentage changes (current and future)
corr = lng_df[['5d_close_pct', '5d_close_future_pct']].____
print(corr)
# Scatter the current 5-day percent change vs the future 5-day percent change
plt.scatter(lng_df['5d_close_pct'], lng_df[____])
plt.show()
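For reference, the steps can be sketched end to end on a synthetic stand-in for lng_df. The real data is not reproduced here, so the random-walk prices, dates, and seed below are assumptions. Passing fill_method=None is also a choice made for this sketch rather than part of the exercise: it stops pct_change from forward-filling the NaN rows that .shift(-5) leaves at the end. The scatterplot step is omitted so the sketch runs without a display.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for lng_df: a seeded random-walk price series.
rng = np.random.default_rng(42)
dates = pd.bdate_range("2020-01-01", periods=250)
prices = 50.0 * np.exp(np.cumsum(rng.normal(0.0, 0.02, len(dates))))
lng_df = pd.DataFrame({"Adj_Close": prices}, index=dates)

# Price 5 rows ahead, aligned to the current row.
lng_df["5d_future_close"] = lng_df["Adj_Close"].shift(-5)

# 5-day % change from now to 5 rows ahead, and from 5 rows ago to now.
# fill_method=None leaves missing values as NaN instead of forward-filling.
lng_df["5d_close_future_pct"] = lng_df["5d_future_close"].pct_change(5, fill_method=None)
lng_df["5d_close_pct"] = lng_df["Adj_Close"].pct_change(5, fill_method=None)

# 2x2 correlation matrix; rows containing NaN are ignored pairwise.
corr = lng_df[["5d_close_pct", "5d_close_future_pct"]].corr()
print(corr)
```

The off-diagonal entry of corr is the number of interest: it measures how the current 5-day change relates to the next one.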