Get startedGet started for free

Correlations

Correlations are nice to check out before building machine learning models, because we can see which features correlate to the target most strongly. Pearson's correlation coefficient is often used, which only detects linear relationships. It's commonly assumed our data is normally distributed, which we can "eyeball" from histograms. Highly correlated variables have a Pearson correlation coefficient near 1 (positively correlated) or -1 (negatively correlated). A value near 0 means the two variables are not linearly correlated.

If we use the same time periods for previous price changes and future price changes, we can see if the stock price is mean-reverting (bounces around) or trend-following (goes up if it has been going up recently).

This exercise is part of the course

Machine Learning for Finance in Python

View Course

Exercise instructions

Using the lng_df DataFrame and its Adj_Close:

  • Create the 5-day future price (as 5d_future_close) with pandas' .shift(-5).
  • Use pct_change(5) on 5d_future_close and Adj_Close to create the future 5-day % price change (5d_close_future_pct), and the current 5-day % price change (5d_close_pct).
  • Examine correlations between the two 5-day percent price change columns with .corr() on lng_df.
  • Using plt.scatter(), make a scatterplot of 5d_close_pct vs 5d_close_future_pct.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create 5-day % changes of Adj_Close for the current day, and 5 days in the future
lng_df['5d_future_close'] = lng_df['Adj_Close'].shift(____)
lng_df['5d_close_future_pct'] = lng_df['5d_future_close'].pct_change(5)
lng_df['5d_close_pct'] = lng_df['Adj_Close'].pct_change(____)

# Calculate the correlation matrix between the 5d close pecentage changes (current and future)
corr = lng_df[['5d_close_pct', '5d_close_future_pct']].____
print(corr)

# Scatter the current 5-day percent change vs the future 5-day percent change
plt.scatter(lng_df['5d_close_pct'], lng_df[____])
plt.show()
Edit and Run Code