Exercise

Correlations

Before building machine learning models, it helps to check correlations to see which features relate most strongly to the target. Pearson's correlation coefficient is the usual choice, but it only detects linear relationships. Interpreting it commonly assumes that the data is roughly normally distributed, which we can "eyeball" from histograms. Strongly correlated variables have a Pearson correlation coefficient near 1 (positively correlated) or -1 (negatively correlated). A value near 0 means the two variables are not linearly correlated, although they may still be related in a nonlinear way.
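
To make the "linear only" point concrete, here is a minimal sketch using made-up data (the arrays x, y_linear and y_quadratic are illustrative assumptions, not part of the exercise). It compares Pearson's coefficient for a straight-line relationship with a perfectly deterministic but symmetric quadratic one.

```python
import numpy as np

# Illustrative data only: evenly spaced x values centered on zero.
x = np.linspace(-1, 1, 201)

y_linear = 2 * x + 1   # exact linear relationship
y_quadratic = x ** 2   # exact, but nonlinear and symmetric around 0

# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is the x-y entry.
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

print(f"linear:    {r_linear:.3f}")
print(f"quadratic: {r_quadratic:.3f}")
```

The linear case gives a coefficient of essentially 1, while the quadratic case comes out close to 0 even though y is fully determined by x. A low Pearson value therefore rules out a linear relationship only, not every relationship.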

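If we use the same time period for the previous price change and the future price change, their correlation indicates whether the stock price is mean-reverting (bounces around, so past and future changes are negatively correlated) or trend-following (keeps going up if it has been going up recently, so they are positively correlated).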
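
As a rough illustration of how the sign of the correlation separates the two behaviors, the sketch below simulates returns from a toy model in which each return is a fraction phi of the previous one plus noise. The model, the helper names simulate_returns and lagged_correlation, and all parameter values are assumptions chosen for illustration; the sketch also uses one-step returns rather than 5-day changes to keep it short.

```python
import numpy as np

def simulate_returns(phi, n=5000, seed=0):
    """Toy returns: r[t] = phi * r[t-1] + noise (hypothetical model)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0, 0.01, size=n)
    returns = np.zeros(n)
    for t in range(1, n):
        returns[t] = phi * returns[t - 1] + noise[t]
    return returns

def lagged_correlation(returns):
    """Pearson correlation between each return and the one that follows it."""
    return np.corrcoef(returns[:-1], returns[1:])[0, 1]

# Negative phi: moves tend to reverse (mean-reverting).
mean_reverting_corr = lagged_correlation(simulate_returns(phi=-0.5))
# Positive phi: moves tend to continue (trend-following).
trend_following_corr = lagged_correlation(simulate_returns(phi=0.5))

print(f"mean-reverting:  {mean_reverting_corr:.2f}")
print(f"trend-following: {trend_following_corr:.2f}")
```

The first correlation comes out negative and the second positive, which is the pattern to look for when comparing past and future price changes on real data.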

Instructions

Using the lng_df DataFrame and its Adj_Close:

  • Create the 5-day future price (as 5d_future_close) with pandas' .shift(-5).
  • Use pct_change(5) on 5d_future_close and Adj_Close to create the future 5-day % price change (5d_close_future_pct), and the current 5-day % price change (5d_close_pct).
  • Examine correlations between the two 5-day percent price change columns with .corr() on lng_df.
  • Using plt.scatter(), make a scatterplot of 5d_close_pct vs 5d_close_future_pct.
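
The steps above can be sketched as follows. The real lng_df is not shown here, so this sketch builds a synthetic stand-in from a random walk (the dates, prices and seed are assumptions). It also passes fill_method=None to pct_change() so that missing future prices stay missing instead of being forward-filled, which is a choice made for this sketch rather than something the exercise specifies.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for lng_df (assumption: the exercise supplies real data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2016-01-01", periods=500)
prices = 50 * np.exp(np.cumsum(rng.normal(0, 0.02, size=len(dates))))
lng_df = pd.DataFrame({"Adj_Close": prices}, index=dates)

# shift(-5) moves each price up 5 rows, so every row holds the price 5 days ahead.
lng_df["5d_future_close"] = lng_df["Adj_Close"].shift(-5)

# 5-day percent changes of the future price and of the current price.
lng_df["5d_close_future_pct"] = lng_df["5d_future_close"].pct_change(5, fill_method=None)
lng_df["5d_close_pct"] = lng_df["Adj_Close"].pct_change(5, fill_method=None)

# .corr() skips rows with missing values and uses Pearson by default.
corr = lng_df[["5d_close_pct", "5d_close_future_pct"]].corr()
print(corr)

# Scatterplot of the current 5-day change against the future 5-day change.
plt.scatter(lng_df["5d_close_pct"], lng_df["5d_close_future_pct"])
plt.xlabel("5d_close_pct")
plt.ylabel("5d_close_future_pct")
plt.show()
```

With a pure random walk like this stand-in, the off-diagonal correlation should be small. On real prices, a clearly negative value would point to mean reversion and a clearly positive one to trend following.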