Exercise

Explore the data with some EDA

First, let's explore the data. Any time we begin a machine learning (ML) project, we need to first do some exploratory data analysis (EDA) to familiarize ourselves with the data. This includes things like:

  • raw data plots
  • histograms
  • and more…

I typically begin with raw data plots and histograms, which show how the data is distributed. If a variable is approximately normally distributed, we can apply parametric statistics to it.

There are two stocks loaded for you into pandas DataFrames: lng_df and spy_df (LNG and SPY). Take a look at them with .head(). We'll use the closing prices and eventually volume as inputs to ML algorithms.

Note: Each time we want to make a new plot, we'll either call plt.clf() to clear the current figure or create a new one with f = plt.figure().
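As a rough illustration of the two options in the note, the sketch below draws on a figure, clears it with plt.clf(), and then opens a second figure with plt.figure(). The plotted numbers are made up purely for demonstration.

```python
import matplotlib.pyplot as plt

# Draw something on the current (implicitly created) figure
plt.plot([1, 2, 3], [1, 4, 9])

# Option 1: clear the current figure and reuse it for the next plot
plt.clf()
plt.plot([1, 2, 3], [3, 2, 1])

# Option 2: leave the old figure alone and start a new one
f = plt.figure()
plt.plot([1, 2, 3], [2, 2, 2])
```

With plt.clf() only one figure ever exists, while plt.figure() keeps the earlier plot around, which can matter if you still want to show or save it later.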

Instructions

  • Print out the first 5 lines of the two DataFrames (lng_df and spy_df) and examine their contents.
  • Use the pandas library to plot raw time series data for 'SPY' and 'LNG' with the adjusted close price ('Adj_Close') -- set legend=True in .plot().
  • Use plt.show() to show the raw time series plot (matplotlib.pyplot has been imported as plt).
  • Use pandas and matplotlib to make a histogram of the adjusted close 1-day percent difference (use .pct_change()) for SPY and LNG.
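One possible way to work through these steps is sketched below. Because the course's preloaded data is not available here, the code builds synthetic stand-ins for lng_df and spy_df; the make_prices helper, the date range, and the volatility values are all assumptions for illustration, as are the bins=50 and alpha=0.5 settings on the histograms.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-ins for the preloaded DataFrames (assumption: the real
# lng_df and spy_df have a date index and an 'Adj_Close' column).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2016-01-01", periods=250)

def make_prices(start, daily_vol):
    # Hypothetical helper: random-walk prices from normal daily returns
    returns = rng.normal(0, daily_vol, len(dates))
    return pd.DataFrame({"Adj_Close": start * np.cumprod(1 + returns)}, index=dates)

lng_df = make_prices(40.0, 0.03)
spy_df = make_prices(200.0, 0.01)

# 1. Inspect the first 5 rows of each DataFrame
print(lng_df.head())
print(spy_df.head())

# 2. Plot the raw adjusted close series; label= names each line in the legend
spy_df["Adj_Close"].plot(label="SPY", legend=True)
lng_df["Adj_Close"].plot(label="LNG", legend=True)
plt.show()

# 3. Clear the figure, then plot histograms of the 1-day percent changes
plt.clf()
lng_pct = lng_df["Adj_Close"].pct_change()
spy_pct = spy_df["Adj_Close"].pct_change()
lng_pct.plot.hist(bins=50, alpha=0.5, label="LNG", legend=True)
spy_pct.plot.hist(bins=50, alpha=0.5, label="SPY", legend=True)
plt.show()
```

Note that .pct_change() leaves a NaN in the first row, since there is no previous day to compare against; the histogram simply skips it.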