
Explore the data with some EDA

First, let's explore the data. Any time we begin a machine learning (ML) project, we should start with some exploratory data analysis (EDA) to familiarize ourselves with the data. This includes things like:

  • raw data plots
  • histograms
  • and more…

I typically begin with raw data plots and histograms, which show how our data is distributed. If the data is approximately normally distributed, we can use parametric statistical methods that rely on that assumption.
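A histogram gives a visual impression of normality. A quick numeric complement is to compute sample skewness and excess kurtosis, both of which are 0 for a normal distribution. The sketch below uses synthetic prices (a random walk built from normally distributed daily returns), not the real LNG or SPY data, so the numbers should come out close to 0. Real daily stock returns often show fat tails (positive excess kurtosis), which is one reason to check before reaching for parametric methods.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for an adjusted close series (NOT real LNG/SPY data):
# compound normally distributed daily returns into a price path.
rng = np.random.default_rng(seed=0)
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=1000)
prices = pd.Series(100 * np.cumprod(1 + daily_returns), name="Adj_Close")

# 1-day percent change; the first value is NaN because it has no prior day
returns = prices.pct_change().dropna()

# Both statistics are 0 for a normal distribution; values far from 0
# suggest that methods assuming normality may be a poor fit.
skewness = returns.skew()
excess_kurtosis = returns.kurt()  # pandas reports excess (Fisher) kurtosis
print(f"skew: {skewness:.3f}, excess kurtosis: {excess_kurtosis:.3f}")
```

Because the prices were built from normal returns, `pct_change()` recovers those returns and both statistics land near 0; on real price data they are worth comparing against that baseline.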

Data for two stocks, LNG and SPY, has been loaded for you into the pandas DataFrames lng_df and spy_df. Take a look at them with .head(). We'll use the closing prices, and eventually the volume, as inputs to ML algorithms.
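To try `.head()` outside the exercise environment, the sketch below builds two hypothetical DataFrames shaped roughly like `lng_df` and `spy_df`: a business-day `DatetimeIndex` plus an `'Adj_Close'` column and a `'Volume'` column. The helper `make_fake_stock`, the `'Volume'` column name, and all values are made up for illustration; the real DataFrames may have more or differently named columns.

```python
import numpy as np
import pandas as pd


def make_fake_stock(start_price, seed, periods=10):
    """Hypothetical helper: synthetic daily data, not real market prices."""
    rng = np.random.default_rng(seed)
    dates = pd.bdate_range("2016-01-04", periods=periods)  # business days only
    close = start_price * np.cumprod(1 + rng.normal(0, 0.01, size=periods))
    volume = rng.integers(1_000_000, 5_000_000, size=periods)
    return pd.DataFrame({"Adj_Close": close, "Volume": volume}, index=dates)


lng_df = make_fake_stock(start_price=40.0, seed=1)
spy_df = make_fake_stock(start_price=200.0, seed=2)

print(lng_df.head())   # first 5 rows (the default n=5)
print(spy_df.head(3))  # pass n to show a different number of rows
```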

Note: Each time we want to start a new plot, we'll call plt.clf() to clear the current figure (alternatively, create a new figure with f = plt.figure()).
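The two options behave differently: `plt.clf()` wipes the current figure and reuses it, while `plt.figure()` opens an additional figure and leaves earlier ones untouched. A minimal sketch, assuming a fresh Python session and using the non-interactive `Agg` backend so it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)

# Option 1: clear the current figure and draw again in the same figure
plt.plot(x, x ** 2)
plt.clf()               # removes the axes and line from figure 1
plt.plot(x, np.sqrt(x))  # figure 1 now holds only this line

# Option 2: start a separate figure; figure 1 is kept as it is
f = plt.figure()
plt.plot(x, x)

print(plt.get_fignums())  # [1, 2]
```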

This exercise is part of the course

Machine Learning for Finance in Python

Exercise instructions

  • Print the first 5 rows of the two DataFrames (lng_df and spy_df) and examine their contents.
  • Use pandas to plot the raw time series of the adjusted close price ('Adj_Close') for SPY and LNG; set legend=True in .plot().
  • Use plt.show() to display the raw time series plot (matplotlib.pyplot has been imported as plt).
  • Use pandas and matplotlib to make a histogram of the 1-day percent change of the adjusted close (use .pct_change()) for SPY and LNG.
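`.pct_change()` divides each value by the previous one and subtracts 1, so the first entry is always NaN. The small worked example below uses made-up prices chosen so the changes come out to round numbers, and checks the result against the same formula written with `shift()`. Note that the result is a fraction (0.02 means 2%), so multiply by 100 if you want the histogram axis in percentage points.

```python
import numpy as np
import pandas as pd

# Made-up adjusted close prices for four consecutive trading days
close = pd.Series([100.0, 102.0, 99.96, 104.958])

# pct_change() computes close[t] / close[t-1] - 1
pct = close.pct_change()

# The same calculation written out by hand
manual = close / close.shift(1) - 1

print(pct.round(4).tolist())          # [nan, 0.02, -0.02, 0.05]
print((pct * 100).round(2).tolist())  # [nan, 2.0, -2.0, 5.0]
```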

Sample code

A completed version of the sample code follows. lng_df and spy_df are assumed to be preloaded, as they are in the exercise environment.

import matplotlib.pyplot as plt

print(lng_df.head())  # examine the LNG DataFrame
print(spy_df.head())  # examine the SPY DataFrame

# Plot the Adj_Close columns for SPY and LNG
spy_df['Adj_Close'].plot(label='SPY', legend=True)
# secondary_y=True draws LNG against a second y-axis on the right,
# so two prices on different scales can share one plot
lng_df['Adj_Close'].plot(label='LNG', legend=True, secondary_y=True)
plt.show()  # show the plot
plt.clf()  # clear the plot space

# Histogram of the daily price change percent of Adj_Close for LNG
lng_df['Adj_Close'].pct_change().plot.hist(bins=50)
plt.xlabel('adjusted close 1-day percent change')
plt.show()
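The instructions ask for histograms of both SPY and LNG, while the sample code plots only LNG. One way to compare the two is to overlay them on shared bin edges with some transparency. The sketch below uses synthetic price series in place of `spy_df` and `lng_df` (the volatilities are made-up assumptions, not estimates from real data) and skips `plt.show()` because it runs on the non-interactive `Agg` backend; call `plt.show()` in an interactive session.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for the adjusted close columns (NOT real market data)
rng = np.random.default_rng(seed=2)
dates = pd.bdate_range("2016-01-04", periods=500)
spy_close = pd.Series(200 * np.cumprod(1 + rng.normal(0.0004, 0.008, 500)), index=dates)
lng_close = pd.Series(40 * np.cumprod(1 + rng.normal(0.0004, 0.03, 500)), index=dates)

# 1-day percent changes, dropping the leading NaN
spy_returns = spy_close.pct_change().dropna().rename("SPY")
lng_returns = lng_close.pct_change().dropna().rename("LNG")

plt.clf()  # start from a clean figure, as the exercise does

# Shared bin edges (51 edges -> 50 bins) keep the two histograms comparable
lo = min(spy_returns.min(), lng_returns.min())
hi = max(spy_returns.max(), lng_returns.max())
bins = np.linspace(lo, hi, 51)

# The Series names ("SPY", "LNG") become the legend labels
spy_returns.plot.hist(bins=bins, alpha=0.5, legend=True)
lng_returns.plot.hist(bins=bins, alpha=0.5, legend=True)
plt.xlabel("adjusted close 1-day percent change")
```

With shared bins, the wider spread of the more volatile series is visible directly, which separate histograms with independently chosen bins can hide.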