Explore the data with some EDA

First, let's explore the data. Any time we begin a machine learning (ML) project, we need to first do some exploratory data analysis (EDA) to familiarize ourselves with the data. This includes things like:

raw data plots
histograms
and more…

I typically begin with raw data plots and histograms, which show how the data are distributed. If a variable is approximately normally distributed, we can apply parametric statistics to it.
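As a rough numeric companion to a histogram, skewness and excess kurtosis give a quick hint of how close a series is to normal (both are near 0 for normal data). The sketch below uses synthetic returns, not the course data; the series names and the helper shape_summary are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns (assumption: NOT the course data).
rng = np.random.default_rng(42)
normal_returns = pd.Series(rng.normal(0.0, 0.01, size=2000))
# Student's t with 3 degrees of freedom has much heavier tails than a normal.
heavy_tailed_returns = pd.Series(0.01 * rng.standard_t(df=3, size=2000))


def shape_summary(series):
    """Return sample skewness and excess kurtosis (hypothetical helper)."""
    # pandas' .kurt() reports excess kurtosis, so a normal sample is near 0.
    return {"skew": series.skew(), "excess_kurtosis": series.kurt()}


normal_shape = shape_summary(normal_returns)
heavy_shape = shape_summary(heavy_tailed_returns)
print(normal_shape)
print(heavy_shape)
```

Values far from 0 suggest the normality assumption behind parametric statistics deserves a closer look; the histogram remains the primary check.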

There are two stocks loaded for you into pandas DataFrames: lng_df and spy_df (LNG and SPY). Take a look at them with .head(). We'll use the closing prices and eventually volume as inputs to ML algorithms.

Note: Before each new plot, we'll either clear the current figure with plt.clf() or create a new figure with f = plt.figure().
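A minimal sketch of the two options, using toy numbers rather than the stock data: plt.clf() empties the current figure so it can be reused, while plt.figure() creates a fresh figure and makes it the current one. The variable names here are illustrative only.

```python
import matplotlib.pyplot as plt

# Draw something on the current figure (toy data, for illustration only).
plt.plot([1, 2, 3], [1, 4, 9])
n_lines_before = len(plt.gca().lines)

# Option 1: clear the current figure; its axes (and the line) are removed.
plt.clf()
n_axes_after_clf = len(plt.gcf().axes)

# Option 2: create a brand-new figure, which becomes the current figure.
f = plt.figure()
new_figure_is_current = plt.gcf() is f
```

Either way, the next plotting call starts from an empty canvas instead of drawing on top of the previous plot.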

Print out the first 5 lines of the two DataFrames (lng_df and spy_df) and examine their contents.
Use the pandas library to plot raw time series data for 'SPY' and 'LNG' with the adjusted close price ('Adj_Close') -- set legend=True in .plot().
Use plt.show() to show the raw time series plot (matplotlib.pyplot has been imported as plt).
Use pandas and matplotlib to make a histogram of the adjusted close 1-day percent difference (use .pct_change()) for SPY and LNG.
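The steps above could be sketched roughly as follows. In the exercise, lng_df and spy_df are already loaded; here they are replaced by synthetic random-walk prices so the block runs on its own. The helper make_fake_prices, the dates, and all the numbers are assumptions for illustration, and only the 'Adj_Close' column is mocked.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data (assumption: synthetic, NOT the real LNG/SPY prices).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2016-01-04", periods=250)


def make_fake_prices(start, daily_vol):
    """Build a DataFrame with a random-walk 'Adj_Close' column (hypothetical helper)."""
    returns = rng.normal(0.0, daily_vol, size=len(dates))
    prices = start * np.cumprod(1.0 + returns)
    return pd.DataFrame({"Adj_Close": prices}, index=dates)


lng_df = make_fake_prices(40.0, 0.03)
spy_df = make_fake_prices(200.0, 0.01)

# Step 1: look at the first 5 rows of each DataFrame.
print(lng_df.head())
print(spy_df.head())

# Steps 2-3: plot the raw adjusted close series with a legend, then show it.
lng_df["Adj_Close"].plot(label="LNG", legend=True)
spy_df["Adj_Close"].plot(label="SPY", legend=True)
plt.show()

# Step 4: clear the figure, then plot histograms of 1-day percent changes.
plt.clf()
# The first pct_change() value is NaN (no previous day), so drop it.
lng_returns = lng_df["Adj_Close"].pct_change().dropna()
spy_returns = spy_df["Adj_Close"].pct_change().dropna()
lng_returns.plot.hist(bins=50, alpha=0.5, label="LNG", legend=True)
spy_returns.plot.hist(bins=50, alpha=0.5, label="SPY", legend=True)
plt.show()
```

The bins and alpha values are a presentation choice, not part of the instructions: semi-transparent bars simply make it easier to compare the two overlapping histograms.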

