1. Time Series Decomposition for Outlier Detection
When analyzing time series data, we typically look for three components: seasonality, trend and noise.
2. Seasonality
Seasonality describes repeating patterns in a series that recur over a fixed unit of time. It does not necessarily mean that the patterns follow the seasons of the year.
A time series can have hourly, daily, weekly, monthly, yearly or any arbitrary frequency.
For example, temperature has daily seasonality, while the sales of certain foods, like ice cream, have yearly seasonality.
3. Seasonality
It is hard to detect seasonality through visual inspection alone.
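When a plot is inconclusive, one quick numeric check is lagged autocorrelation: a series with a cycle of length p tends to correlate strongly with itself shifted by p steps. Here is a minimal sketch of that idea. The hourly temperature series is synthetic and made up purely for illustration; it is not the course data.

```python
import numpy as np
import pandas as pd

# Synthetic hourly "temperature" with a 24-hour cycle (illustrative data only).
rng = np.random.default_rng(0)
hours = np.arange(30 * 24)  # 30 days of hourly readings
temperature = pd.Series(
    15 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)
)

# Correlation of the series with itself shifted by one full cycle
# and by half a cycle.
lag_24 = temperature.autocorr(lag=24)
lag_12 = temperature.autocorr(lag=12)

print(f"lag 24: {lag_24:.2f}, lag 12: {lag_12:.2f}")
```

A strongly positive value at the full-cycle lag together with a strongly negative value at the half-cycle lag points to a daily pattern in this toy series.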
4. seasonal_decompose
To see the core time series components more clearly, we will use statsmodels. We import a function called seasonal_decompose from statsmodels-dot-tsa-dot-seasonal and pass in the Open column of the stocks dataset.
We also set the period argument to 365 because we have daily measurements and are looking for a pattern that repeats every year. The result is a DecomposeResult object.
5. Plotting seasonality
To plot the seasonality, we use the dot-seasonal attribute and then call plot. We also make the figure wider with the figsize argument.
Stock datasets don't usually have seasonal patterns, as their prices fluctuate almost randomly. However, we can see a clear seasonal pattern in the opening prices of Google stock.
6. Seasonality examples
Here are some more examples of seasonality in time series.
In each one, you can see a clear pattern from period to period.
7. Initial plot of stocks
Going back to the stocks, we saw that the opening prices of Google stocks grew over the given period despite the fluctuations.
8. Trend
If the trend is not as clear from visual inspection as it is for Google stock, we can plot the trend component of the DecomposeResult object, which reflects the trend more accurately.
The plot confirms that the opening prices of Google stocks increased significantly over the given period.
9. Trend examples
Here are more examples of time series trends. Features four and five have clear downward and upward slopes, respectively, while the rest go up and down over the years.
10. Residuals
The last component of a time series is residuals. Residuals are fluctuations, noise, outliers or values not explained by either seasonality or trend.
We can plot them in the same way using the dot-resid attribute.
11. Decomposition
This way of breaking a time series into components is called time series decomposition. It is an important part of time series analysis, as it can reveal insights about the data that are not visible to the naked eye. To see all three components simultaneously, we call plot on the DecomposeResult object.
Don't forget to set a proper figure width and height.
12. Fitting a classifier
Decomposition is also important in another respect. By fitting a univariate classifier to the residuals, we can find the outliers in the time series. Let's try that with Median Absolute Deviation on the Volume column.
We first extract the residuals object via the resid attribute and retrieve the underlying numpy array of values. Then, we call reshape on it to convert it to a 2D array. The last steps are the same as always.
We find 81 outliers, which is close to the number we found using IForest in the last video.
13. Let's practice!
Let's practice!