1. Forecasting
Welcome to the final chapter!
2. Correlation vs. autocorrelation
In the previous chapter, you learned about correlation: the relationship between two variables. For example, here is the monthly Google Search data for the keyword DIY since 2013, with a third-degree polynomial trend line. Interest in the DIY keyword seemed to rise until 2017 and then gradually declined. We could calculate the R-squared and p-value of the model, but do you think it is a good fit?
As you probably noticed, there are recurring patterns in the data, known as seasonality. In this case, the pattern repeats each year. There is a yearly peak of interest in DIY in October, when people are probably thinking about creative Halloween costumes, and there was a huge spike in DIY interest at the beginning of the COVID pandemic, in April 2020.
3. Correlation vs. autocorrelation
A trend line would never capture these patterns. We say that the data correlates with itself over time, which is known as autocorrelation. When you measure a value repeatedly over time, in this case counting how often the keyword DIY is searched per month, you have autocorrelated data. In general, such data is known as a time series, where the time points are discrete and equally spaced. This requires another type of analysis, generally referred to as time series analysis.
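To make this concrete, here is a minimal sketch of how autocorrelation could be measured outside of Tableau, using Python and pandas; the monthly values below are made up for illustration.

```python
import pandas as pd

# Hypothetical monthly search-interest values (two years, one value per month).
diy_interest = pd.Series([55, 52, 54, 58, 60, 57, 56, 59, 63, 70, 62, 58,
                          54, 51, 55, 59, 61, 58, 57, 60, 65, 72, 63, 59])

# Autocorrelation at lag 12: how strongly each month correlates with the
# same month one year earlier. A value near 1 indicates strong yearly seasonality.
print(diy_interest.autocorr(lag=12))
```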
4. Forecasting
Using the historical data in a time series to make predictions about the future is known as forecasting. Similar to inference, where you make an estimate about the population based on a sample, a forecast estimates where future data points will likely fall, in the form of a confidence interval. The further your predictions reach into the future, the wider your confidence interval will be.
Forecasting is used to make predictions in a range of fields, including supply chain management, earthquakes, hormone levels, stock markets, sports performance, and weather.
5. Naive forecast
Forecasting comes in different forms. A very basic form is the so-called naive forecasting method: the forecast is simply the last observed value. As the name suggests, this approach is oversimplified but very cost-effective, making it an ideal benchmark for more complex models.
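As a quick sketch, the naive forecast can be written in a few lines of Python; the monthly values are made up for illustration.

```python
# Naive forecast: the prediction for every future period is simply the
# last observed value.
history = [55, 52, 54, 58, 60, 57, 56, 59, 63, 70, 62, 58]

def naive_forecast(values, horizon):
    """Repeat the last observed value for each future period."""
    return [values[-1]] * horizon

print(naive_forecast(history, horizon=3))  # [58, 58, 58]
```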
6. Exponential smoothing
Forecasting in Tableau uses a technique known as exponential smoothing. Its formula shows that predictions are influenced more by recent values than by older ones. It uses a smoothing constant, alpha, that can be tuned to optimize the forecasts and smooth out drastic value changes. In fact, Tableau fits up to eight exponential smoothing models and displays the best one.
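As a rough illustration, here is simple exponential smoothing, the most basic member of this family, sketched in Python. This is not Tableau's exact algorithm, since Tableau's models also capture trend and seasonality, and the data and alpha below are made up.

```python
# Simple exponential smoothing: each smoothed value blends the new
# observation with the previous smoothed value.
def simple_exponential_smoothing(values, alpha=0.5):
    """A larger alpha gives more weight to recent data."""
    smoothed = [values[0]]
    for y in values[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

history = [55, 52, 54, 58, 60, 57, 56, 59, 63, 70, 62, 58]
smoothed = simple_exponential_smoothing(history, alpha=0.5)
print(smoothed[-1])  # the one-step-ahead forecast is the last smoothed value
```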
7. Mean absolute error (MAE)
That raises the question: what makes a good forecast? Different metrics exist to assess the quality of forecast predictions, one of the simplest being the mean absolute error (MAE). You take the absolute differences between the actual and forecast values and average them. The MAE is calculated automatically in Tableau.
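In Python, the calculation looks like this; the actual and forecast values are made up for illustration.

```python
# Mean absolute error: the average of the absolute differences between
# actual and forecast values.
def mean_absolute_error(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

actual   = [63, 70, 62, 58]
forecast = [60, 66, 64, 59]
print(mean_absolute_error(actual, forecast))  # (3 + 4 + 2 + 1) / 4 = 2.5
```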
8. Mean absolute scaled error (MASE)
The MAE is used to calculate the MASE, the mean absolute scaled error. It compares the MAE of your model with the MAE of the naive forecast. A MASE close to zero means your model is very accurate at predicting the future. The closer it gets to one, the more similar your model is to the naive forecast. It can even be greater than one, meaning your model performs worse than the naive forecast.
A MASE of 0.65 means that your model's error is 65 percent of the naive forecast's error. That is acceptable, and it would probably be lower if the peak of interest hadn't happened in April 2020. Like us, forecasting models don't like unforeseen changes.
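Putting the two metrics together, here is a sketch of the MASE calculation as described above, again with made-up numbers.

```python
# MASE following the definition above: the MAE of your model divided by
# the MAE of the naive forecast over the same periods.
def mean_absolute_error(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

actual         = [63, 70, 62, 58]    # observed values in the forecast window
model_forecast = [60, 66, 64, 59]    # your model's forecasts
naive_forecast = [59, 59, 59, 59]    # last value before the window, repeated

mase = (mean_absolute_error(actual, model_forecast)
        / mean_absolute_error(actual, naive_forecast))
print(round(mase, 2))  # 0.53 here; below 1, so the model beats the naive forecast
```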
Tableau has many options to optimize forecasting, but the out-of-the-box model will be acceptable in the majority of cases.
9. Let's practice!
Before looking at some forecasts in Tableau, let's recap the concepts behind forecasting.