1. Interpreting results and comparing models
Welcome back! Let's learn to analyze and compare model results.
2. Running the model revisited
Here is the model from before. Using pm-dot-sample, we specified 1000 valid posterior draws; before them, 500 burn-in draws were sampled and discarded.
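In code, that sampling call might look like the following minimal sketch; the model context name, model, is an assumption, since the full model definition is not repeated here.

    import pymc3 as pm

    # Assumed model context from the previous lesson
    with model:
        # 1000 retained posterior draws per chain, preceded by 500 discarded burn-in (tuning) draws
        trace = pm.sample(draws=1000, tune=500)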
3. Running the model revisited
We didn't set the chains parameter, which defaults to the number of CPU cores on one's machine. Mine has 4, so chains was set to 4. The number of chains is the number of independent repetitions of the sampling process. It's generally recommended to run more than one chain in case some of them don't converge.
We have 4 parameters in the model: the intercept, two impact parameters, and the standard deviation. For each of them, we have 1000 draws per chain, and so we get 4000 posterior draws in total.
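A quick way to verify this, assuming the trace sampled above and a parameter named intercept (a hypothetical name), is:

    print(trace.nchains)                       # 4 chains, one per CPU core
    print(len(trace.get_values("intercept")))  # 4 chains x 1000 draws = 4000 posterior draws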
4. Trace plot
We can visualize our sampled draws by passing the trace to the pm-dot-traceplot function. It produces two subplots per parameter. Let's zoom in on one of them.
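Assuming the trace sampled above, the call is simply:

    pm.traceplot(trace)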
5. Trace plot: zoom in on one parameter
On the left, we have the posterior density plot. There are four lines on the plot, one for each chain. The fact that they are all similar indicates they are all sampling from the same posterior. On the right, we have a line plot of all 4000 draws, separately for each chain. They oscillate closely around some constant average, which indicates good convergence.
6. Forest plot
We can also pass the trace to the forestplot function, which you already know. This time, it will show a separate line for each chain of each parameter. We see that except for the intercept, we are pretty confident about the other parameters' values.
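Again assuming the trace from above, the call is:

    pm.forestplot(trace)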
7. Trace summary
To calculate some summary statistics of the posterior draws, we can call pm-dot-summary with the trace object passed to it. It produces a table with a lot of valuable insights.
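Assuming the trace from above, a minimal sketch of that call is:

    summary = pm.summary(trace)
    print(summary)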
In the first two columns, we can see the mean and the standard deviation of the draws for each parameter.
Next, we have the ends of the 97% credible interval. It seems that the impact of clothes and sneakers is very similar!
Finally, take a look at the last column labelled r_hat. This number is only computed if we run more than one chain. Values of r_hat noticeably greater than one indicate that some chains have not converged. Here, we have ones top to bottom, so everything's fine.
8. Fitting another model
Now, consider another model, let's call it model_2. It's similar to the one before. The only difference is that we add one more explanatory variable: weekend, with a 1 denoting a weekend day, and zero otherwise.
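A minimal sketch of such a model is shown below; the data frame and column names (df, clothes_ads, sneakers_ads, weekend, sales) and the priors are assumptions for illustration, not the exact code from the lesson.

    import pymc3 as pm

    with pm.Model() as model_2:
        # Priors (hypothetical choices)
        intercept = pm.Normal("intercept", mu=0, sigma=10)
        clothes = pm.Normal("clothes", mu=0, sigma=10)
        sneakers = pm.Normal("sneakers", mu=0, sigma=10)
        weekend = pm.Normal("weekend", mu=0, sigma=10)
        sd = pm.HalfNormal("sd", sigma=10)

        # Linear combination of the explanatory variables, including the new weekend indicator
        mu = (intercept
              + clothes * df["clothes_ads"]
              + sneakers * df["sneakers_ads"]
              + weekend * df["weekend"])

        # Likelihood of the observed outcome
        pm.Normal("sales", mu=mu, sigma=sd, observed=df["sales"])

        trace_2 = pm.sample(draws=1000, tune=500)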
9. Widely Applicable Information Criterion (WAIC)
We can compare two models based on the Widely Applicable Information Criterion, or WAIC. To do so, we gather the model traces in one dictionary and pass it to pm-dot-compare, setting ic to "waic" and scale to "deviance". We can then print the resulting comparison table.
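In code, with trace_1 and trace_2 denoting the traces of the two models (names assumed), the call might look like this:

    comparison = pm.compare(
        {"model_1": trace_1, "model_2": trace_2},
        ic="waic",
        scale="deviance",
    )
    print(comparison)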
WAIC is a measure of model fit. The lower it is, the better the model. Here, model_2 which uses the weekend variable is slightly better than model_1.
Since the Bayesian approach is all about probability, we can also calculate the probability of each model being the true one, among the models compared. It is shown in the weight column and also suggests that model_2 is slightly better.
10. Compare plot
We can also plot the model comparison with pm-dot-compareplot. The empty circles show the WAIC values for each model and the black error bars associated with them show their standard deviations.
For all but the top model, we also get a gray triangle indicating the difference in WAIC between that model and the top one. The standard deviation error bars show we are not that confident in the superiority of model_2.
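Assuming the comparison table computed above, the plot can be produced with:

    pm.compareplot(comparison)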
11. Let's practice comparing models!
Let's practice comparing models!