Visualizing predictions

1. Visualizing predictions

In the last lesson we learned how to create posterior distributions for predictions of both observed and unobserved data. Earlier in this chapter we reviewed how to create visualizations for our observed data. In the final lesson of this course, we'll learn how to create visualizations for the predictions of new data. Thus, we'll be able to effectively communicate the output of our model in the context of out-of-sample data.

2. Plotting new predictions

Let's start with the same model we were working with in the last lesson. We're predicting a child's IQ score from their mother's IQ and whether or not their mother completed high school. We previously created a new data frame to predict the IQ scores of 2 kids whose mothers both had an IQ of 110, but one completed high school and the other did not. Using the `posterior_predict` function, we were able to get the posterior distributions for the prediction of each kid's IQ score. Now we want to visualize these distributions.
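As a reminder, the setup from the last lesson might look roughly like the sketch below. The `kidiq` data set ships with rstanarm, but the object names `model`, `new_data`, and `posteriors` are assumptions made for illustration, and the formula is inferred from the description above rather than copied from the course.

```r
library(rstanarm)

# Assumed model: child's IQ score predicted from mother's IQ and HS completion
# (kidiq is bundled with rstanarm; refresh = 0 only silences sampler output)
model <- stan_glm(kid_score ~ mom_iq + mom_hs, data = kidiq, refresh = 0)

# Two hypothetical kids: both mothers have an IQ of 110,
# the first did not complete high school (0), the second did (1)
new_data <- data.frame(mom_iq = c(110, 110), mom_hs = c(0, 1))

# Matrix with one row per posterior draw and one column per row of new_data
posteriors <- posterior_predict(model, newdata = new_data)
dim(posteriors)
```

Each column of `posteriors` holds the draws for one kid, in the same order as the rows of `new_data`.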

3. Formatting the data

The first step is to format the data so that we can plot it with ggplot2. To do this, we will first convert our posterior predictions to a data frame using the `as.data.frame` function. We then set the column names to be "No HS" and "Completed HS" so that we can identify which prediction is in each column. Finally, we use the `gather` function from the tidyr package to get the data into the structure needed for ggplot2. This leaves us with 2 columns: one indicating whether HS was completed, and one with the draws from the posterior distribution.
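The formatting steps just described can be sketched as follows. To keep the example self-contained, `posteriors` here is a stand-in matrix filled with arbitrary simulated numbers, not real model output; in practice it would come from `posterior_predict`. The key name `"HS"` is also an assumption, while `predict` and `plot_posterior` match the names used in this lesson.

```r
library(tidyr)

# Stand-in for posterior_predict() output: 4000 draws for each of 2 kids.
# The means and standard deviation are arbitrary placeholders, not results.
set.seed(123)
posteriors <- cbind(rnorm(4000, mean = 85, sd = 18),
                    rnorm(4000, mean = 90, sd = 18))

# Convert the matrix to a data frame and label each prediction
posteriors <- as.data.frame(posteriors)
colnames(posteriors) <- c("No HS", "Completed HS")

# Reshape from wide (one column per kid) to long (one row per draw)
plot_posterior <- gather(posteriors, key = "HS", value = "predict")
head(plot_posterior)
```

Because no columns are named in the `gather` call, all columns are gathered, so the result has one row for every draw of every kid.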

4. Creating the plot

To create the plot, we call the `ggplot` function, specify that we are using the `plot_posterior` data frame, and map the `predict` column to the x-axis. We then use `facet_wrap` to put each level of high school completion in its own panel. Setting `ncol = 1` stacks the panels vertically, so the two distributions share an x-axis and are easy to compare. Finally, we use `geom_density` to create a density plot. With this visualization, we can see how the mother's completion of high school affects the predicted IQ score for the kids. The distribution for the kid whose mother completed high school is shifted slightly to the right, indicating higher predicted scores. However, for the most part, the two distributions are very similar. This is consistent with what we saw in the previous lesson when looking at the numerical summaries of the distributions: the average was slightly higher when the mother had completed high school. Thus, these visualizations give us another tool to help communicate the predictions made by our model.
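A minimal sketch of the plotting code is shown here. As before, `plot_posterior` is built from arbitrary simulated placeholder values so that the example runs on its own, and the facet column name `HS` is an assumption; with real predictions, the data frame would come from the formatting step.

```r
library(ggplot2)

# Stand-in long-format data: one row per draw, labeled by HS completion.
# The numbers are arbitrary placeholders, not output from the model.
set.seed(123)
plot_posterior <- data.frame(
  HS = rep(c("No HS", "Completed HS"), each = 4000),
  predict = c(rnorm(4000, mean = 85, sd = 18),
              rnorm(4000, mean = 90, sd = 18))
)

# One density panel per HS level, stacked vertically in a single column
p <- ggplot(plot_posterior, aes(x = predict)) +
  facet_wrap(~ HS, ncol = 1) +
  geom_density()
p
```

Stacking the panels in one column keeps the x-axis aligned, which makes a small shift between the two distributions easier to spot.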

5. Let's practice

Now let's make some visualizations for some new predictions about song popularity using the Spotify data!
