
The temperature in a Normal lake

1. The temperature in a Normal lake

2. The model we've used so far

So this is the model we’ve used so far: a binomial model with a uniform prior. It’s useful if you want to learn about an underlying proportion of success and want to predict the number of successes in future data. But what if we have some other type of data and want to predict something else?

3. A Normal lake

Say that I’m thinking about throwing together a beach party on the 20th of July, but I live in Sweden, and I’m uncertain whether the water in my local lake will be warm enough. I do have

4. Some temperature data

some data though: the maximum water temperature on the 20th of July for the last 5 years. These are in degrees Celsius (it’s not that cold in Sweden), which corresponds to the following in Fahrenheit, but here we’ll stick to using Celsius. So what I want to know now is: what is the average water temperature on the 20th of July for this lake, and how certain should I be that the temperature will be OK for my beach party, that is, over 18 degrees Celsius? Since this is a Bayesian course, we’re going to use Bayes to figure this out. Now we need to come up with a reasonable generative model for this data; the binomial model won’t work here. We need a model that can generate continuous data that could be both negative and positive, but that is centered around a midpoint, a mean value. A common choice here is

5. The Normal distribution

the normal distribution. It has two parameters, two unknowns we can tweak: the mean, where the distribution is centered, often denoted by the Greek letter mu (μ), looking like a strange u, and the standard deviation, how far away from the mean the values tend to fall, often denoted by the Greek letter sigma (σ). To use the normal distribution here, we need to know how to

6. The Normal distribution in R

work with it in R. In R you can draw values from a normal distribution using the rnorm function, rnorm(n, mean, sd), where n is the number of values you want to draw, mean is the mean, and sd is the standard deviation. So, for example, if we want to simulate temperatures that are sort of like the actual temperature data, we could plug in the following.
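As a minimal sketch of such a call, using the values discussed on the next slide (five draws, a mean of 20, a standard deviation of 2): the variable name `sim_temp` and the fixed seed are assumptions added here for illustration, not part of the talk.

```r
# Assumption: the seed is fixed only so this sketch is reproducible.
# Remove set.seed() to get new values on every rerun.
set.seed(42)

# Draw five simulated temperatures from a normal distribution.
# mean = 20 and sd = 2 are quick guesses, not estimates from the data.
sim_temp <- rnorm(n = 5, mean = 20, sd = 2)
sim_temp
```

Rerunning the `rnorm` line without the seed gives a different set of five values each time, all scattered around 20.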

7. The Normal distribution in R

We want to generate five data points, with a mean around 20 and a standard deviation somewhere around 2. I’m not calculating these numbers, just making a quick guess. If we rerun this a couple of times, we get data that looks similar to the real data. So, for now, let’s assume the normal distribution is a reasonable generative model. Just as with the binomial model, there is also a d-function, here

8. The Normal distribution in R

dnorm, which directly calculates how likely a data point is given some fixed parameters. Let’s calculate the likelihood of our data given a mean of 20 and a standard deviation of 2. Like the uniform distribution, the normal distribution is continuous, so what we get out are not probabilities but probability densities, which can be viewed as the relative likelihoods of our data points. We see that, relative to the other measured temperatures, 20 degrees, the third value, is the most likely, which makes sense as we fixed the mean to be 20 degrees. We can also calculate the likelihood of all the data points together. Just as with probabilities, if we want to calculate the likelihood of this and that, we multiply this and that. To multiply together all the numbers in a vector, we can use the prod function. As we multiplied many small numbers together, this becomes a super tiny number, and the more data points, the tinier it will become. Likelihoods can often become so tiny that your computer can’t reliably represent them. This is one reason why, in many cases, one works with likelihoods on the log scale, where tiny numbers turn into moderately sized negative numbers and products turn into sums, which avoids numerical instabilities. This is why, in stats, you’ll often encounter so-called log-likelihoods and functions for calculating log-likelihoods. But in this course, we’ll stick to using non-logged likelihoods. So at this point, you would be ready to start using the normal distribution in a Bayesian model.
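The steps above can be sketched roughly as follows. The temperatures in `temp` are placeholder values chosen for illustration (with 20 as the third value, as in the narration); they are an assumption, not necessarily the measurements shown on the slide. The variable names `like` and `log_like` are also hypothetical.

```r
# Placeholder temperatures in degrees Celsius (hypothetical values,
# assumed here only to make the sketch runnable).
temp <- c(19, 23, 20, 17, 23)

# Density of each data point under a normal distribution
# with mean 20 and standard deviation 2.
like <- dnorm(temp, mean = 20, sd = 2)
like

# Likelihood of all data points together: the product of the densities.
prod(like)

# The same quantity on the log scale: sum the log-densities instead.
log_like <- sum(dnorm(temp, mean = 20, sd = 2, log = TRUE))
log_like

# Both routes agree, up to floating-point error.
all.equal(log(prod(like)), log_like)
```

With only five data points the product is still representable, so both routes work. With thousands of points the product would underflow to zero, while the sum of log-densities would stay a usable finite number.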

9. Try out using rnorm and dnorm!

But before you do that, let’s try using rnorm and dnorm in a couple of exercises.