Pseudo-random number generation
1. Pseudo-random number generation
You previously saw how to use a random sample to get results similar to those in the population. But how does a computer actually do this random sampling?
2. What does random mean?
There are several meanings of random in English. This definition from Oxford Languages is the most interesting for us. If we want to choose data points at random from a population, we shouldn't be able to predict which data points would be selected ahead of time in some systematic way.
3. True random numbers
To generate truly random numbers, you typically have to use a physical process like flipping coins or rolling dice. The HotBits service generates numbers from radioactive decay, and RANDOM.ORG generates numbers from atmospheric noise, which is radio signals generated by lightning. The latter service can be accessed in R via the random package. Unfortunately, these processes are fairly slow and expensive for generating random numbers.
4. Pseudo-random number generation
For most use cases, pseudo-random number generation is better since it is cheap and fast. Pseudo-random means that although each value appears to be random, it is actually calculated from the previous random number. Since you have to start the calculations somewhere, the first random number is calculated from what is known as a seed value.
For example, suppose we have a function to generate random values called calc_next_random. To begin, we pick a seed number, in this case one. calc_next_random does some calculations and returns three. We then feed three into calc_next_random, and it does the same set of calculations and returns two. We can keep feeding in the last number, and each time it will return something apparently random.
I put the word random in quotes to emphasize that this process isn't really random. If you start from a particular seed value, all future numbers generated by calling calc_next_random will be the same. The trick to a good random number generator is that, although the process is deterministic, the values it produces look random.
5. Random number generating functions
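Before turning to R's built-in functions, the calc_next_random walkthrough from the previous section can be sketched in a few lines. This is only a toy, assuming a multiplicative congruential rule with hypothetical constants (multiplier 3, modulus 7) chosen so that the seed one produces three and then two, as in the walkthrough. It is not the algorithm R uses, and the helper generate_sequence is likewise made up for illustration.

```r
# Toy generator: next value = (a * previous + c) mod m.
# a, c and m are illustrative assumptions, not constants taken from R.
calc_next_random <- function(previous, a = 3, c = 0, m = 7) {
  (a * previous + c) %% m
}

# Hypothetical helper: repeatedly feed the last value back in.
generate_sequence <- function(seed, n) {
  values <- numeric(n)
  current <- seed
  for (i in seq_len(n)) {
    current <- calc_next_random(current)
    values[i] <- current
  }
  values
}

first_run  <- generate_sequence(seed = 1, n = 6)
second_run <- generate_sequence(seed = 1, n = 6)

print(first_run)                      # 3 2 6 4 5 1
print(identical(first_run, second_run))  # TRUE: same seed, same sequence
```

With these constants the sequence returns to one after six steps and then repeats, which is why practical generators use far larger state and more careful arithmetic.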
As well as sample, R has many functions for generating random numbers from statistical distributions. The ones in the table are available in base R, and there are many more in other packages. Some of them, like runif and rnorm, should be familiar to you. Others have more niche applications.
6. Visualizing random numbers
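As a quick illustration of the base R generators mentioned in the previous section, here is a minimal sketch calling three of them. The seed, the choice of functions, and all parameter values are arbitrary assumptions made for the example.

```r
set.seed(2024)  # arbitrary seed so the sketch is repeatable

# Each function takes the number of values first, then distribution parameters.
uniform_values  <- runif(n = 3, min = 0, max = 1)         # uniform on [0, 1]
normal_values   <- rnorm(n = 3, mean = 0, sd = 1)         # standard normal
binomial_values <- rbinom(n = 3, size = 10, prob = 0.5)   # successes out of 10 trials

print(uniform_values)
print(normal_values)
print(binomial_values)
```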
Let's generate some pseudo-random numbers. The first argument to each random number function specifies how many numbers to generate, in this case five thousand. Subsequent arguments specify distribution parameters. I've chosen the beta distribution, and its parameters are boringly named shape1 and shape2. To make the numbers easier to plot, you can put them inside a data frame. These random numbers come from a continuous distribution, so a great way to visualize them is with a histogram. Here, because the numbers were generated from the beta distribution, all the values are between zero and one.
7. Random number seeds
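The beta example from the previous section can be sketched as follows. The transcript does not give the shape values, so shape1 = 2 and shape2 = 2 are assumptions, as are the seed and the column name. Base R's hist is used here to keep the sketch free of extra packages; the original slide may have used a different plotting function.

```r
set.seed(123)  # arbitrary seed, only so the sketch is repeatable

# Five thousand draws from a beta distribution, stored in a data frame.
# shape1 and shape2 values are illustrative assumptions.
randoms <- data.frame(
  beta = rbeta(5000, shape1 = 2, shape2 = 2)
)

# Histogram of a continuous variable.
hist(
  randoms$beta,
  breaks = 30,
  main = "5000 draws from a beta distribution",
  xlab = "beta"
)

# Every value lies between zero and one.
print(range(randoms$beta))
```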
To set a random seed in R, you use set.seed. R seeds its generator automatically the first time a session needs random numbers, so you don't often need to call set.seed explicitly, but more on that in a moment. set.seed takes an integer for the seed number. It can be any number you like. rnorm generates pseudo-random numbers from the normal distribution. The first argument determines how many random numbers should be returned, in this case five. If we call rnorm a second time, we get five different random numbers. If we reset the seed by calling set.seed again with the same number and then call rnorm again, we get the same numbers as before. This makes our code reproducible.
8. Using a different seed
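The seed behaviour can be sketched as below, covering both the reset described in the previous section and the different-seed case in this one. The seed values 42 and 43 and the variable names are arbitrary assumptions; any integers would do.

```r
set.seed(42)                 # arbitrary seed
first_draw  <- rnorm(5)      # five values from the standard normal
second_draw <- rnorm(5)      # continues the stream, so the values differ

set.seed(42)                 # reset to the same seed
repeated_draw <- rnorm(5)    # restarts the stream from the beginning

set.seed(43)                 # a different seed
other_seed_draw <- rnorm(5)  # a different stream of values

print(identical(first_draw, repeated_draw))    # TRUE
print(identical(first_draw, second_draw))      # FALSE
print(identical(first_draw, other_seed_draw))  # FALSE
```

Calling set.seed once at the top of a script is usually enough to make every later random draw in that script repeatable from run to run.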
Now let's try a different seed. This time, calling rnorm generates different numbers.
9. Let's practice!
Let's sow some random seeds!