1. Pseudo-random number generation
You previously saw how to use a random sample to get results similar to those in the population. But how does a computer actually do this random sampling?
2. What does random mean?
There are several meanings of random in English. This definition from Oxford Languages is the most interesting for us. If we want to choose data points at random from a population, we shouldn't be able to predict which data points would be selected ahead of time in some systematic way.
3. True random numbers
To generate truly random numbers, we typically have to use a physical process like flipping coins or rolling dice.
The Hotbits service generates numbers from radioactive decay, and RANDOM-dot-ORG generates numbers from atmospheric noise, which is radio noise generated by lightning.
Unfortunately, these processes are fairly slow and expensive for generating random numbers.
4. Pseudo-random number generation
For most use cases, pseudo-random number generation is better since it is cheap and fast.
Pseudo-random means that although each value appears to be random, it is actually calculated from the previous random number.
Since you have to start the calculations somewhere, the first random number is calculated from what is known as a seed value.
The word random is in quotes to emphasize that this process isn't really random. If we start from the same seed value, we will always get the same sequence of numbers.
5. Pseudo-random number generation example
For example, suppose we have a function to generate pseudo-random values called calc_next_random. To begin, we pick a seed number, in this case, one.
calc_next_random does some calculations and returns three.
We then feed three into calc_next_random, and it does the same set of calculations and returns two.
We can keep feeding in the last number, and each time it will return something apparently random.
Although the process is deterministic, the trick to a random number generator is to make it look like the values are random.
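To make the chain from seed to values concrete, here is a minimal sketch of what calc_next_random could look like. The formula and its constants (2, 1, and 5) are hypothetical, chosen only so that a seed of one gives three and then two, as in the example; real generators use far more elaborate calculations.

```python
# Hypothetical toy generator: a tiny linear congruential formula.
# The constants 2, 1 and 5 are illustrative assumptions, not a real algorithm.
def calc_next_random(previous):
    return (2 * previous + 1) % 5

seed = 1
values = []
current = seed
for _ in range(6):
    # Each value is calculated from the one before it
    current = calc_next_random(current)
    values.append(current)

print(values)  # [3, 2, 0, 1, 3, 2]
```

With such a small modulus the sequence repeats after only four values, which is why this toy version would not look random for long.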
6. Random number generating functions
NumPy has many functions for generating random numbers from statistical distributions. To use any of these, prefix the function name with numpy-dot-random or np-dot-random.
Some of them, like dot-uniform and dot-normal, may be familiar. Others have more niche applications.
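As a short illustration of the np-dot-random prefix, the sketch below calls dot-uniform and dot-normal. The parameter values and the size of five are arbitrary choices for the example.

```python
import numpy as np

# Five draws from a uniform distribution on the interval [0, 10)
uniform_draws = np.random.uniform(low=0, high=10, size=5)

# Five draws from a normal distribution with mean 0 and standard deviation 1
normal_draws = np.random.normal(loc=0, scale=1, size=5)

print(uniform_draws)
print(normal_draws)
```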
7. Visualizing random numbers
Let's generate some pseudo-random numbers. The first arguments to each random number function specify distribution parameters. The size argument specifies how many numbers to generate, in this case, five thousand. We've chosen the beta distribution, and its parameters are named a and b.
These random numbers come from a continuous distribution, so a great way to visualize them is with a histogram.
Here, because the numbers were generated from the beta distribution, all the values are between zero and one.
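A sketch of this step might look like the following. The values a=2 and b=2 are assumptions for illustration, and to keep the example free of plotting dependencies it prints a rough text histogram built with np.histogram; a plotting library would normally draw the bars instead.

```python
import numpy as np

# a=2 and b=2 are assumed parameter values; size sets how many numbers to draw
randoms = np.random.beta(a=2, b=2, size=5000)

# Count how many values fall into each of 10 equal-width bins between 0 and 1
counts, bin_edges = np.histogram(randoms, bins=10, range=(0, 1))

# Print one row per bin, with one '#' for every 25 values in that bin
for left_edge, count in zip(bin_edges[:-1], counts):
    print(f"{left_edge:.1f} {'#' * (int(count) // 25)}")
```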
8. Random number seeds
To set a random seed with NumPy, we use the dot-random-dot-seed function. random-dot-seed takes an integer for the seed number, which can be any non-negative integer you like, up to two to the power of thirty-two minus one.
dot-normal generates pseudo-random numbers from the normal distribution. The loc and scale arguments set the mean and standard deviation of the distribution, and the size argument determines how many random numbers from that distribution will be returned.
If we call dot-normal a second time, we get two random numbers that differ from those returned by the first call.
If we reset the seed by calling random-dot-seed with the same seed number, then call dot-normal again, we get the same numbers as before. This makes our code reproducible.
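The sequence just described can be sketched as follows. The seed value and the loc, scale, and size arguments are assumptions picked for the example.

```python
import numpy as np

np.random.seed(20000229)  # assumed seed value
first = np.random.normal(loc=2, scale=1.5, size=2)

# A second call continues the sequence, so it returns new numbers
second = np.random.normal(loc=2, scale=1.5, size=2)

# Resetting to the same seed restarts the sequence from the beginning
np.random.seed(20000229)
repeat_first = np.random.normal(loc=2, scale=1.5, size=2)

print(first, second, repeat_first)
print(np.array_equal(first, repeat_first))  # True
```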
9. Using a different seed
Now let's try a different seed.
This time, calling dot-normal generates different numbers.
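For comparison, a sketch with two different seeds is shown here. Both seed values are arbitrary assumptions; the point is only that changing the seed changes the numbers that dot-normal returns.

```python
import numpy as np

np.random.seed(20000229)  # first assumed seed
from_first_seed = np.random.normal(loc=2, scale=1.5, size=2)

np.random.seed(20041004)  # a different assumed seed
from_second_seed = np.random.normal(loc=2, scale=1.5, size=2)

# The two seeds start two different sequences
print(from_first_seed)
print(from_second_seed)
```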
10. Let's practice!
Let's sow some random seeds!