
Random number generators and hacker statistics

1. Random number generators and hacker statistics

In practice, we are going to think probabilistically using hacker statistics. You had an introduction to hacker statistics in previous DataCamp courses, and in this course we will greatly extend your expertise.

2. Hacker statistics

The basic idea is that instead of literally repeating the data acquisition over and over again, we can simulate those repeated measurements using Python. For our first simulation, we will take a cue from our forebears. The concepts of probability originated from studies of games of chance

3. Blaise Pascal

by Pascal and others in the 17th century, so we will simulate

4. Coins

coin flips. Specifically, we will simulate the outcome of 4 successive coin flips. Our goal is to compute the probability that we will get four heads out of four flips.
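For this small problem, the exact answer is easy to get by hand. Assuming the coin is fair and the four flips are independent, the probability of four heads is the product of four factors of 1/2. This gives a benchmark to compare the simulated estimate against.

```latex
P(\text{four heads}) = \left(\frac{1}{2}\right)^4 = \frac{1}{16} = 0.0625
```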

5. Simulating coin flips

As we will see in just a moment, we can use NumPy to draw a number between zero and one such that all numbers in this interval are equally likely to occur. We can use this to simulate a coin flip. If the number we draw is less than 0.5, which has a 50% chance of happening, we say we got heads; otherwise, we got tails. This type of experiment, where the result is either True (heads) or False (tails), is referred to as

6. Bernoulli trials

a Bernoulli trial, and we will work with these more as we go through the course. Ok. Now let's work on implementing the simulation.
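A single coin flip, framed as a Bernoulli trial, can be sketched in a few lines. This is a minimal sketch: `flip_coin` is a hypothetical helper name, the seed value 42 is arbitrary, and the RNG setup it relies on is explained next.

```python
import numpy as np

# Seeded generator (42 is an arbitrary seed, chosen only for reproducibility).
rng = np.random.default_rng(seed=42)

def flip_coin(rng):
    """Simulate one fair coin flip (hypothetical helper): True = heads, False = tails."""
    # rng.random() draws a float uniformly from the half-open interval [0, 1),
    # so it falls below 0.5 with probability 0.5.
    return rng.random() < 0.5

# Ten independent Bernoulli trials.
flips = [flip_coin(rng) for _ in range(10)]
print(flips)
```

Changing the threshold 0.5 to some other value p would give a Bernoulli trial with success probability p.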

7. The np.random module

NumPy's random module, a suite of functions based on pseudorandom number generation, will be your main engine for generating, or drawing, random numbers. To use it, we first need to instantiate a random number generator, or RNG for short, using the np.random.default_rng() function. This gives a Generator object that has methods for drawing random numbers.
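Instantiating a generator and calling a few of its methods might look like the sketch below. Without a seed, `default_rng()` takes its initial state from fresh entropy supplied by the operating system, so the numbers differ from run to run. `rng.integers` is shown only as one more example of a Generator method.

```python
import numpy as np

# default_rng() returns a numpy.random.Generator object.
rng = np.random.default_rng()

# The Generator's methods draw random numbers, for example:
u = rng.random()                   # one float in [0, 1)
us = rng.random(size=3)            # array of three floats in [0, 1)
dice = rng.integers(1, 7, size=5)  # five integers from 1 to 6 (upper bound excluded)

print(u, us, dice)
```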

8. Random number seed

The RNG starts with an integer, called a seed, and then generates random numbers in succession. The same seed always gives the same sequence of random numbers, hence the name "pseudorandom number generation." So, if you want reproducible random numbers, say for debugging purposes, you can seed the RNG by passing the seed as an argument when you instantiate it. For this course, we will always seed the RNG so that your results match mine.
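Reproducibility can be checked directly: generators created with the same seed yield identical draws, and re-instantiating with that seed restarts the sequence. The seed values 42 and 43 below are arbitrary.

```python
import numpy as np

# Two generators instantiated with the same seed.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
draws_a = rng_a.random(size=5)
draws_b = rng_b.random(size=5)
print(np.array_equal(draws_a, draws_b))  # True: same seed, same sequence

# Re-instantiating with the same seed restarts the sequence from the beginning.
rng_a = np.random.default_rng(seed=42)
draws_again = rng_a.random(size=5)

# A different seed gives a different sequence.
rng_c = np.random.default_rng(seed=43)
draws_c = rng_c.random(size=5)
print(np.array_equal(draws_a, draws_c))
```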

9. Simulating 4 coin flips

After reinstantiating the RNG with a seed, we can use its random method to generate random numbers. Conveniently, we can specify how many random numbers we want with the size keyword argument. The second number we get is less than one half, so it is heads, but the remaining three are tails. We can show that explicitly using the less-than operator, which gives us an array with the Boolean value True for heads and False for tails. We can then count the number of heads by summing the array of Booleans, because in numerical contexts, True is treated as one and False as zero.
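The steps described above might look like the following sketch. The seed value 42 is an assumption (the course's seed is not stated here), so the particular pattern of heads and tails this prints will not necessarily match the one described.

```python
import numpy as np

# Reinstantiate the generator with a seed (42 is an arbitrary choice).
rng = np.random.default_rng(seed=42)

# Four draws from [0, 1) at once, via the size keyword argument.
random_numbers = rng.random(size=4)

# Elementwise less-than gives a Boolean array: True = heads, False = tails.
heads = random_numbers < 0.5

# Summing Booleans counts the True values, i.e., the number of heads.
n_heads = np.sum(heads)
print(random_numbers, heads, n_heads)
```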

10. Simulating 4 coin flips

We first initialize the count to zero. We then repeat the four-flip trial 10,000 times. If a given trial had four heads, we increase the count by one. So, what is the probability of getting all four heads? It's the number of times we got all heads, divided by the total number of trials we did. The result is about 0.06. Pascal and his friends did not have computers and worked out problems like these with pen and paper. While this particular problem is tractable, pen-and-paper statistics can get hard fast.
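The counting loop described above can be sketched as follows. The seed value 42 is an assumption, so the estimate will not necessarily match the course's result digit for digit. The second half shows an optional vectorized alternative that replaces the Python loop with a single array draw.

```python
import numpy as np

# Seeded for reproducibility; 42 is an arbitrary choice, not the course's seed.
rng = np.random.default_rng(seed=42)

n_trials = 10_000
n_all_heads = 0  # initialize the count to zero

# Loop version: repeat the four-flip trial and count the all-heads outcomes.
for _ in range(n_trials):
    heads = rng.random(size=4) < 0.5
    if np.sum(heads) == 4:
        n_all_heads += 1

p_loop = n_all_heads / n_trials

# Vectorized alternative: draw every trial at once as an (n_trials, 4) array,
# count heads along each row, and take the fraction of rows with four heads.
flips = rng.random(size=(n_trials, 4)) < 0.5
p_vectorized = np.mean(flips.sum(axis=1) == 4)

print(p_loop, p_vectorized)  # both should land near the exact value 1/16 = 0.0625
```

The loop mirrors the description step by step; the vectorized form computes the same kind of estimate and is usually much faster for large numbers of trials.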

11. Hacker stats probabilities

With hacker statistics, you follow essentially the same procedure every time: figure out how to simulate your data, simulate it many, many times, and then compute the fraction of trials that had the outcome you're interested in.
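The three-step recipe can be captured in a small reusable function. `estimate_probability` and its parameter names are hypothetical, chosen for illustration; they are not part of NumPy or the course. The example call reruns the four-flip problem with an assumed seed of 42.

```python
import numpy as np

def estimate_probability(simulate_trial, is_success, n_trials, seed=None):
    """Estimate P(outcome of interest) by simulation (hypothetical helper).

    simulate_trial(rng) -> one simulated data set
    is_success(data)    -> True if that data set has the outcome of interest
    """
    rng = np.random.default_rng(seed)
    n_success = 0
    for _ in range(n_trials):
        # Step 1: simulate the data; step 2: repeat; step 3: count successes.
        if is_success(simulate_trial(rng)):
            n_success += 1
    return n_success / n_trials


# Example: four coin flips, with heads coded as draws below 0.5.
p = estimate_probability(
    simulate_trial=lambda rng: rng.random(size=4) < 0.5,
    is_success=lambda heads: heads.sum() == 4,
    n_trials=10_000,
    seed=42,
)
print(p)
```

Only the two small functions change from problem to problem; the simulate-repeat-count loop stays the same.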

12. Let's practice!

Now, let's use hacker statistics to simulate some more data!