1. Discrete distributions
In this lesson, we'll take a deeper dive into probability and begin looking at probability distributions.
2. Rolling the dice
Let's consider rolling a standard, six-sided die.
3. Rolling the dice
There are six numbers, so six possible outcomes, and each number has a one-in-six, or roughly 17 percent, chance of being rolled. This is an example of a probability distribution.
4. Choosing salespeople
This is similar to the scenario from earlier, except we had names instead of numbers. Just like rolling a die, each outcome, or name, had an equal chance of being chosen.
5. Probability distribution
A probability distribution describes the probability of each possible outcome in a scenario.
We can also talk about the expected value of a distribution, which is its mean. We calculate it by multiplying each value by its probability (one sixth in this case) and summing, so the expected value of rolling a fair die is 3-point-5.
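Here's a rough sketch of that calculation in Python (assuming NumPy is imported as np; the variable names are just for illustration):

import numpy as np

# Outcomes of a fair die and their probabilities
outcomes = np.array([1, 2, 3, 4, 5, 6])
probs = np.full(6, 1 / 6)

# Expected value: multiply each value by its probability and sum
expected_value = (outcomes * probs).sum()
print(expected_value)  # 3.5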
6. Visualizing a probability distribution
We can visualize this using a barplot, where each bar represents an outcome, and each bar's height represents the probability of that outcome.
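One way to sketch that barplot, assuming matplotlib is available (this is illustrative, not necessarily the exact code behind the slide):

import matplotlib.pyplot as plt

# One bar per outcome; each bar's height is that outcome's probability
plt.bar([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
plt.xlabel("outcome")
plt.ylabel("probability")
plt.show()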
7. Probability = area
We can calculate probabilities of different outcomes by taking areas of the probability distribution.
For example, what's the probability that our die roll is less than or equal to 2? To figure this out, we'll take the area of each bar representing an outcome of 2 or less.
8. Probability = area
Each bar has a width of 1 and a height of one sixth, so the area of each bar is one sixth. We'll sum the areas for 1 and 2, to get a total probability of one third.
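In code, summing those areas might look something like this (again assuming NumPy as np; a sketch, not the course's exact code):

import numpy as np

outcomes = np.array([1, 2, 3, 4, 5, 6])
probs = np.full(6, 1 / 6)

# Sum the areas (probabilities) of the bars for outcomes of 2 or less
prob_2_or_less = probs[outcomes <= 2].sum()
print(prob_2_or_less)  # 0.333..., or one third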
9. Uneven die
Now let's say we have a die where the two got turned into a three. This means we now have a 0 percent chance of getting a 2, and a one-in-three, or about 33 percent, chance of getting a 3. To calculate the expected value of this die, we multiply 2 by 0, since it's impossible to get a 2, and 3 by its new probability, one third. This gives us an expected value that's slightly higher than that of the fair die.
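Sketching that out with the same approach as before (the probabilities below are assumptions based on the description above):

import numpy as np

# Uneven die: the 2 became a 3, so P(2) = 0 and P(3) = 2/6
outcomes = np.array([1, 2, 3, 4, 5, 6])
probs = np.array([1 / 6, 0, 2 / 6, 1 / 6, 1 / 6, 1 / 6])

# Expected value: multiply each value by its probability and sum
expected_value = (outcomes * probs).sum()
print(expected_value)  # about 3.67, a bit higher than 3.5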
10. Visualizing uneven probabilities
When we visualize these new probabilities, the bars are no longer even.
11. Adding areas
With this die, what's the probability of getting something less than or equal to 2? There's a one sixth probability of getting 1, and zero probability of getting 2,
12. Adding areas
which sums to one sixth.
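Reusing the outcomes and probs arrays from the uneven-die sketch above:

# Only the bars for 1 and 2 contribute, and the bar for 2 has zero height
prob_2_or_less = probs[outcomes <= 2].sum()
print(prob_2_or_less)  # 0.166..., or one sixth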
13. Discrete probability distributions
The probability distributions you've seen so far are both discrete probability distributions, since they represent situations with discrete outcomes. Recall from chapter 1 that discrete variables can be thought of as counted variables. In the case of a die, we're counting dots, so we can't roll a 1-point-5 or 4-point-3.
When all outcomes have the same probability, like a fair die, this is a special distribution called a discrete uniform distribution.
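If you want to work with a discrete uniform distribution directly, SciPy has one built in (an aside, not something this lesson relies on; note that randint's upper bound is exclusive):

from scipy.stats import randint

# Discrete uniform distribution over the integers 1 through 6
die_dist = randint(1, 7)

print(die_dist.pmf(3))   # 0.166... - every outcome has probability 1/6
print(die_dist.mean())   # 3.5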
14. Sampling from discrete distributions
Just like we sampled names from a box, we can do the same thing with probability distributions like the ones we've seen. Here's a DataFrame called die that represents a fair die, and its expected value is 3-point-5.
We'll sample from it 10 times to simulate 10 rolls. Notice that we sample with replacement so that we're sampling from the same distribution every time.
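A sketch of that setup, assuming pandas and NumPy are imported; the column name "number" is an assumption:

import numpy as np
import pandas as pd

# DataFrame representing a fair die
die = pd.DataFrame({"number": [1, 2, 3, 4, 5, 6]})
print(np.mean(die["number"]))  # 3.5, the expected value

# Simulate 10 rolls by sampling with replacement
rolls_10 = die.sample(10, replace=True)
print(rolls_10)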
15. Visualizing a sample
We can visualize the outcomes of the ten rolls using a histogram, defining the bins we want using np-dot-linspace.
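Continuing from the rolls_10 sample above, the histogram might be drawn something like this (np.linspace(1, 7, 7) puts a bin edge at each whole number; a sketch):

import matplotlib.pyplot as plt
import numpy as np

# One bin per face of the die
rolls_10["number"].hist(bins=np.linspace(1, 7, 7))
plt.show()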
16. Sample distribution vs. theoretical distribution
Notice that we have different numbers of 1's, 2's, 3's, and so on since the sample was random, even though on each roll we had the same probability of rolling each number. The mean of our sample is 3-point-0, which isn't super close to the 3-point-5 we were expecting.
17. A bigger sample
If we roll the die 100 times, the distribution of the rolls looks a bit more even, and the mean is closer to 3-point-5.
18. An even bigger sample
If we roll 1000 times, it looks even more like the theoretical probability distribution and the mean closely matches 3-point-5.
19. Law of large numbers
This is called the law of large numbers, which is the idea that as the size of your sample increases, the sample mean will approach the theoretical mean.
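A quick way to see the law of large numbers in action, reusing the die DataFrame from earlier (the exact sample means will vary from run to run):

import pandas as pd

die = pd.DataFrame({"number": [1, 2, 3, 4, 5, 6]})

# Sample means drift toward the theoretical mean of 3.5 as n grows
for n in [10, 100, 1000, 10000]:
    rolls = die.sample(n, replace=True)
    print(n, rolls["number"].mean())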
20. Let's practice!
Time to solidify your knowledge of probability distributions.