The mean of means

You want to know the average number of users (num_users) per deal across the entire company, so that you can see whether Amir's deals have more or fewer users than the company's average deal. Over the past year, the company has worked on more than ten thousand deals, so compiling all the data isn't realistic. Instead, you'll estimate the mean by taking several random samples of deals, which is much easier than collecting data from everyone in the company.

amir_deals is available, and the user data for all the company's deals is available in all_deals. pandas is loaded as pd and numpy as np.

This exercise is part of the course

Introduction to Statistics in Python

Exercise instructions

Set the random seed to 321.
Take 30 samples (with replacement) of size 20 from all_deals['num_users'] and take the mean of each sample. Store the sample means in sample_means.
Print the mean of sample_means.
Print the mean of the num_users column of amir_deals.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Set seed to 321
____

sample_means = []
# Loop 30 times to take 30 means
for i in range(____):
  # Take sample of size 20 from num_users col of all_deals with replacement
  cur_sample = ____
  # Take mean of cur_sample
  cur_mean = ____
  # Append cur_mean to sample_means
  sample_means.append(____)

# Print mean of sample_means
print(____)

# Print mean of num_users in amir_deals
print(____)
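
One way the blanks above could be filled in is sketched below. The course's all_deals and amir_deals DataFrames are not reproduced here, so this sketch builds hypothetical stand-ins from synthetic data; only the num_users column name comes from the exercise, and the printed numbers will not match the course's. It assumes that calling np.random.seed() is what the exercise means by "set the random seed", which works because pandas' .sample() draws from NumPy's global random state when no random_state is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the course's DataFrames (synthetic values,
# not the real deal data). Only the column name num_users is from the exercise.
rng = np.random.default_rng(0)
all_deals = pd.DataFrame({"num_users": rng.integers(1, 100, size=10_000)})
amir_deals = all_deals.sample(100, random_state=1)

# Set seed to 321
np.random.seed(321)

sample_means = []
# Loop 30 times to take 30 means
for i in range(30):
    # Take sample of size 20 from num_users col of all_deals with replacement
    cur_sample = all_deals["num_users"].sample(20, replace=True)
    # Take mean of cur_sample
    cur_mean = np.mean(cur_sample)
    # Append cur_mean to sample_means
    sample_means.append(cur_mean)

# Print mean of sample_means
print(np.mean(sample_means))

# Print mean of num_users in amir_deals
print(np.mean(amir_deals["num_users"]))
```

With the real data, comparing the two printed values shows whether Amir's average number of users per deal sits above or below the estimated company-wide average.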

Summary statistics give you the tools you need to boil down massive datasets to reveal the highlights. In this chapter, you'll explore summary statistics including mean, median, and standard deviation, and learn how to accurately interpret them. You'll also develop your critical thinking skills, allowing you to choose the best summary statistics for your data.

Exercise 1: What is statistics?
Exercise 2: Descriptive and inferential statistics
Exercise 3: Data type classification
Exercise 4: Measures of center
Exercise 5: Mean and median
Exercise 6: Mean vs. median
Exercise 7: Measures of spread
Exercise 8: Variance and standard deviation
Exercise 9: Quartiles, quantiles, and quintiles
Exercise 10: Finding outliers using IQR

In this chapter, you'll learn how to generate random samples and measure chance using probability. You'll work with real-world sales data to calculate the probability of a salesperson being successful. Finally, you’ll use the binomial distribution to model events with binary outcomes.

Exercise 1: What are the chances?
Exercise 2: With or without replacement?
Exercise 3: Calculating probabilities
Exercise 4: Sampling deals
Exercise 5: Discrete distributions
Exercise 6: Creating a probability distribution
Exercise 7: Identifying distributions
Exercise 8: Expected value vs. sample mean
Exercise 9: Continuous distributions
Exercise 10: Which distribution?
Exercise 11: Data back-ups
Exercise 12: Simulating wait times
Exercise 13: The binomial distribution
Exercise 14: Simulating sales deals
Exercise 15: Calculating binomial probabilities
Exercise 16: How many sales will be won?

It’s time to explore one of the most important probability distributions in statistics, the normal distribution. You’ll create histograms to plot normal distributions and gain an understanding of the central limit theorem, before expanding your knowledge of statistical functions by adding the Poisson, exponential, and t-distributions to your repertoire.

Exercise 1: The normal distribution
Exercise 2: Distribution of Amir's sales
Exercise 3: Probabilities from the normal distribution
Exercise 4: Simulating sales under new market conditions
Exercise 5: Which market is better?
Exercise 6: The central limit theorem
Exercise 7: Visualizing sampling distributions
Exercise 8: The CLT in action
Exercise 9: The mean of means (current exercise)
Exercise 10: The Poisson distribution
Exercise 11: Identifying lambda
Exercise 12: Tracking lead responses
Exercise 13: More probability distributions
Exercise 14: Distribution dragging and dropping
Exercise 15: Modeling time between leads
Exercise 16: The t-distribution

In this chapter, you'll learn how to quantify the strength of a linear relationship between two variables, and explore how confounding variables can affect the relationship between two other variables. You'll also see how a study’s design can influence its results, change how the data should be analyzed, and potentially affect the reliability of your conclusions.

Exercise 1: Correlation
Exercise 2: Guess the correlation
Exercise 3: Relationships between variables
Exercise 4: Correlation caveats
Exercise 5: What can't correlation measure?
Exercise 6: Transforming variables
Exercise 7: Does sugar improve happiness?
Exercise 8: Confounders
Exercise 9: Design of experiments
Exercise 10: Study types
Exercise 11: Longitudinal vs. cross-sectional studies
Exercise 12: Congratulations!