Get startedGet started for free

The mean of means

You want to know what the average number of users (num_users) is per deal, but you want to know this number for the entire company so that you can see if Amir's deals have more or fewer users than the company's average deal. The problem is that over the past year, the company has worked on more than ten thousand deals, so it's not realistic to compile all the data. Instead, you'll estimate the mean by taking several random samples of deals, since this is much easier than collecting data from everyone in the company.

amir_deals is available and the user data for all the company's deals is available in all_deals. Both pandas as pd and numpy as np are loaded.

This exercise is part of the course

Introduction to Statistics in Python

View Course

Exercise instructions

  • Set the random seed to 321.
  • Take 30 samples (with replacement) of size 20 from all_deals['num_users'] and take the mean of each sample. Store the sample means in sample_means.
  • Print the mean of sample_means.
  • Print the mean of the num_users column of amir_deals.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Set seed to 321
____

sample_means = []
# Loop 30 times to take 30 means
for i in range(____):
  # Take sample of size 20 from num_users col of all_deals with replacement
  cur_sample = ____
  # Take mean of cur_sample
  cur_mean = ____
  # Append cur_mean to sample_means
  sample_means.append(____)

# Print mean of sample_means
print(____)

# Print mean of num_users in amir_deals
print(____)
Edit and Run Code