Exercise 2. Distribution of errors - 1

Assume the proportion of Democrats in the population \(p\) equals 0.45 and that your sample size \(N\) is 100 polled voters. The take_sample function you defined previously generates our estimate, \(\bar{X}\).

Replicate the random sampling 10,000 times and calculate \(p - \bar{X}\) for each random sample. Save these differences as a vector called errors. Find the average of errors and plot a histogram of the distribution.
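
As a rough guide to what the result should look like, assume each polled voter is an independent draw with probability \(p\) of being a Democrat (the usual urn model; this is an assumption, since the exercise does not restate it). Under that assumption the error has expected value zero and a standard error of about 0.05:

\[
\mathrm{E}[p - \bar{X}] = p - \mathrm{E}[\bar{X}] = 0, \qquad
\mathrm{SE}[p - \bar{X}] = \sqrt{\frac{p(1-p)}{N}} = \sqrt{\frac{0.45 \times 0.55}{100}} \approx 0.0497
\]

So the average of errors should be close to 0, and most of the differences should fall within roughly \(\pm 0.1\) (about two standard errors).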

This exercise is part of the course

HarvardX Data Science Module 4 - Inference and Modeling

Exercise instructions

  • The function take_sample that you defined in the previous exercise has already been run for you.
  • Use the replicate function to subtract the result of take_sample from the value of \(p\), repeating the calculation 10,000 times.
  • Use the mean function to calculate the average of the differences between the actual value of \(p\) and the sample average.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Define `p` as the proportion of Democrats in the population being polled
p <- 0.45

# Define `N` as the number of people polled
N <- 100

# The variable `B` specifies the number of times we want the sample to be replicated
B <- 10000

# Use the `set.seed` function to make sure your answer matches the expected result after random sampling
set.seed(1)

# Create an object called `errors` that subtracts the result of the `take_sample` function from `p` in each of `B` replications


# Calculate the mean of the errors. Print this value to the console.

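One possible way to fill in the blanks is sketched below. The exercise says take_sample was defined in a previous exercise but does not show it, so the definition here is an assumption: it draws N values of 0 or 1 with replacement, with probability p of a 1, and returns their mean. The hist call is also an addition, included because the exercise text asks for a histogram.

```r
# ASSUMPTION: stand-in for the `take_sample` function from the previous exercise.
# It polls N voters (1 = Democrat, 0 = not) and returns the sample proportion.
take_sample <- function(p, N) {
  x <- sample(c(0, 1), size = N, replace = TRUE, prob = c(1 - p, p))
  mean(x)
}

p <- 0.45   # proportion of Democrats in the population
N <- 100    # number of people polled
B <- 10000  # number of replications

set.seed(1)

# Each replication draws a fresh sample and records the error p - X_bar
errors <- replicate(B, p - take_sample(p, N))

# Average error across the B replications
mean(errors)

# Distribution of the errors
hist(errors)
```

The expression passed to replicate is re-evaluated on every replication, which is why take_sample(p, N) is called inside it rather than computed once beforehand. If your take_sample differs from the stand-in above, the exact value of mean(errors) may differ as well, even with the same seed.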