Exercise 2. Distribution of errors - 1
Assume the proportion of Democrats in the population \(p\) equals 0.45 and that your sample size \(N\) is 100 polled voters. The `take_sample` function you defined previously generates our estimate, \(\bar{X}\).
Replicate the random sampling 10,000 times and calculate \(p - \bar{X}\) for each random sample. Save these differences as a vector called `errors`. Find the average of `errors` and plot a histogram of the distribution.
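As a rough guide to what the simulation should show, standard sampling theory gives the following, assuming each polled voter is an independent draw who is a Democrat with probability \(p\) (an assumption about how `take_sample` works, not something stated here):

\[
\mathrm{E}[p - \bar{X}] = p - \mathrm{E}[\bar{X}] = 0,
\qquad
\mathrm{SE}[\bar{X}] = \sqrt{\frac{p(1-p)}{N}} = \sqrt{\frac{0.45 \times 0.55}{100}} \approx 0.0497
\]

Under that assumption, the average of `errors` should be close to zero, and the histogram should be roughly centered at zero with a spread of about 0.05.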
This exercise is part of the course
HarvardX Data Science Module 4 - Inference and Modeling
Exercise instructions
- The function `take_sample` that you defined in the previous exercise has already been run for you.
- Use the `replicate` function to replicate subtracting the result of `take_sample` from the value of \(p\) 10,000 times.
- Use the `mean` function to calculate the average of the differences between the sample average and the actual value of \(p\).
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Define `p` as the proportion of Democrats in the population being polled
p <- 0.45
# Define `N` as the number of people polled
N <- 100
# The variable `B` specifies the number of times we want the sample to be replicated
B <- 10000
# Use the `set.seed` function to make sure your answer matches the expected result after random sampling
set.seed(1)
# Create an object called `errors` that replicates subtracting the result of the `take_sample` function from `p` for `B` replications
# Calculate the mean of the errors. Print this value to the console.
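
One possible way to complete the steps described in the comments is sketched below. The body of `take_sample` is an assumption, since the previous exercise is not shown here: it is taken to draw `N` values of 0 or 1 with probability `p` of a 1 and return their mean. Replace it with your own definition if it differs.

```r
# Assumed stand-in for the `take_sample` function from the previous exercise:
# draws N voters (1 = Democrat with probability p, 0 = otherwise)
# and returns the sample proportion.
take_sample <- function(p, N) {
  x <- sample(c(0, 1), size = N, replace = TRUE, prob = c(1 - p, p))
  mean(x)
}

p <- 0.45    # proportion of Democrats in the population
N <- 100     # number of people polled
B <- 10000   # number of replications

set.seed(1)

# Repeat the poll B times, storing p minus the estimate each time
errors <- replicate(B, p - take_sample(p, N))

# Average error across all replications
mean(errors)

# Distribution of the errors
hist(errors)
```

Here `replicate` evaluates the expression `p - take_sample(p, N)` afresh on each of the `B` iterations, so every element of `errors` comes from a new random sample.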