Exercise 3 - Sampling From the Normal Distribution
In a previous section, we repeatedly took random samples of 50 heights from a distribution of heights. We noticed that about 95% of the samples had confidence intervals spanning the true population mean.
Re-do this Monte Carlo simulation, but now instead of \(N=50\), use \(N=15\). Notice what happens to the proportion of hits.
This exercise is part of the course
HarvardX Data Science Module 4 - Inference and Modeling
Exercise instructions
- Use the `replicate` function to carry out the simulation. Specify the number of times you want the code to run and, within curly braces, the three lines of code that should run.
  - First, use the `sample` function to randomly sample `N` values from `x`.
  - Second, create a vector called `interval` that calculates the 95% confidence interval for the sample. You will use the `qnorm` function.
  - Third, use the `between` function to determine whether the population mean `mu` lies within the confidence interval.
- Save the results of the Monte Carlo simulation to a vector called `res`.
- Use the `mean` function to determine the proportion of hits in `res`.
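For reference, the interval in the second step is presumably the normal-approximation interval used in the earlier section. Assuming that, and writing \(\bar{X}\) for the sample mean and \(s\) for the sample standard deviation (symbols introduced here for illustration, not taken from the exercise), it would be:

\[
\bar{X} \pm z_{0.975}\,\frac{s}{\sqrt{N}}, \qquad z_{0.975} = \texttt{qnorm(0.975)} \approx 1.96
\]

With \(N=15\), \(\sqrt{15} \approx 3.87\), so the half-width is roughly \(1.96\,s/3.87 \approx 0.51\,s\), compared with about \(0.28\,s\) for \(N=50\).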
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load the necessary libraries and data
library(dslabs)
library(dplyr)
data(heights)
# Use the sample code to generate 'x', a vector of male heights
x <- heights %>% filter(sex == "Male") %>%
  .$height
# Create variables for the mean height 'mu', the sample size 'N', and the number of times the simulation should run 'B'
mu <- mean(x)
N <- 15
B <- 10000
# Use the `set.seed` function to make sure your answer matches the expected result after random sampling
set.seed(1)
# Generate a logical vector 'res' that contains the results of the simulations
# Calculate the proportion of simulated 95% confidence intervals that contain 'mu'. Print this value to the console.
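
One possible way to fill in the two remaining steps is sketched below. It is not the official solution. So that it runs without the `dslabs` and `dplyr` packages, it makes three assumptions that are not in the exercise: `x` is replaced by simulated stand-in heights, sampling is done with replacement, and a base-R comparison takes the place of `between`.

```r
# Stand-in data (an assumption): simulated heights in inches, used only so this
# sketch runs without dslabs. In the exercise, 'x' comes from the 'heights' data.
set.seed(1)
x <- rnorm(800, mean = 69, sd = 3.6)

mu <- mean(x)   # population mean
N <- 15         # sample size
B <- 10000      # number of Monte Carlo replications

set.seed(1)
res <- replicate(B, {
  # Draw N values from x (sampling with replacement is an assumption here)
  X <- sample(x, N, replace = TRUE)
  # Normal-approximation 95% confidence interval for the sample mean
  interval <- mean(X) + c(-1, 1) * qnorm(0.975) * sd(X) / sqrt(N)
  # Base-R equivalent of dplyr::between(mu, interval[1], interval[2])
  mu >= interval[1] & mu <= interval[2]
})

# Proportion of intervals that contain mu
mean(res)
```

With the real data, the same three lines inside `replicate` should work once `x` is built from `heights` as in the sample code. The proportion would be expected to come out somewhat below 0.95, because `qnorm` does not account for the extra uncertainty of estimating the standard deviation from only 15 values.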