Exercise 3 - Sampling From the Normal Distribution

In a previous section, we repeatedly took random samples of 50 heights from a distribution of heights. We noticed that about 95% of the samples had confidence intervals spanning the true population mean.

Re-do this Monte Carlo simulation, but now instead of \(N=50\), use \(N=15\). Notice what happens to the proportion of hits.
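For reference, a "hit" means that the interval computed from one sample contains the population mean. A sketch of that interval, assuming it is built from the sample mean \(\bar{X}\), the sample standard deviation \(s\), and the normal quantile returned by `qnorm(0.975)` (these symbols are illustrative and do not appear in the exercise text):

\[
\bar{X} \pm z_{0.975}\,\frac{s}{\sqrt{N}}, \qquad z_{0.975} \approx 1.96
\]

Under this form, moving from \(N=50\) to \(N=15\) scales the factor \(1/\sqrt{N}\) by \(\sqrt{50/15} \approx 1.83\), and \(s\) is estimated from fewer observations.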

This exercise is part of the course

HarvardX Data Science Module 4 - Inference and Modeling

Exercise instructions

  • Use the replicate function to carry out the simulation. Specify the number of times you want the code to run and, within curly braces, the three lines of code that should run.
  • First, use the sample function to randomly sample N values from x.
  • Second, create a vector called interval that contains the lower and upper bounds of the 95% confidence interval for the sample. You will use the qnorm function.
  • Third, use the between function to determine whether the population mean mu lies within the confidence interval.
  • Save the results of the Monte Carlo simulation to a vector called res.
  • Use the mean function to determine the proportion of hits in res.
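The steps above can be sketched as follows. This is one possible reading of the instructions, not the course's official solution. It assumes that sampling is done with replacement (`replace = TRUE`), that the interval uses the sample standard deviation `sd(X)` as the estimate of spread, and it introduces a helper name `X` for each sample. The setup lines repeat the exercise's scaffold so the block runs on its own, with `pull(height)` used in place of `.$height`.

```r
# Load the libraries and data used by the exercise
library(dslabs)
library(dplyr)
data(heights)

# Vector of male heights, treated as the population
x <- heights %>% filter(sex == "Male") %>% pull(height)

mu <- mean(x)  # population mean
N <- 15        # sample size
B <- 10000     # number of Monte Carlo replications

set.seed(1)

# Each replication returns TRUE if the interval contains mu, FALSE otherwise
res <- replicate(B, {
  # Assumption: sample with replacement
  X <- sample(x, N, replace = TRUE)
  # Assumption: normal-quantile interval built from the sample mean and sample sd
  interval <- mean(X) + c(-1, 1) * qnorm(0.975) * sd(X) / sqrt(N)
  between(mu, interval[1], interval[2])
})

# Proportion of replications whose interval contained mu
mean(res)
```

Because `res` is a logical vector, `mean(res)` counts each TRUE as 1 and each FALSE as 0, which gives the proportion of hits directly.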

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Load the necessary libraries and data
library(dslabs)
library(dplyr)
data(heights)

# Use the sample code to generate 'x', a vector of male heights
x <- heights %>% filter(sex == "Male") %>%
  .$height

# Create variables for the mean height 'mu', the sample size 'N', and the number of times the simulation should run 'B'
mu <- mean(x)
N <- 15
B <- 10000

# Use the `set.seed` function to make sure your answer matches the expected result after random sampling
set.seed(1)

# Generate a logical vector 'res' that contains the results of the simulations





# Calculate the proportion of simulations in which the 95% confidence interval contained 'mu'. Print this value to the console.