Sampling from the population

In this lab we have access to the entire population, but this is rarely the case in real life. Gathering information on an entire population is often extremely costly or impossible. Because of this, we often take a sample of the population and use that to understand the properties of the population.

If we are interested in estimating the mean living area in Ames based on a sample, we can use the sample() function to draw a sample from the population: sample(area, 50).

This command collects a simple random sample of size 50 from the vector area. This is like going into the City Assessor's database and pulling up the files on 50 random home sales. If we didn't have access to the population data, working with these 50 files would be considerably simpler than having to go through all 2930 home sales.
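The sampling step can be tried outside the lab with a simulated stand-in for area. This is a minimal sketch that assumes a hypothetical vector pop_area of 2930 simulated living areas. These are not the real Ames values. It draws a simple random sample of 50 with sample(), which samples without replacement by default, and compares the sample mean with the population mean.

```r
# Hypothetical stand-in for the Ames `area` vector: 2930 simulated living
# areas. These are NOT the real Ames values.
set.seed(42)
pop_area <- round(rlnorm(2930, meanlog = log(1450), sdlog = 0.33))

# Simple random sample of 50 values, drawn without replacement (the default).
samp <- sample(pop_area, 50)

# Point estimate of the population mean, computed from the sample alone.
sample_mean <- mean(samp)
population_mean <- mean(pop_area)

cat("sample mean:    ", sample_mean, "\n")
cat("population mean:", population_mean, "\n")
```

If you rerun sample() without resetting the seed, you get a different set of 50 values and a slightly different sample mean. That difference is the sampling variability the exercise below asks you to examine.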

This exercise is part of the course Data Analysis and Statistical Inference.

Exercise instructions

  • Take a sample of size 50 from area and assign it to samp0.
  • Take another sample of size 50 from area and assign it to samp1.
  • Compare these two samples by drawing a histogram of each.
  • Think about how their distributions compare to the distribution of the complete population.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

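# The ames data frame and the area and price objects are already loaded into the workspace

# Create the samples (each is a simple random sample of 50 living areas):
samp0 <- sample(area, 50)
samp1 <- sample(area, 50)

# Draw the histograms:
hist(samp0, main = "samp0")
hist(samp1, main = "samp1")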