Sampling from the population
In this lab we have access to the entire population, but this is rarely the case in real life. Gathering information on an entire population is often extremely costly or impossible. Because of this, we often take a sample of the population and use that to understand the properties of the population.
If we are interested in estimating the mean living area in Ames based on a sample, we can use the sample() function to sample from the population: sample(area, 50).
This command collects a simple random sample of size 50 from the vector area. This is like going into the City Assessor's database and pulling up the files on 50 random home sales. If we didn't have access to the population data, working with these 50 files would be considerably simpler than having to go through all 2930 home sales.
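A minimal sketch of the call described above. The real ames data exist only in the lab workspace, so area here is a simulated stand-in of 2930 right-skewed values. Both the simulated values and the seed are assumptions for illustration, not the actual Ames data.

```r
# Hypothetical stand-in for the Ames living-area vector (NOT the real data):
# 2930 right-skewed values, simulated so the example runs on its own.
set.seed(1)
area <- round(rlnorm(2930, meanlog = 7.25, sdlog = 0.3))

# sample() draws without replacement by default, so this is a
# simple random sample of 50 of the 2930 entries in `area`.
samp <- sample(area, 50)

length(samp)   # 50
mean(samp)     # sample estimate of the mean living area
mean(area)     # population mean, known here only because we hold every value
```

Rerunning sample(area, 50) without resetting the seed gives a different sample, so mean(samp) will vary from draw to draw while mean(area) stays fixed.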
This exercise is part of the course Data Analysis and Statistical Inference.
Exercise instructions
- Take a sample of size 50 from area and assign it to samp0.
- Take another sample of size 50 and assign it to samp1.
- Compare these two samples by drawing histograms. You can switch between graphs by clicking the arrows right above the graph.
- Think about how their distributions compare to that of the complete population.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# The ames data frame and area and price objects are already loaded into the workspace
# Create the samples:
samp0 <-
samp1 <-
# Draw the histograms:
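One possible way to complete the scaffold, offered as a sketch and not as the official solution. Since the lab workspace is not available here, area is again a simulated stand-in (an assumption, not the real Ames data), and the shared x-axis range, plot titles, and the extra population histogram are illustrative choices.

```r
# Hypothetical stand-in for `area` so the sketch runs outside the lab
# (in the lab itself, `area` is already loaded and this step is not needed).
set.seed(42)
area <- round(rlnorm(2930, meanlog = 7.25, sdlog = 0.3))

# Create the samples:
samp0 <- sample(area, 50)
samp1 <- sample(area, 50)

# Draw the histograms on a shared x-axis range so they are easy to compare:
rng <- range(area)
hist(samp0, xlim = rng, main = "samp0", xlab = "Living area")
hist(samp1, xlim = rng, main = "samp1", xlab = "Living area")

# For reference, the full (stand-in) population on the same scale:
hist(area, xlim = rng, main = "Population", xlab = "Living area")
```

With only 50 observations each, the two sample histograms typically look rougher than the population histogram and differ from one another, which is the comparison the last instruction asks you to think about.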