Comparing sampling methods
1. Comparing sampling methods
Let's review the various sampling techniques you learned about.
2. Review of sampling techniques
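The three set-ups reviewed in this slide can be sketched in R with dplyr, which is where slice_sample() comes from. This is a minimal sketch under stated assumptions. The data frame `coffee` is a simulated, hypothetical stand-in: six made-up countries named `country_1` to `country_6`, with normally distributed scores. The column names `country_of_origin` and `total_cup_points` are also assumed rather than taken from the course data. Base R's sample() picks the countries for the cluster stage.

```r
library(dplyr)

set.seed(2024)
# Hypothetical stand-in for the coffee data: simulated scores for six
# made-up countries (NOT the real ratings); column names are assumed.
counts <- c(240, 180, 180, 150, 75, 72)
coffee <- data.frame(
  country_of_origin = rep(paste0("country_", 1:6), times = counts),
  total_cup_points = rnorm(sum(counts), mean = 82, sd = 2.5)
)

# Simple random sampling: one third of all rows, chosen at random
sample_simple <- coffee %>%
  slice_sample(prop = 1 / 3)

# Stratified sampling: one third of the rows within each country
sample_strat <- coffee %>%
  group_by(country_of_origin) %>%
  slice_sample(prop = 1 / 3) %>%
  ungroup()

# Cluster sampling, stage 1: randomly choose two of the six countries
countries_chosen <- sample(unique(coffee$country_of_origin), size = 2)

# Stage 2: the same number of rows from each chosen country.
# floor() keeps n a whole number (897 / 6 = 149.5, so n = 149).
sample_cluster <- coffee %>%
  filter(country_of_origin %in% countries_chosen) %>%
  group_by(country_of_origin) %>%
  slice_sample(n = floor(nrow(coffee) / 6)) %>%
  ungroup()
```

If a chosen country has fewer than n rows, slice_sample() silently returns all of that country's rows. In that case the cluster sample comes out smaller than one third of the data.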
For convenience, we'll stick to the six countries with the most rows that we used before. Simple random sampling just uses slice_sample() to randomly pick rows. Stratified sampling groups by country before performing simple random sampling within each subgroup. Cluster sampling uses simple random sampling to choose which country subgroups to include, then performs simple random sampling within each chosen subgroup. In the cluster sample, I've used two out of six countries to roughly mimic prop = 1/3 from the other sample types. I also set n to one sixth of the total number of rows. That way, each chosen country contributes the same number of rows, and the two together give roughly the same total sample size as the other methods. The n argument in slice_sample() needs to be a whole number, so we may need to use the floor() function if the division gives a non-integer result.
3. Calculating mean cup points
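Here is a sketch of this slide's comparison. It runs on a hypothetical, simulated `coffee` data frame with assumed column names `country_of_origin` and `total_cup_points`, so its numbers will not match the values in the narration. The code builds the three samples, then puts the population mean and each sample mean side by side using summarize() and bind_rows(), with the `.id` argument labelling each row.

```r
library(dplyr)

set.seed(2024)
# Hypothetical simulated stand-in for the coffee data (NOT the real ratings)
counts <- c(240, 180, 180, 150, 75, 72)
coffee <- data.frame(
  country_of_origin = rep(paste0("country_", 1:6), times = counts),
  total_cup_points = rnorm(sum(counts), mean = 82, sd = 2.5)
)

# The three samples: simple random, stratified by country, and cluster
sample_simple <- coffee %>% slice_sample(prop = 1 / 3)
sample_strat <- coffee %>%
  group_by(country_of_origin) %>%
  slice_sample(prop = 1 / 3) %>%
  ungroup()
countries_chosen <- sample(unique(coffee$country_of_origin), size = 2)
sample_cluster <- coffee %>%
  filter(country_of_origin %in% countries_chosen) %>%
  group_by(country_of_origin) %>%
  slice_sample(n = floor(nrow(coffee) / 6)) %>%
  ungroup()

# Population mean (the gold standard) alongside the three sample means
mean_cup_points <- bind_rows(
  population = summarize(coffee, mean_points = mean(total_cup_points)),
  simple = summarize(sample_simple, mean_points = mean(total_cup_points)),
  stratified = summarize(sample_strat, mean_points = mean(total_cup_points)),
  cluster = summarize(sample_cluster, mean_points = mean(total_cup_points)),
  .id = "source"
)
print(mean_cup_points)
```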
Let's calculate a population parameter: the mean of the cup points. When the population parameter is the mean of a field, it's often called the population mean. Remember that in real-life scenarios, we typically wouldn't know the population mean. Since we have it here, though, we can use this value of 81.9 as a gold standard to measure against. Now we'll calculate the same value using each of the sampling techniques. These are point estimates of the mean, often called sample means. The simple and stratified sample means are really close to the population mean. Cluster sampling isn't quite as close, but that's typical. Cluster sampling is designed to give you an answer that's almost as good while needing data from only some of the subgroups.
4. Mean cup points by country: simple random
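Here is a sketch of the by-country calculation for the simple random sample. It uses hypothetical simulated data with assumed column names, not the course data. Grouping by country before summarizing gives one mean per country. A left_join() then lines each country's sample mean up with its population mean.

```r
library(dplyr)

set.seed(2024)
# Hypothetical simulated stand-in for the coffee data (NOT the real ratings)
counts <- c(240, 180, 180, 150, 75, 72)
coffee <- data.frame(
  country_of_origin = rep(paste0("country_", 1:6), times = counts),
  total_cup_points = rnorm(sum(counts), mean = 82, sd = 2.5)
)

sample_simple <- coffee %>% slice_sample(prop = 1 / 3)

# Group by country before summarizing: one row (one mean) per country
pop_by_country <- coffee %>%
  group_by(country_of_origin) %>%
  summarize(pop_mean = mean(total_cup_points))

simple_by_country <- sample_simple %>%
  group_by(country_of_origin) %>%
  summarize(sample_mean = mean(total_cup_points))

# Put each country's sample mean next to its population mean
comparison <- left_join(pop_by_country, simple_by_country, by = "country_of_origin")
print(comparison)
```

The stratified comparison works the same way, with the stratified sample in place of `sample_simple`.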
Here's a slightly more complicated calculation: the mean for each country. We group by country before summarizing, which gives six numbers. So how do the numbers from the simple random sample compare? Each of the sample means is pretty close to the population mean for its country.
5. Mean cup points by country: stratified
The same is true of the sample means from the stratified technique: each one is pretty close to its country's population mean.
6. Mean cup points by country: cluster
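Here is a sketch of the cluster limitation described in this slide. It uses hypothetical simulated data with assumed column names, not the course data. The per-country sample means are joined onto the six population means. Every country that the cluster stage did not pick gets NA, because there is simply no data for it.

```r
library(dplyr)

set.seed(2024)
# Hypothetical simulated stand-in for the coffee data (NOT the real ratings)
counts <- c(240, 180, 180, 150, 75, 72)
coffee <- data.frame(
  country_of_origin = rep(paste0("country_", 1:6), times = counts),
  total_cup_points = rnorm(sum(counts), mean = 82, sd = 2.5)
)

# Cluster sample: two random countries, then n rows from each
countries_chosen <- sample(unique(coffee$country_of_origin), size = 2)
sample_cluster <- coffee %>%
  filter(country_of_origin %in% countries_chosen) %>%
  group_by(country_of_origin) %>%
  slice_sample(n = floor(nrow(coffee) / 6)) %>%
  ungroup()

pop_by_country <- coffee %>%
  group_by(country_of_origin) %>%
  summarize(pop_mean = mean(total_cup_points))

cluster_by_country <- sample_cluster %>%
  group_by(country_of_origin) %>%
  summarize(sample_mean = mean(total_cup_points))

# Countries that were not chosen have no sample mean, so they get NA
comparison <- left_join(pop_by_country, cluster_by_country, by = "country_of_origin")
print(comparison)
sum(is.na(comparison$sample_mean))  # 4: only two of the six countries have estimates
```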
With cluster sampling, the sample means are pretty close to the population means, but the obvious limitation is that you only get values for the two countries that were included in the sample. If mean cup points by country is an important metric in your analysis, cluster sampling would be a bad idea.
7. Let's practice!
Let's calculate some summary statistics.