Exercise

Random sampling

In this exercise, we're going to look at random sampling. You have been provided with a large dataset (athletes) containing details of a large number of American athletes. For the purposes of this exercise, we are interested in whether the body Weight of competitors differs between swimming and athletics. To test this, you'll use a two-sample t-test. However, rather than using the full dataset, you'll perform the test on a random sample of the data, so you will need to extract a random subset of athletes from each event. By changing which random subset is chosen, you'll see how randomness affects the results. pandas, scipy.stats, plotnine, and random have been loaded into the workspace as pd, stats, p9, and ran, respectively.
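To make the point about randomness concrete, here is a minimal sketch that reruns the same two-sample t-test on different random subsets and prints the p-value for each seed. It does not use the real athletes data: the DataFrame below is a hypothetical stand-in, and the column name `Sport`, the labels 'Athletics' and 'Swimming', the weight distributions, and the helper `sampled_pvalue` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the athletes dataset (column names and
# weight distributions are invented for illustration only).
gen = np.random.default_rng(1)
athletes = pd.DataFrame({
    "Sport": ["Athletics"] * 200 + ["Swimming"] * 200,
    "Weight": np.concatenate([
        gen.normal(70, 10, 200),  # made-up athletics weights (kg)
        gen.normal(75, 10, 200),  # made-up swimming weights (kg)
    ]),
})

def sampled_pvalue(df, seed, n=30):
    """Hypothetical helper: sample n athletes per event with the given seed
    and return the p-value of a two-sample t-test on their weights."""
    athl = df[df["Sport"] == "Athletics"].sample(n=n, random_state=seed)
    swim = df[df["Sport"] == "Swimming"].sample(n=n, random_state=seed)
    return stats.ttest_ind(athl["Weight"], swim["Weight"]).pvalue

# Same underlying data, different random subsets: the p-value changes with the seed.
pvalues = {seed: sampled_pvalue(athletes, seed) for seed in range(5)}
for seed, p in pvalues.items():
    print(f"seed={seed}: p={p:.4f}")
```

Because the seed is passed to `sample`, rerunning with the same seed reproduces the same subsets and the same p-value, while a different seed draws different athletes and can move the p-value noticeably.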

Instructions 1/2
  • Set the seed to 0000.
  • Create two subset DataFrames from athletes, subsetathl (athletics) and subsetswim (swimming), each containing 30 randomly sampled athletes.
  • Perform a two-sample t-test comparing the Weight columns of the two subset DataFrames, save the result to t_result, then print it.
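The three steps above can be sketched as follows. This is one possible approach under stated assumptions, not the official solution: the athletes DataFrame is replaced by a hypothetical stand-in, and the column name `Sport` with the labels 'Athletics' and 'Swimming' are guesses at how the event is stored. The sketch also assumes the seed is passed to `DataFrame.sample` via `random_state`, since `sample` draws from NumPy's generator and is not affected by `ran.seed` (Python's `random` module).

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the athletes dataset: the "Sport" column and
# the weight distributions are invented for illustration.
gen = np.random.default_rng(42)
athletes = pd.DataFrame({
    "Sport": ["Athletics"] * 200 + ["Swimming"] * 200,
    "Weight": np.concatenate([
        gen.normal(70, 10, 200),  # made-up athletics weights (kg)
        gen.normal(75, 10, 200),  # made-up swimming weights (kg)
    ]),
})

# Step 1: set the seed (0000 is simply the integer 0 in Python).
seed = 0000

# Step 2: draw 30 random rows from each event; the seed goes to random_state.
subsetathl = athletes[athletes["Sport"] == "Athletics"].sample(n=30, random_state=seed)
subsetswim = athletes[athletes["Sport"] == "Swimming"].sample(n=30, random_state=seed)

# Step 3: two-sample (independent) t-test on the Weight columns.
t_result = stats.ttest_ind(subsetathl["Weight"], subsetswim["Weight"])
print(t_result)
```

`t_result` carries the test statistic and p-value (`t_result.statistic`, `t_result.pvalue`); changing `seed` and rerunning shows how much these depend on which athletes happen to be drawn.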