
EDA: Plot all your data

To get a graphical overview of a dataset, it is often useful to plot all of your data. In this exercise, plot all of the splits for all female swimmers in the 800-meter heats. The data are available in the NumPy arrays split_number and splits. The arrays are organized such that splits[i,j] is the split time for swimmer i for split_number[j].
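
To make the array layout concrete, here is a small sketch with made-up numbers. The values and the 3-by-4 shape are assumptions for illustration only, not the course data.

```python
import numpy as np

# Hypothetical stand-in data: 3 swimmers, 4 splits each (not the course data)
split_number = np.array([3, 4, 5, 6])
splits = np.array([
    [31.2, 31.5, 31.9, 32.0],   # swimmer 0
    [32.0, 32.4, 32.3, 32.8],   # swimmer 1
    [30.9, 31.1, 31.6, 31.4],   # swimmer 2
])

# Rows index swimmers, columns index split numbers
n_swimmers, n_splits = splits.shape

# Split time of swimmer 1 for split_number[2]
i, j = 1, 2
print(f"swimmer {i}, split {split_number[j]}: {splits[i, j]} s")

# One full row is the set of splits for a single swimmer
print(splits[0])
```

Iterating over splits directly (for splitset in splits) yields one such row per swimmer, which is what the plotting loop in this exercise relies on.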

This exercise is part of the course

Case Studies in Statistical Thinking

Exercise instructions

  • Write a for loop over splits, which yields the set of splits for one swimmer at a time. Inside the loop:
    • Plot the split time versus split number. Use the linewidth=1 (or its shorthand, lw=1) and color='lightgray' keyword arguments.
  • Compute the mean split time for each split number. You can do this using the np.mean() function with the axis=0 keyword argument. This tells np.mean() to average down the rows, that is, across swimmers, which gives one mean split time per split number.
  • Plot the mean split times (y-axis) versus split number (x-axis) using the marker='.', linewidth=3, and markersize=12 keyword arguments.
  • Label the axes and show the plot.
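
The effect of axis=0 can be checked on a tiny array. The sketch below uses hypothetical numbers (2 swimmers, 3 splits), chosen so the means are easy to verify by hand.

```python
import numpy as np

# Hypothetical data: 2 swimmers (rows), 3 splits (columns)
splits = np.array([
    [30.0, 32.0, 34.0],
    [32.0, 34.0, 36.0],
])

# axis=0 collapses the rows (swimmers): one mean per split number
mean_splits = np.mean(splits, axis=0)
print(mean_splits)        # [31. 33. 35.]

# axis=1 would instead collapse the columns: one mean per swimmer
mean_per_swimmer = np.mean(splits, axis=1)
print(mean_per_swimmer)   # [32. 34.]
```

The result of axis=0 has the same length as split_number, so it can be plotted against it directly.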

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Plot the splits for each swimmer
for splitset in ____:
    _ = ____(____, ____, lw=1, color='lightgray')

# Compute the mean split times
mean_splits = ____

# Plot the mean split times


# Label axes and show plot
_ = plt.xlabel('split number')
_ = plt.ylabel('split time (s)')
plt.show()
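
For reference, here is one way the filled-in template might look when run on synthetic data. The random data, the number of swimmers, and the range of split numbers are all assumptions made so the sketch runs on its own. In the exercise itself, split_number and splits are already provided.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (an assumption, not the course data):
# 5 swimmers, 12 splits each, times scattered around 32 s
rng = np.random.default_rng(42)
split_number = np.arange(1, 13)
splits = 32.0 + rng.normal(0.0, 0.5, size=(5, len(split_number)))

# Plot the splits for each swimmer as a thin gray line
for splitset in splits:
    _ = plt.plot(split_number, splitset, lw=1, color='lightgray')

# Compute the mean split time for each split number
mean_splits = np.mean(splits, axis=0)

# Plot the mean split times on top of the individual curves
_ = plt.plot(split_number, mean_splits, marker='.', linewidth=3, markersize=12)

# Label axes and show plot
_ = plt.xlabel('split number')
_ = plt.ylabel('split time (s)')
plt.show()
```

Drawing the individual swimmers in light gray first and the mean afterwards keeps the thick mean curve visible on top of the individual lines.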