EDA: Plot all your data
To get a graphical overview of a dataset, it is often useful to plot all of your data. In this exercise, plot all of the splits for all female swimmers in the 800 meter heats. The data are available in the NumPy arrays split_number and splits. The arrays are organized such that splits[i,j] is the split time for swimmer i for split_number[j].
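To make the array layout concrete, here is a minimal sketch using small synthetic arrays. The values below are invented stand-ins, not the real race data, and the split numbering is an assumption made only for illustration.

```python
import numpy as np

# Synthetic stand-in data (NOT the real results): 3 swimmers, 4 splits each.
split_number = np.arange(1, 5)
splits = np.array([
    [31.2, 31.5, 31.7, 31.9],  # swimmer 0
    [32.0, 32.1, 32.4, 32.3],  # swimmer 1
    [30.8, 31.0, 31.1, 31.4],  # swimmer 2
])

# splits[i, j] is swimmer i's time for split_number[j]
i, j = 1, 2
time_ij = splits[i, j]

# Iterating over a 2D array yields one row at a time,
# so each item is the full set of splits for one swimmer.
rows = [row for row in splits]
```

Each row has the same length as split_number, which is what allows one swimmer's row to be plotted directly against split_number.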
This exercise is part of the course
Case Studies in Statistical Thinking
Exercise instructions
- Write a for loop, looping over the set of splits for each swimmer to:
  - Plot the split time versus split number. Use the linewidth=1 and color='lightgray' keyword arguments.
- Compute the mean split times for each distance. You can do this using the np.mean() function with the axis=0 keyword argument. This tells np.mean() to average down the rows (that is, across swimmers), which will give the mean split time for each split number.
- Plot the mean split times (y-axis) versus split number (x-axis) using the marker='.', linewidth=3, and markersize=12 keyword arguments.
- Label the axes and show the plot.
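The effect of the axis argument can be checked on a tiny array. This is a small illustration with made-up numbers (the name demo is hypothetical), contrasting axis=0 with axis=1.

```python
import numpy as np

# Two rows (think: two swimmers), three columns (think: three splits).
demo = np.array([[1.0, 2.0, 3.0],
                 [3.0, 4.0, 5.0]])

# axis=0 collapses the row dimension: one mean per column.
col_means = np.mean(demo, axis=0)

# axis=1 collapses the column dimension: one mean per row.
row_means = np.mean(demo, axis=1)
```

With swimmers along the rows, axis=0 is the choice that yields one mean per split number.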
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Plot the splits for each swimmer
for splitset in ____:
    _ = ____(____, ____, lw=1, color='lightgray')
# Compute the mean split times
mean_splits = ____
# Plot the mean split times
# Label axes and show plot
_ = plt.xlabel('split number')
_ = plt.ylabel('split time (s)')
plt.show()
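For reference, one possible way to carry out these steps is sketched below. It is not necessarily the course's intended solution. It runs on randomly generated stand-in data, because the real split_number and splits arrays are not included here. The function name plot_all_splits is hypothetical, and the sketch uses Matplotlib's Axes methods rather than the plt.plot() style of the template.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_all_splits(split_number, splits):
    """Plot every swimmer's splits in light gray and overlay the mean."""
    fig, ax = plt.subplots()

    # One thin gray line per swimmer (each row of splits)
    for splitset in splits:
        ax.plot(split_number, splitset, linewidth=1, color='lightgray')

    # Mean over swimmers gives one value per split number
    mean_splits = np.mean(splits, axis=0)

    # Overlay the mean as a thicker line with point markers
    ax.plot(split_number, mean_splits, marker='.', linewidth=3, markersize=12)

    ax.set_xlabel('split number')
    ax.set_ylabel('split time (s)')
    return fig, ax, mean_splits


# Synthetic stand-in data (an assumption): 5 swimmers, 8 splits near 32 s.
rng = np.random.default_rng(0)
split_number = np.arange(1, 9)
splits = 32.0 + rng.normal(0.0, 0.5, size=(5, split_number.size))

fig, ax, mean_splits = plot_all_splits(split_number, splits)
```

Calling plt.show() afterwards displays the figure. Drawing the individual curves first keeps the mean line on top, where it stays visible.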