EDA: Plot all your data
To get a graphical overview of a dataset, it is often useful to plot all of your data. In this exercise, plot all of the splits for all female swimmers in the 800-meter heats. The data are available in the NumPy arrays split_number and splits. The arrays are organized such that splits[i,j] is the split time for swimmer i at split_number[j].
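The indexing convention above can be made concrete with a tiny array. This is a minimal sketch using made-up toy values (3 hypothetical swimmers, 4 splits), not the exercise's data. It shows that rows are swimmers, columns are split numbers, and that looping over a 2-D array yields one swimmer's row at a time, which is what the plotting loop below relies on.

```python
import numpy as np

# Hypothetical toy data (invented values): 3 swimmers (rows) x 4 splits (columns), seconds
split_number = np.array([1, 2, 3, 4])
splits = np.array([
    [28.1, 30.2, 30.5, 29.0],
    [27.9, 30.0, 30.4, 28.7],
    [28.4, 30.6, 30.9, 29.3],
])

# splits[i, j] is swimmer i's time at split_number[j]
# Here i = 1 (second row) and j = 2, i.e. split number 3
print(splits[1, 2])  # 30.4

# Iterating over a 2-D array yields its rows: one swimmer's splits per pass,
# each with the same length as split_number
for splitset in splits:
    print(splitset.shape)  # (4,)
```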
This exercise is part of the course Case Studies in Statistical Thinking.
Exercise instructions
- Write a for loop, looping over the set of splits for each swimmer to:
  - Plot the split time versus split number. Use the linewidth=1 and color='lightgray' keyword arguments.
- Compute the mean split time for each split number. You can do this using the np.mean() function with the axis=0 keyword argument. This tells np.mean() to average over the rows (the swimmers), which gives one mean split time per column, that is, per split number.
- Plot the mean split times (y-axis) versus split number (x-axis) using the marker='.', linewidth=3, and markersize=12 keyword arguments.
- Label the axes and show the plot.
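To see what axis=0 does, here is a small sketch on a hypothetical 2 x 3 array (2 swimmers, 3 splits; values invented for illustration). Averaging with axis=0 collapses the swimmer axis and leaves one mean per split number; axis=1 is shown only for contrast, since it would give one mean per swimmer instead.

```python
import numpy as np

# Hypothetical data: 2 swimmers (rows) x 3 splits (columns), seconds
splits = np.array([
    [30.0, 31.0, 29.0],
    [32.0, 33.0, 31.0],
])

# axis=0 averages down each column: one mean per split number
mean_splits = np.mean(splits, axis=0)
print(mean_splits)  # [31. 32. 30.]

# axis=1 averages across each row: one mean per swimmer (not what the exercise wants)
print(np.mean(splits, axis=1))  # [30. 32.]
```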
Have a go at this exercise by completing this sample code.
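import numpy as np
import matplotlib.pyplot as plt

# splits and split_number are assumed to be preloaded, as in the exercise
# Plot the splits for each swimmer
for splitset in ____:
    _ = ____(____, ____, lw=1, color='lightgray')

# Compute the mean split times
mean_splits = ____

# Plot the mean split times
_ = ____(____, ____, marker='.', linewidth=3, markersize=12)

# Label axes and show plot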
_ = plt.xlabel('split number')
_ = plt.ylabel('split time (s)')
plt.show()
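For reference, here is a minimal sketch of one way to complete the sample code. The exercise's swimming data are not reproduced here, so this assumes a small, randomly generated stand-in for split_number and splits (4 hypothetical swimmers x 8 splits, values invented purely for illustration); substitute the real arrays to get the actual plot. The keyword arguments follow the exercise instructions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data (NOT the real swim splits):
# 4 swimmers x 8 splits, times in seconds, randomly generated
rng = np.random.default_rng(42)
split_number = np.arange(1, 9)
splits = 30 + rng.normal(0, 0.5, size=(4, 8))

# Plot each swimmer's splits as a thin light-gray line
for splitset in splits:
    _ = plt.plot(split_number, splitset, lw=1, color='lightgray')

# Mean split time per split number: average over swimmers (axis=0)
mean_splits = np.mean(splits, axis=0)

# Overlay the mean splits as a thick line with dot markers
_ = plt.plot(split_number, mean_splits, marker='.', linewidth=3, markersize=12)

# Label axes and show plot
_ = plt.xlabel('split number')
_ = plt.ylabel('split time (s)')
plt.show()
```

Drawing the individual swimmers in light gray and the mean in a heavier default-colored line keeps every data point visible while letting the overall trend stand out.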