1. Introduction to swimming data
Now that you have reviewed some of the concepts and techniques from Statistical Thinking in Python Parts I and II, you are ready to dive into the first case study.
2. The 2015 FINA World Championships
You will get your feet wet with swimming races from the 2015 FINA World Championships in Kazan, Russia, which took place in this pool.
Before we plunge into the data, I need to bring you up to speed with some background on swimming competitions.
First, let's look at the pool. If you count, you will notice that there are ten lanes in the pool. Like the elements of a Python list or NumPy array, the lanes are zero-indexed, numbered zero to nine. Races are typically swum in lanes one through eight. The pool is 50 meters long, so in events longer than 50 meters, the swimmers turn around at the ends of the pool.
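To make the lane numbering and the pool length concrete, here is a small sketch in NumPy. The helper `n_turns` and the constant `POOL_LENGTH` are hypothetical names introduced only for illustration. The helper assumes the race distance is a whole multiple of the 50 m pool length, so a race of k lengths has k - 1 turns.

```python
import numpy as np

# The pool's ten lanes, zero-indexed like a Python list or NumPy array
lanes = np.arange(10)

# Races typically use lanes one through eight. The slice stop is
# exclusive, so lanes[1:9] picks out lanes 1 to 8.
race_lanes = lanes[1:9]

POOL_LENGTH = 50  # meters (hypothetical constant for illustration)


def n_turns(distance, pool_length=POOL_LENGTH):
    """Number of turns in a race of `distance` meters (hypothetical helper).

    Assumes the distance is a whole multiple of the pool length:
    a race of k lengths has one turn between each pair of lengths.
    """
    n_lengths = distance // pool_length
    return n_lengths - 1


print(race_lanes)    # [1 2 3 4 5 6 7 8]
print(n_turns(50))   # 0
print(n_turns(200))  # 3
```

A 50 m race is a single length with no turn, while the men's 200 m freestyle covers four lengths with three turns.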
3. Strokes at the World Championships
There are four different strokes, or ways of swimming, and each has its own events: freestyle, breaststroke, butterfly, and backstroke. It is important to note that, because the mechanics of the strokes differ, they are swum at different speeds.
4. Events at the World Championships
You can think of each event at the World Championships as being defined by a unique tuple of gender, distance, and stroke. For example, in your first exercises, you will look at results of the men's 200 m freestyle.
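The idea of an event as a tuple maps directly onto Python. The sketch below uses `collections.namedtuple`, a tuple subclass, so the fields can be read by name. The `Event` type and the labels `"M"`, `"F"`, `"FREESTYLE"`, and `"BACKSTROKE"` are hypothetical choices for illustration, not the codes used in the actual data files.

```python
from collections import namedtuple

# Hypothetical event type: an event is a (gender, distance, stroke) tuple
Event = namedtuple("Event", ["gender", "distance", "stroke"])

mens_200_free = Event("M", 200, "FREESTYLE")
mens_200_back = Event("M", 200, "BACKSTROKE")
womens_200_free = Event("F", 200, "FREESTYLE")

# Changing any single field gives a different event
print(mens_200_free == mens_200_back)          # False
print(mens_200_free == ("M", 200, "FREESTYLE"))  # True: it is still a tuple

# Tuples are hashable, so events can key a dictionary,
# e.g. mapping each event to a list of its result rows
results = {mens_200_free: [], mens_200_back: [], womens_200_free: []}
print(len(results))             # 3
print(mens_200_free.distance)   # 200
```

A plain tuple such as `("M", 200, "FREESTYLE")` would work the same way as a dictionary key; the named fields only make the code easier to read.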
5. Rounds of events
Each event is swum in stages, called rounds, with the fastest swimmers from each round advancing in the competition. The first round is called the heats. In longer-distance events, the fastest eight swimmers from the heats advance to the final, and the winner of the final is the champion. Shorter-distance events have an extra semifinal round: the fastest 16 swimmers from the heats race in the semifinals, and the fastest eight of those advance to the final.
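The progression through rounds amounts to repeatedly keeping the swimmers with the smallest times. The sketch below expresses that with `np.argsort`. The function names `fastest` and `finalists` are hypothetical. The sketch assumes there are no ties (real competitions resolve ties separately) and that semifinal times are given in the same order as the semifinalists returned from the heats. The times in the usage example are synthetic, not real results.

```python
import numpy as np


def fastest(times, n):
    """Indices of the n smallest times (hypothetical helper; ignores ties)."""
    return np.argsort(np.asarray(times, dtype=float), kind="stable")[:n]


def finalists(heat_times, semi_times=None):
    """Indices into heat_times of the swimmers who reach the final.

    heat_times: one time per swimmer in the heats.
    semi_times: for events with a semifinal round, one time per
        semifinalist, in the order returned by fastest(heat_times, 16).
        Pass None for events that go straight from heats to final.
    """
    if semi_times is None:
        # Longer events: the fastest 8 from the heats reach the final
        return fastest(heat_times, 8)
    # Shorter events: the fastest 16 from the heats swim the semifinals...
    semifinalists = fastest(heat_times, 16)
    if len(semi_times) != len(semifinalists):
        raise ValueError("need one semifinal time per semifinalist")
    # ...and the fastest 8 semifinalists reach the final
    return semifinalists[fastest(semi_times, 8)]


# Synthetic example: 20 swimmers with made-up heat times
rng = np.random.default_rng(seed=0)
heat_times = rng.uniform(105.0, 115.0, size=20)
semi_times = rng.uniform(104.0, 110.0, size=16)
print(finalists(heat_times))              # heats -> final
print(finalists(heat_times, semi_times))  # heats -> semis -> final
```

Returning indices rather than times keeps track of *which* swimmers advance, which is what matters when following a swimmer from one round to the next.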
6. Data source
Finally, I want to mention that the data we will use in this case study are freely available from the Swiss company OMEGA, the official timekeeper of the World Championships.
Ok, that bit of background should cover what you need to know for the first batch of exercises. I will give you more background as needed as you work through the datasets.
7. Domain-specific knowledge is
Setting up this background leads me to an important point. The data science skills you are learning are very widely applicable, and when you apply them to specific problems, you almost always need to inform yourself about the particular field of study. In the present example, it is important to know that the different strokes are swum at different speeds. If you were doing a larger-scale project with swimming, say for a national team or a swimsuit manufacturer, you would want to work closely with experts like coaches, kinesiologists, and swimmers themselves.
This part of data science is both critically important and exhilarating. Throughout a career as a data scientist, you can learn about many different fascinating fields as you apply your craft to them.
8. Let's practice!
Ok, now it is time to splash around in some data from the men's 200 m freestyle at the 2015 World Championships. And I promise that the exercises are not full of bad puns like this video is!