1. Let's play ball!
Welcome! My name is Kevin Feasel and I will be guiding you through this course.
2. The dataset
We will use Lahman's Baseball Database, a collection of statistics for baseball games going back to 1871. It is the result of decades of work, and it is both actively maintained and widely available.
3. A primer on baseball
Baseball is played on a diamond, with home plate (marked as the number 4 in the diagram) and three bases 90 feet apart. Players run the bases counterclockwise.
Games generally last nine innings. In each inning, both teams bat until they make three outs. There are several ways to make an out, and outs are a team's most precious resource: a team gets only 27 of them in a nine-inning game, and every out uses one up.
There are nine players on the field for the defense, including the pitcher (number 5) and catcher (number 2). Play starts with the pitcher throwing to the catcher.
Meanwhile, one player from the other team (marked as 3) bats at a time, and each of the three bases may hold at most one runner.
The goal of the batter is to get on base via hit, walk, or other technique. Then, the batter scores by touching home plate safely after touching all three bases in order.
4. Batting measures and statistics
Baseball has a rich history of statistics and we will look at a few here.
An at-bat occurs when a player either gets a hit or makes an out. A hit happens when the batter makes contact with the ball and safely makes it to one of the bases.
Dividing the number of hits by the number of at-bats, we get batting average. Higher is better, and in baseball, a batting average of .300 or higher is great.
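As a quick illustration of the division just described, here is a minimal Python sketch. The function name and the season totals are made up for the example and do not come from the database.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Return hits divided by at-bats, or 0.0 when there are no at-bats."""
    if at_bats == 0:
        return 0.0
    return hits / at_bats


# Hypothetical season line: 180 hits in 600 at-bats.
avg = batting_average(180, 600)
print(f"{avg:.3f}")  # prints 0.300
```

Batting averages are conventionally shown to three decimal places, which is why the example formats the result that way.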
Those aren't the only possible outcomes, though. Walks and hit by pitches are good because they let the batter take first base. A sacrifice fly means the batter made an out that in turn allowed a runner to score, so it's pretty good as well.
This leads to our second measure: on-base percentage. On-base percentage takes the times a batter reaches base (hits, walks, and hit by pitches) and divides them by the plate appearances that count toward it (at-bats, walks, hit by pitches, and sacrifice flies). An on-base percentage of .350 or higher is usually good and .400 is great.
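To make on-base percentage concrete, here is a small Python sketch using the standard definition: hits, walks, and hit by pitches in the numerator, and at-bats, walks, hit by pitches, and sacrifice flies in the denominator. Note that a sacrifice fly counts only in the denominator. The function name, parameter names, and season totals are hypothetical, chosen only for illustration.

```python
def on_base_percentage(
    hits: int,
    walks: int,
    hit_by_pitch: int,
    at_bats: int,
    sacrifice_flies: int,
) -> float:
    """Return times on base divided by the plate appearances that count."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    if opportunities == 0:
        return 0.0
    return times_on_base / opportunities


# Hypothetical season line: 180 hits, 70 walks, 5 hit by pitches,
# 600 at-bats, and 5 sacrifice flies.
obp = on_base_percentage(180, 70, 5, 600, 5)
print(f"{obp:.3f}")  # prints 0.375
```

With these made-up totals, the batter reaches base 255 times in 680 opportunities, which lands between the "good" and "great" thresholds mentioned above.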
5. Our audience
Our audience in this lesson will be a baseball historian who already understands the sport and does not need us to explain it.
However, the historian does wish to investigate whether common opinions are accurate or not.
This is all done independently, as our historian does not work for a franchise.
6. Let's practice!
Now that we've learned a little bit about the game of baseball and our intended audience, let's dive into the sport!