1. Visualizing baseball data
For this demo, we will create a table. On it, I’ll add yearAndTeam, number of errors, putouts, and assists, dragging the table down along the way.
In baseball, a putout is when a player on defense is directly responsible for an out--they catch the ball in the air, tag a runner, or step on a base. An assist happens when you throw the ball to another defender, who then records a putout. An error happens when a player either fails to make an expected out or makes a mistake which allows baserunners to advance or score.
Now I want to add a slicer on year range. I’ll add a slicer visual, put yearID into the Field section, and resize it appropriately. This helps us easily look at different eras of baseball.
Next, I want to check wild pitches versus passed balls by year to see if there is a correlation. A wild pitch is a pitch the catcher had no chance to catch, and as a result, baserunners advanced or scored. Meanwhile, a passed ball is one that the catcher should have been able to manage, and whose failure led to baserunners advancing or scoring. Let’s create a scatter chart, using wild pitches as the X Axis, passed balls as the Y Axis, and year for the Details.
It looks like there is a strong correlation between wild pitches and passed balls. This hints at how difficult it can be for the official scorekeeper to distinguish a poorly thrown pitch from insufficient effort by the catcher, so we see both move in roughly the same direction.
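If you want to put a number on what the scatter chart suggests, one option is to compute a Pearson correlation coefficient outside of Power BI. Here is a minimal Python sketch of that idea. It is not part of the demo: the `pearson` helper is a hypothetical name, and the yearly totals are made-up placeholders, not values from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Made-up yearly totals for illustration only (NOT real baseball data).
wild_pitches = [120, 150, 90, 200, 170]
passed_balls = [60, 80, 50, 95, 90]

r = pearson(wild_pitches, passed_balls)
print(f"Pearson r between wild pitches and passed balls: {r:.2f}")
```

A value close to 1 would back up what the chart shows, namely that the two counts rise and fall together from year to year.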
Our final visual is a bubble chart. Here, I want to see how well catchers throw out runners attempting to steal bases. A stolen base occurs when a runner successfully advances to another base without the batter putting the ball into play and without the help of a wild pitch or passed ball. If a defender tags out a runner attempting to steal a base, that’s called “caught stealing.” Historically, good base stealers have at least a 2:1 ratio of stolen bases to times caught stealing.
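The 2:1 rule of thumb is easy to check with a little arithmetic. Here is a short Python sketch, again outside of the demo itself. The function names are hypothetical, and the example numbers are invented for illustration.

```python
def steal_success_rate(stolen_bases, caught_stealing):
    """Share of steal attempts that succeeded, as a value between 0 and 1."""
    attempts = stolen_bases + caught_stealing
    if attempts == 0:
        raise ValueError("no steal attempts recorded")
    return stolen_bases / attempts


def meets_two_to_one(stolen_bases, caught_stealing):
    """True when stolen bases are at least twice the times caught stealing."""
    return stolen_bases >= 2 * caught_stealing


# Invented example: 40 stolen bases against 20 times caught stealing.
print(meets_two_to_one(40, 20))
print(f"{steal_success_rate(40, 20):.1%}")
```

A 2:1 ratio is the same thing as succeeding on two out of every three attempts, so the ratio and the success rate are two ways of stating the same threshold.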
On this plot, stolen bases will be our Y Axis, times caught stealing our X, and the year will fill Details. We also want a measure of how frequently the opportunity to steal a base might come up, as teams haven’t always played the same number of games every season. To do this, we can use outs recorded (called InnOuts in the dataset) as a proxy, and set that as the Size. In baseball, the defense must record 3 outs before an inning ends. This number tells us how long players are on defense, as games don’t always have the same number of innings and players may be substituted out of the game before its conclusion.
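Since three outs make an inning, outs recorded convert directly into innings played. Here is a small Python sketch of that conversion, plus one possible way to normalize steal attempts by time on defense. The normalization is an assumption of this sketch, not something the demo does (the demo uses InnOuts only as the bubble Size), and the function names and sample numbers are hypothetical.

```python
OUTS_PER_INNING = 3
OUTS_PER_FULL_GAME = 27  # nine innings of three outs each


def outs_to_innings(inn_outs):
    """Convert outs recorded on defense (InnOuts) into innings played."""
    return inn_outs / OUTS_PER_INNING


def steal_attempts_per_27_outs(stolen_bases, caught_stealing, inn_outs):
    """Steal attempts against a defense, scaled to one nine-inning game."""
    if inn_outs <= 0:
        raise ValueError("inn_outs must be positive")
    attempts = stolen_bases + caught_stealing
    return attempts * OUTS_PER_FULL_GAME / inn_outs


# Invented example: 2,700 outs recorded, 30 steals allowed, 15 runners caught.
print(outs_to_innings(2700))
print(round(steal_attempts_per_27_outs(30, 15, 2700), 2))
```

Scaling by outs rather than by games played keeps the comparison fair for catchers who were substituted out early or who played in extra-inning games.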
This is an instructive visual precisely because it’s so messy. We can see some correlation, though not as good as a 2:1 ratio, but individual points are hard to make out because there are too many large bubbles on the chart.
Now it’s your turn to tell a story.
2. Let's practice!