1. EDA in Tableau: box plots
Let's see how you can create box plots in Tableau.
By default, Tableau aggregates measures, using the sum in this case. When you work with individual data points, for example when drawing box plots, you need to disaggregate your data. There are two ways to do this. The first way is to drag a unique identifier to Detail on the Marks card, which disaggregates the measures based on this unique ID field.
The second way is to go to the Analysis menu and untick 'Aggregate Measures'.
Disaggregating shows every individual data point in your dataset and lets you select the box-and-whisker plot view to draw box plots. In this case, each data point represents the profit of one order.
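To make the difference between the aggregated and the disaggregated view concrete outside Tableau, here is a minimal Python sketch. The order rows, IDs, and profit values are made up for illustration and are not taken from the dataset in the video.

```python
from collections import defaultdict

# Hypothetical order-level rows: (manufacturer, order_id, profit).
orders = [
    ("3M", "A-001", 12.5),
    ("3M", "A-002", -3.0),
    ("Canon", "B-001", 140.0),
    ("Canon", "B-002", -25.5),
    ("Canon", "B-003", 60.0),
]

# Aggregated view: a single SUM(profit) per manufacturer.
total_profit = defaultdict(float)
for manufacturer, _order_id, profit in orders:
    total_profit[manufacturer] += profit

# Disaggregated view: every order keeps its own profit value,
# which is what a box plot needs as input.
profits_by_manufacturer = defaultdict(list)
for manufacturer, _order_id, profit in orders:
    profits_by_manufacturer[manufacturer].append(profit)

print(dict(total_profit))
print(dict(profits_by_manufacturer))
```

With only the sums, there is one number per manufacturer and no distribution left to plot; the per-order lists are what the box plot summarizes.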
Combining all manufacturers, we see lots of extreme values in both negative and positive profit figures. If we filter for a single manufacturer, the distribution of profit for that particular manufacturer becomes apparent. One small note: Tableau creates the box plot according to Tukey's original description. Instead of using the lower and upper quartiles as the borders of the box, it uses the lower and upper hinges. Hinges are similar to quartiles, but depending on the number of observations, they are calculated slightly differently. However, the difference is negligible.
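As a rough illustration of how hinges can differ from quartiles, the sketch below computes Tukey's hinges as the medians of the lower and upper halves of the sorted data, where both halves include the overall median when the number of observations is odd. It compares them with one common quartile convention, the 'exclusive' method of Python's `statistics.quantiles`. The helper name `tukey_hinges` and the sample values are assumptions made for this example, and the sketch does not claim to reproduce Tableau's internal calculation.

```python
import statistics

def tukey_hinges(values):
    """Return (lower hinge, upper hinge) of the given values."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # each half includes the median when n is odd
    lower = statistics.median(data[:half])
    upper = statistics.median(data[n - half:])
    return lower, upper

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up values

hinges = tukey_hinges(sample)
quartiles = statistics.quantiles(sample, n=4, method="exclusive")

print("hinges:", hinges)
print("quartiles (exclusive method):", quartiles)
```

For these nine values the hinges are 3 and 7, while the exclusive quartiles are 2.5 and 7.5. On larger datasets the two usually land very close together, which is why the difference rarely matters in practice.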
The median is 9.1, meaning that half of the profits were below this number, and the other half above. The lower hinge is closer to the median than the upper hinge, indicating a slight right-skewed asymmetry. Also note that the lower part of the box, between the lower hinge and the median, has a darker shade than the upper part. Tableau does this by default to distinguish the two halves of the box. The lower and upper whiskers are -7.8 and 71.2, respectively, meaning that all points below the lower whisker or above the upper whisker are considered outliers. In this example, we only have two of them, and since they are far outside the whisker range, the distribution can be considered somewhat leptokurtic.
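The way whiskers and outliers relate can be sketched in a few lines of Python. This example assumes Tukey's usual rule, where each whisker ends at the most extreme data point within 1.5 times the hinge spread of the box; that multiplier, the helper name `box_plot_summary`, and the profit values are assumptions for illustration, not the figures from the video.

```python
import statistics

def box_plot_summary(values, k=1.5):
    """Median, hinges, whiskers and outliers, assuming Tukey's k * spread rule."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # each half includes the median when n is odd
    lower_hinge = statistics.median(data[:half])
    upper_hinge = statistics.median(data[n - half:])
    spread = upper_hinge - lower_hinge

    # Fences mark how far a whisker is allowed to reach.
    low_fence = lower_hinge - k * spread
    high_fence = upper_hinge + k * spread
    inside = [v for v in data if low_fence <= v <= high_fence]

    return {
        "median": statistics.median(data),
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "lower_whisker": inside[0],   # smallest point within the fences
        "upper_whisker": inside[-1],  # largest point within the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Made-up profits for a single manufacturer.
profits = [-40.0, -5.0, 0.0, 4.0, 8.0, 10.0, 15.0, 22.0, 30.0, 120.0]
summary = box_plot_summary(profits)
print(summary)
```

In this made-up sample the whiskers end at -5.0 and 30.0, and the two values beyond them, -40.0 and 120.0, are flagged as outliers.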
The real power of box plots becomes apparent when you want to compare multiple categories, or manufacturers in this case. For example, 3M has a much more limited spread of profits than Canon, and its lowest profits are not as negative as Hon's. Socket has just one order, so the lower and upper whiskers, the lower and upper hinges, and the median all coincide, resulting in a single line.
One last useful trick is that you can sort the box plots. For example, you can sort by the median of each manufacturer, to quickly see where the lowest and highest median profit figures are.
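The same sorting idea can be sketched in Python: compute the median profit per manufacturer and order the manufacturers by it. The manufacturer names come from the example above, but the profit values are invented for illustration.

```python
import statistics

# Made-up per-order profits for each manufacturer.
profits = {
    "Canon": [-25.5, 60.0, 140.0],
    "3M": [-3.0, 5.0, 12.5],
    "Hon": [-80.0, 2.0, 30.0],
    "Socket": [18.0],  # a single order: its median is that one value
}

# Median profit per manufacturer.
medians = {name: statistics.median(values) for name, values in profits.items()}

# Manufacturers ordered from lowest to highest median profit.
ordered = sorted(medians, key=medians.get)

for name in ordered:
    print(name, medians[name])
```

Reading the sorted list from either end gives the manufacturers with the lowest and highest median profit, which is what the sorted box plots show at a glance.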
Time for you to apply this in the last set of exercises of this chapter.
2. Let's practice!