Boxes and violins

1. Boxes and violins

Welcome to this video on box and violin plots! Box and violin plots are commonly used graphical representations that provide insights into the distribution, central tendency, and variability of a dataset, making them practical tools for data visualization and comparative analysis.

2. Histograms with many categories

At the start of this chapter, we employed a grouped histogram to compare two distributions. However, the grouped histogram becomes visually cluttered and less effective as the number of categories grows.

3. Box plots

Box plots are preferred when comparing multiple categories, as they offer a concise and clear summary of each category's distribution. Unlike grouped histograms, box plots avoid overlap and clutter, enabling easy visual comparisons. Although box plots provide a less detailed view of the distributions, they excel at highlighting essential aspects. For instance, the median, representing the central tendency, is depicted as a horizontal line across the box. The height of the box, which spans the first to the third quartile, indicates the spread, while skewness, a measure of the asymmetry of a distribution, can be inferred from the position of the median within the box and the relative lengths of the whiskers. These qualities make box plots effective for comparing central tendency, spread, and skewness across categories.
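As a rough sketch of what a box plot summarizes, the snippet below computes the quartiles for a single category with Julia's Statistics standard library. The prices are hypothetical, and the 1.5 × IQR rule for whiskers and outliers is the common Tukey convention, assumed here rather than stated in the video.

```julia
using Statistics

# Hypothetical prices for a single category; the last value is deliberately extreme.
prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0]

# The box spans the first to the third quartile; the line inside it is the median.
q1, med, q3 = quantile(prices, [0.25, 0.5, 0.75])
box_height = q3 - q1  # interquartile range (IQR)

# Assumed convention: points more than 1.5 × IQR away from the box are outliers.
lower_fence = q1 - 1.5 * box_height
upper_fence = q3 + 1.5 * box_height
is_inside(p) = lower_fence <= p <= upper_fence
outliers = filter(!is_inside, prices)

# The whiskers reach the most extreme values that are still inside the fences.
whisker_low, whisker_high = extrema(filter(is_inside, prices))

println("Q1 = $q1, median = $med, Q3 = $q3")
println("whiskers: $whisker_low to $whisker_high, outliers: $outliers")
```

For these made-up values, the box runs from 3.0 to 7.0 with the median at 5.0, and 30.0 is the only point flagged as an outlier.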

4. Boxes of product

To create a box plot with StatsPlots.jl, we begin by importing the package. Then, we use the boxplot function, providing the categories for the x-axis as the first argument and the corresponding values for the y-axis as the second argument. We make our plot more appealing by customizing the color, hiding the legend, and adding a label to the y-axis. See how the outliers in the price of onions we encountered earlier in the chapter are clearly shown.
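The call described here might look like the following minimal sketch. The product names, prices, color, and axis label are all hypothetical stand-ins, since the dataset used in the video is not shown in this transcript.

```julia
using StatsPlots

# Hypothetical data: three products with six prices each.
# The last onion price is deliberately extreme so that it shows up as an outlier.
products = repeat(["Onions", "Potatoes", "Tomatoes"], inner = 6)
prices = [1.1, 1.3, 1.2, 1.4, 1.2, 4.8,   # onions
          0.9, 1.0, 1.1, 1.0, 1.2, 1.1,   # potatoes
          1.8, 2.0, 2.1, 1.9, 2.2, 2.0]   # tomatoes

# Categories go first (x-axis), values second (y-axis).
p = boxplot(products, prices;
    color = :steelblue,   # assumed color choice
    legend = false,       # hide the legend
    ylabel = "Price")     # label the y-axis

# In the REPL or a notebook, evaluating `p` shows the figure.
```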

5. Hiding outliers

If our primary focus is comparing the central part of each category's distribution while disregarding the outliers, we can exclude them from the box plot by setting the outliers argument to false. Observe that when the outliers are excluded, the distributions of these different food products appear remarkably similar, which aligns with our previous findings in this chapter.
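A minimal sketch of hiding the outliers, again on hypothetical data (the product names and prices below are made up for illustration):

```julia
using StatsPlots

# Hypothetical data: the last onion price is an extreme value.
products = repeat(["Onions", "Potatoes", "Tomatoes"], inner = 6)
prices = [1.1, 1.3, 1.2, 1.4, 1.2, 4.8,   # onions
          0.9, 1.0, 1.1, 1.0, 1.2, 1.1,   # potatoes
          1.8, 2.0, 2.1, 1.9, 2.2, 2.0]   # tomatoes

# outliers = false stops the outlier markers from being drawn;
# the boxes and whiskers are still computed from all of the data.
p = boxplot(products, prices;
    outliers = false,
    legend = false,
    ylabel = "Price")
```

Without the outlier markers, the y-axis no longer has to stretch to the extreme values, which makes the boxes themselves easier to compare.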

6. Violin plots

An alternative approach is to utilize a violin plot for a more detailed visualization and comparison of the distribution of prices within each category. This type of plot provides a comprehensive representation of the distribution for each food item, resembling a density plot specific to each item. The violin plot shown here presents the same data previously displayed in the box plot.

7. Distributions with violin plot

To create a violin plot, we utilize the violin function, providing the categories as the first argument and the corresponding values as the second argument. The linewidth argument can be set to zero to remove the lines surrounding the violins and achieve a cleaner appearance. This approach generates a plot that enables a detailed comparison of each distribution.
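The violin call could be sketched as follows. As before, the data, the color, and the axis label are hypothetical and stand in for the dataset used in the video.

```julia
using StatsPlots

# Hypothetical data: three products with six prices each.
products = repeat(["Onions", "Potatoes", "Tomatoes"], inner = 6)
prices = [1.1, 1.3, 1.2, 1.4, 1.2, 4.8,   # onions
          0.9, 1.0, 1.1, 1.0, 1.2, 1.1,   # potatoes
          1.8, 2.0, 2.1, 1.9, 2.2, 2.0]   # tomatoes

# Same argument order as boxplot: categories first, values second.
p = violin(products, prices;
    linewidth = 0,        # remove the outline around each violin
    color = :steelblue,   # assumed color choice
    legend = false,
    ylabel = "Price")
```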

8. Boxes or violins?

The decision to use either box plots or violin plots depends on the objectives and characteristics of the data analysis. Box plots are suitable for comparing central tendency, spread, and skewness, emphasizing outliers, and concisely summarizing distributional characteristics. On the other hand, violin plots are ideal for examining the detailed distribution within each category, visualizing shape, and analyzing data with multimodal distributions, i.e., distributions with more than one pronounced peak, which a box plot cannot reveal. They can also be more intuitive to read, since the width of the violin shows directly where values concentrate. Therefore, the choice comes down to the desired level of detail and the specific insights to be conveyed.

9. Let's practice!

Box plots and violin plots are sophisticated visualization techniques that showcase your growing expertise in data visualization. Congratulations! Let's put these skills into practice in the upcoming exercises and further enhance your abilities in exploring and presenting data.