
Plotting in More Dimensions

1. Plotting in More Dimensions

In the plots we have explored thus far, we have focused on visualizing data in two dimensions. However, there are instances where visualizing three dimensions in a single plot can provide valuable insights. In this video, we explore the benefits of plotting in more dimensions.

2. Why more dimensions?

Plotting data in higher dimensions offers advantages such as recognizing hidden patterns, analyzing multiple relationships at once, identifying clusters, creating clearer visualizations, and improving feature selection for machine learning models.

3. Will clusters persist?

Let's consider an example. We have a scatter plot showing the relationship between insurance premiums and age, revealing three distinct clusters in the data. Now, we want to examine if this pattern persists when we filter the data based on the number of children policyholders have. Let's first employ a simple approach.

4. Plotting a slice

Let us look at policyholders with no children. We can use the filter function to select rows where the number of children is zero, assigning the result to the no_children DataFrame. Next, we create a scatter plot of insurance charges versus age for this data slice. Notice that the same clustered structure is visible in the chart. However, testing this for every number of children in the dataset would require six separate plots, since the number of children ranges from 0 to 5. Is there a more efficient approach?
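The code from the slide is not part of this transcript. As a rough, library-free sketch of the slicing idea in plain Python (not the course's plotting API), the snippet below filters one slice and then groups the rows into one slice per number of children. The `policies` records and their values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records standing in for the insurance data:
# (age, number of children, insurance charges).
policies = [
    (19, 0, 1700.0),
    (45, 0, 21000.0),
    (33, 1, 4500.0),
    (52, 2, 11000.0),
    (28, 0, 3200.0),
    (60, 3, 30000.0),
]

# One slice: keep only policyholders with no children.
no_children = [row for row in policies if row[1] == 0]

# Every slice: group (age, charges) pairs by number of children.
slices = defaultdict(list)
for age, children, charges in policies:
    slices[children].append((age, charges))

# Each slice would need its own two-dimensional scatter plot.
for children in sorted(slices):
    print(children, slices[children])
```

With the real dataset, the loop would produce six slices, which is exactly the one-plot-per-slice cost that motivates adding a third axis.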

5. Using another dimension

We can introduce an additional dimension by creating a three-dimensional scatter plot, which lets us examine multiple slices of the dataset at once. We again use the scatter function with the DataFrame recipe, but this time we pass three columns as the first three arguments; they are plotted on the x-, y-, and z-axes. The rest of the code is the same as for a two-dimensional scatter plot. The resulting plot shows the slices side by side, one for each number of children. Each slice exhibits a similar clustering structure, although there is not enough data for 4 and 5 children to observe this pattern clearly.

6. Axis order

The order of the columns matters. Here, we swap the age and number-of-children columns, and the cluster pattern becomes harder to see.

7. Grouping by another category

Let's look at another example. Here, we have a scatter plot displaying the relationship between insurance premiums and body mass index, grouped by smoker status. Can we visualize the data points grouped by an additional categorical variable, such as sex?

8. Add a categorical dimension

We first call the scatter plotting function and provide the BMI as the first argument. Next, we include the sex categorical variable as the second argument. Then, we pass the insurance charges as the third argument, followed by customization keyword arguments and functions. We do not see any significant differences between male and female policyholders in the resulting plot.

9. Visualize point density

Let's return to the two-dimensional scatter plot. It gives a visual impression of point density across BMI and insurance premium values, but it does not provide a quantitative density estimate. Is there a way to improve on this?

10. Two-dimensional histograms

A two-dimensional histogram offers an alternative way to visualize point density: the density becomes a third dimension, encoded with a color map. To create a two-dimensional histogram, we use the histogram2d plotting function. The first two arguments are the numerical columns, here BMI and charges. To choose the color palette of the gradient, we set the fillcolor argument to the desired color scheme. Lastly, setting show_empty_bins to true colors the bins that contain no points as well, so no bin is left unfilled. The resulting plot is a color map in which the number of points within each square bin determines its color. This gives a quantitative measurement of point density.
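To make the binning step concrete, here is a minimal, library-free Python sketch of the counting that underlies a two-dimensional histogram. It is not the histogram2d function from the video; the helper names, the bin ranges, and the sample values are all assumptions chosen for illustration.

```python
def bin_index(value, low, high, n_bins):
    """Return the equal-width bin a value falls in, or None if outside [low, high]."""
    if value < low or value > high:
        return None
    width = (high - low) / n_bins
    # A value exactly on the upper edge is placed in the last bin.
    return min(int((value - low) / width), n_bins - 1)


def histogram2d_counts(xs, ys, x_range, y_range, n_bins):
    """Count points per bin; counts[j][i] is y-bin j, x-bin i."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        i = bin_index(x, x_range[0], x_range[1], n_bins)
        j = bin_index(y, y_range[0], y_range[1], n_bins)
        if i is not None and j is not None:
            counts[j][i] += 1
    return counts


# Hypothetical BMI and charges values, not taken from the course dataset.
bmi = [22.0, 23.5, 31.0, 31.5, 32.0, 38.0]
charges = [2000.0, 2500.0, 12000.0, 12500.0, 13000.0, 40000.0]

counts = histogram2d_counts(bmi, charges, (20.0, 40.0), (0.0, 40000.0), 4)
for row in counts:
    print(row)
```

A plotting function would map each count to a color. The zero entries are the empty bins, and whether they receive a color is the choice that an option such as show_empty_bins controls.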

11. Let's practice!

In this video, we learned how to add more dimensions to a visualization; time to expand your knowledge to new dimensions!
