
Unskew the variables

You will now transform the wholesale columns using the Box-Cox transformation, and then explore the pairwise relationships plot to check that the skewness of the distributions has been reduced and that they look closer to normal. This is an important step: K-means is sensitive to heavily skewed variables, and reducing the skew helps it discover homogeneous groups (a.k.a. clusters or segments) of observations.

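The stats module has been loaded from the scipy library, and the wholesale dataset has been imported as a pandas DataFrame. The sample code also uses seaborn as sns and matplotlib.pyplot as plt, so both are assumed to be loaded as well.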
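As a rough illustration of what the transformation does to a single skewed variable, here is a minimal sketch. It assumes synthetic log-normal data in place of a real wholesale column (the `spend` array is hypothetical), and it relies on the fact that `scipy.stats.boxcox` expects strictly positive input and, when no lambda is given, returns both the transformed values and the fitted lambda.

```python
import numpy as np
from scipy import stats

# Hypothetical, strictly positive, right-skewed data standing in for one column
rng = np.random.default_rng(0)
spend = rng.lognormal(mean=8.0, sigma=1.0, size=500)

# With no lambda supplied, boxcox returns the transformed values and the fitted lambda
transformed, fitted_lambda = stats.boxcox(spend)

# Compare sample skewness before and after the transformation
skew_before = stats.skew(spend)
skew_after = stats.skew(transformed)
print(f"lambda={fitted_lambda:.3f}, skew before={skew_before:.2f}, after={skew_after:.2f}")
```

A skewness closer to zero after the transformation suggests a more symmetric distribution; the exact numbers depend on the data.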

This exercise is part of the course

Machine Learning for Marketing in Python


Exercise instructions

  • Define a custom Box-Cox transformation function that can be applied to a pandas DataFrame.
  • Apply the function to the wholesale dataset.
  • Plot the pairwise relationships between the transformed variables.
  • Display the chart.
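The first two steps above can be sketched on made-up data as follows. This is an illustration under stated assumptions rather than the exercise solution: the `demo` DataFrame, its column names, and the helper `boxcox_column` are all hypothetical, and the plotting steps are left out so the snippet only depends on numpy, pandas, and scipy.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the wholesale data: strictly positive, right-skewed columns
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "Fresh": rng.lognormal(9.0, 1.0, 300),
    "Milk": rng.lognormal(8.0, 0.9, 300),
    "Grocery": rng.lognormal(8.5, 1.1, 300),
})

def boxcox_column(col):
    # Fit a separate lambda per column and keep only the transformed values
    values, _ = stats.boxcox(col.to_numpy())
    # Return a Series so the original row index is preserved
    return pd.Series(values, index=col.index)

# axis=0 passes one column at a time to the helper
demo_boxcox = demo.apply(boxcox_column, axis=0)

# Side-by-side skewness per column, before and after the transformation
skew_table = pd.DataFrame({"before": demo.skew(), "after": demo_boxcox.skew()})
print(skew_table.round(2))
```

Because each column gets its own fitted lambda, the transformed columns are no longer in the original units; they are only meant to be more symmetric inputs for the pairwise plot and for clustering.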

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Define a custom Box-Cox transformation function
def boxcox_df(x):
    x_boxcox, _ = stats.___(x)
    return x_boxcox

# Apply the function to the `wholesale` dataset
wholesale_boxcox = ___.___(boxcox_df, axis=0)

# Plot the pairwise relationships between the transformed variables 
sns.___(___, diag_kind='kde')

# Display the chart
plt.___()