Unskew the variables
You will now transform the wholesale columns using the Box-Cox transformation, and then plot the pairwise relationships to check that the skewness of the distributions has been reduced, making them closer to normal. This step matters because K-means groups observations by Euclidean distance, so a few extreme values in a heavily skewed variable can dominate the result. Reducing skew helps the algorithm discover more homogeneous groups (a.k.a. clusters or segments) of observations.
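For reference, the standard Box-Cox transformation maps a strictly positive value x to the following, given a parameter λ. The case λ = 0 is the log transform, and λ = 1 leaves the shape of the distribution unchanged (it only shifts the values by 1):

```latex
x^{(\lambda)} =
\begin{cases}
  \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
  \ln x, & \lambda = 0
\end{cases}
\qquad x > 0
```

Because the formula is only defined for positive values, columns containing zeros or negative numbers need a shift before Box-Cox can be applied.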
The stats module is loaded from the scipy library, and the wholesale dataset has been imported as a pandas DataFrame.
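Because stats is already available, the effect of the transform on a single column can also be checked numerically, not just visually. This is a minimal sketch that assumes synthetic log-normal data in place of a real wholesale column. The variable name spend is hypothetical. When lmbda is not passed, stats.boxcox fits λ by maximum likelihood and returns the transformed values together with that λ. stats.skew reports the sample skewness before and after.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for one wholesale spending column: strictly
# positive, right-skewed values drawn from a log-normal distribution.
rng = np.random.default_rng(42)
spend = rng.lognormal(mean=8.0, sigma=1.0, size=440)

# With lmbda left unset, stats.boxcox fits lambda by maximum likelihood
# and returns (transformed values, fitted lambda).
spend_boxcox, fitted_lambda = stats.boxcox(spend)

# Sample skewness before and after the transform (0 means symmetric)
skew_before = stats.skew(spend)
skew_after = stats.skew(spend_boxcox)
print(f"lambda={fitted_lambda:.3f}, skew before={skew_before:.2f}, after={skew_after:.2f}")
```

For log-normal input the fitted λ should land close to 0, which is essentially a log transform, and the skewness should drop to near zero.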
This exercise is part of the course Machine Learning for Marketing in Python.
Exercise instructions
- Define a custom Box-Cox transformation function that can be applied to each column of a pandas DataFrame.
- Apply the function to the wholesale dataset.
- Plot the pairwise relationships between the transformed variables.
- Display the chart.
Have a go at this exercise by completing this sample code.
# Define custom Box Cox transformation function
def boxcox_df(x):
    x_boxcox, _ = stats.___(x)
    return x_boxcox
# Apply the function to the `wholesale` dataset
wholesale_boxcox = ___.___(boxcox_df, axis=0)
# Plot the pairwise relationships between the transformed variables
sns.___(___, diag_kind='kde')
# Display the chart
plt.___()