Unskew the variables
You will now transform the wholesale columns using the Box-Cox transformation, and then explore the pairwise relationships plot to check that the skewness of the distributions has been reduced and that they look closer to normal. This is an important step: K-means relies on distances between observations, so it tends to discover more homogeneous groups (a.k.a. clusters or segments) when the variables are not heavily skewed.
The stats module is loaded from the scipy library, and the wholesale dataset has been imported as a pandas DataFrame.
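To make the effect of the transformation concrete, here is a minimal sketch on synthetic data. The log-normal sample is an assumption standing in for one right-skewed, strictly positive column; it is not the course's wholesale data. Note that stats.boxcox expects strictly positive, one-dimensional input and, when no lambda is supplied, returns both the transformed values and the fitted lambda.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed, strictly positive sample (not the real dataset)
rng = np.random.default_rng(0)
x = rng.lognormal(mean=8.0, sigma=1.0, size=1000)

# With no lambda given, boxcox fits one and returns (transformed, lambda)
x_boxcox, fitted_lambda = stats.boxcox(x)

# Skewness should move much closer to zero after the transformation
print("skew before:", stats.skew(x))
print("skew after: ", stats.skew(x_boxcox))
print("fitted lambda:", fitted_lambda)
```

Because the transformation is fitted per variable, each column ends up with its own lambda, which is why the exercise applies the function column by column.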
This exercise is part of the course Machine Learning for Marketing in Python
Exercise instructions
- Define a custom Box-Cox transformation function that can be applied to a pandas DataFrame.
- Apply the function to the wholesale dataset.
- Plot the pairwise relationships between the transformed variables.
- Display the chart.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Define custom Box Cox transformation function
def boxcox_df(x):
    x_boxcox, _ = stats.___(x)
    return x_boxcox
# Apply the function to the `wholesale` dataset
wholesale_boxcox = ___.___(boxcox_df, axis=0)
# Plot the pairwise relationships between the transformed variables
sns.___(___, diag_kind='kde')
# Display the chart
plt.___()