Unskew the variables
You will now transform the wholesale columns using the Box-Cox transformation, and then explore the pairwise relationships plot to confirm that the distributions have become less skewed and closer to normal. This is an important step because K-means works best on roughly symmetric variables, which helps it discover homogeneous groups (a.k.a. clusters or segments) of observations. Note that Box-Cox requires strictly positive input values.
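The stats module has been loaded from the scipy library, and the wholesale dataset has been imported as a pandas DataFrame. The exercise code also assumes that seaborn is available as sns and matplotlib.pyplot as plt.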
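To illustrate why the transformation helps, here is a minimal sketch that applies Box-Cox to a single skewed column and compares the skewness before and after. The data is synthetic (a log-normal sample standing in for one spending column), and the names `spend`, `spend_boxcox`, and `fitted_lambda` are assumptions made for this example, not part of the course dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic, strictly positive, right-skewed data (a stand-in, not the course dataset)
rng = np.random.default_rng(0)
spend = pd.Series(rng.lognormal(mean=8.0, sigma=1.0, size=500), name="spend")

# stats.boxcox needs strictly positive input; with no lambda given it
# returns the transformed values together with the fitted lambda
spend_boxcox, fitted_lambda = stats.boxcox(spend)

# Skewness near 0 indicates a roughly symmetric distribution
skew_before = stats.skew(spend)
skew_after = stats.skew(spend_boxcox)
print(f"skew before: {skew_before:.2f}, after: {skew_after:.2f}, lambda: {fitted_lambda:.2f}")
```

The second return value is the fitted lambda parameter, which is why the exercise template unpacks it into `_` and discards it.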
This exercise is part of the course
Machine Learning for Marketing in Python
Exercise instructions
- Define a custom Box-Cox transformation function that can be applied to a pandas DataFrame.
- Apply the function to the wholesale dataset.
- Plot the pairwise relationships between the transformed variables.
- Display the chart.
Interactive exercise
Complete the sample code to successfully finish this exercise.
# Define custom Box Cox transformation function
def boxcox_df(x):
    x_boxcox, _ = stats.___(x)
    return x_boxcox
# Apply the function to the `wholesale` dataset
wholesale_boxcox = ___.___(boxcox_df, axis=0)
# Plot the pairwise relationships between the transformed variables
sns.___(___, diag_kind='kde')
# Display the chart
plt.___()