Logistics eCommerce model: k-means analysis
Now that you have gained your first insight into the model outputs, you can deepen your understanding of the patterns and relationships in the results using cluster analysis.
You will use the k-means algorithm to help you understand the main controls on your model's behavior and to classify data points into groups with similar properties. This will help identify bottlenecks in the real-world e-commerce/logistics operation that your model represents.
kmeans and whiten have been imported from scipy.cluster.vq, and matplotlib.pyplot has been imported as plt. The original and whitened datasets have the columns listed below. The dummy variable p defines the index of each process in the datasets.
- column 1 (p=0): time_requests
- column 2 (p=1): time_packaging
- column 3 (p=2): time_shipping
- column 4 (p=3): sum/total time
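To make the whitening step concrete, here is a minimal sketch on a made-up array. The names record_demo and whitened_demo and the random data are assumptions for illustration only; they are not the exercise's record_processes_np. The sketch checks that whiten rescales each column by its standard deviation, so every column ends up with unit variance and no single process dominates the distance calculation in k-means.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical stand-in data: three process durations plus their total,
# laid out in the same column order as the list above.
rng = np.random.default_rng(seed=0)
times = rng.uniform(low=0.5, high=5.0, size=(50, 3))
record_demo = np.column_stack([times, times.sum(axis=1)])

# whiten divides each column by its standard deviation.
whitened_demo = whiten(record_demo)

# The same rescaling done by hand, for comparison.
manual = record_demo / record_demo.std(axis=0)

print(np.allclose(whitened_demo, manual))
print(whitened_demo.std(axis=0))
```

Note that whitening only rescales the columns; it does not center them, so the relative ordering of observations within each column is unchanged.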
This exercise is part of the course
Discrete Event Simulation in Python
Exercise instructions
- Whiten the record_processes_np array to prepare it for k-means clustering.
- Run the k-means method on the whitened array using the SciPy package, setting it to find three clusters.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Whiten the record_processes_np array
whitened = ____(record_processes_np)
# Run the k-means method on whitened, using three clusters
codebook, distortion = ____(whitened, ____)
fig, axs = plt.subplots(3)
for p in range(3):
    axs[p].scatter(whitened[:, 3], whitened[:, p], marker=".", label=f"{process_names[p]}")
    axs[p].scatter(codebook[:, 3], codebook[:, p], label='Cluster Centroids')
    axs[p].legend(loc='center left', bbox_to_anchor=(1, 0.5))
    axs[p].set_ylabel(f'Process duration (days)')
    axs[p].set_xlabel('Total duration (days)')
plt.show()
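The plot shows the centroids, but classifying data points into groups also requires knowing which centroid each observation belongs to. The sketch below shows one possible way to do that with vq from scipy.cluster.vq, run on synthetic data. The array demo, the three made-up group centers, and all names ending in _demo are assumptions for illustration and are not part of the exercise.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical data: three loose groups of (requests, packaging, shipping)
# durations, with the total appended as a fourth column.
rng = np.random.default_rng(seed=1)
centers = np.array([[1.0, 1.0, 2.0], [3.0, 1.5, 6.0], [2.0, 4.0, 3.0]])
times = np.vstack([c + rng.normal(scale=0.2, size=(40, 3)) for c in centers])
demo = np.column_stack([times, times.sum(axis=1)])

# Rescale the columns, then ask k-means for three centroids.
whitened_demo = whiten(demo)
codebook_demo, distortion_demo = kmeans(whitened_demo, 3)

# vq assigns each observation to its nearest centroid and also returns
# the distance from each observation to that centroid.
labels, dists = vq(whitened_demo, codebook_demo)

# Number of observations assigned to each centroid.
counts = np.bincount(labels, minlength=len(codebook_demo))
print(counts)
print(distortion_demo)
```

Because k-means starts from random initial centroids, the order of the centroids in the codebook (and therefore the label numbers) can differ between runs, and the codebook can contain fewer than three rows if a cluster ends up empty.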