Logistics eCommerce model: k-means analysis
Now that you have gained your first insight into the model outputs, you can deepen your understanding of the patterns and relationships between results using cluster analysis.
You will use the k-means algorithm to understand the main drivers of your model's behavior and to classify data points into groups with similar properties. This will help identify bottlenecks in the real-world e-commerce/logistics operation that your model represents.
kmeans and whiten have been imported from scipy.cluster.vq, and matplotlib.pyplot has been imported as plt. The original and whitened datasets have the columns listed below. The dummy variable p defines the indexes of these processes in the datasets.
- column 1 (p=0): time_requests
- column 2 (p=1): time_packaging
- column 3 (p=2): time_shipping
- column 4 (p=3): sum/total time
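To make the whitening step concrete, here is a minimal sketch on synthetic data. The array demo is a hypothetical stand-in for record_processes_np (its values and ranges are assumptions, not course data); it only mirrors the four-column layout listed above. The sketch shows that whiten rescales each column by its standard deviation, so every column ends up with unit variance and no single process dominates the distance calculation in k-means.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical stand-in for record_processes_np (assumed durations in days)
rng = np.random.default_rng(0)
parts = rng.uniform(low=[0.5, 1.0, 2.0], high=[2.0, 3.0, 10.0], size=(200, 3))

# Columns p=0..2 are the individual processes, column p=3 is their sum
demo = np.column_stack([parts, parts.sum(axis=1)])

# whiten divides each column by its standard deviation
whitened_demo = whiten(demo)

# The same rescaling done by hand, for comparison
manual = demo / demo.std(axis=0)

print(np.allclose(whitened_demo, manual))
print(whitened_demo.std(axis=0))
```

Note that whitened values are dimensionless: they are expressed in units of each column's standard deviation, not in days.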
This exercise is part of the course
Discrete Event Simulation in Python
Exercise instructions
- Whiten the record_processes_np array to prepare it for k-means clustering.
- Run the k-means method on the whitened array using the SciPy package, setting it to find three clusters.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Whiten the record_processes_np array
whitened = ____(record_processes_np)
# Run the k-means method on whitened, using three clusters
codebook, distortion = ____(whitened, ____)
# Plot each process (p=0..2) against the total duration (column p=3)
fig, axs = plt.subplots(3)
for p in range(3):
    axs[p].scatter(whitened[:, 3], whitened[:, p], marker=".", label=process_names[p])
    axs[p].scatter(codebook[:, 3], codebook[:, p], label='Cluster Centroids')
    axs[p].legend(loc='center left', bbox_to_anchor=(1, 0.5))
    axs[p].set_ylabel('Process duration (days)')
    axs[p].set_xlabel('Total duration (days)')
plt.show()
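The plot shows the centroids, but the introduction also mentions classifying data points into groups. One possible way to do that, sketched below on synthetic data, is to pass the whitened observations and the codebook to vq from scipy.cluster.vq, which returns the index of the nearest centroid for each row. The array demo, the three shipping regimes, and all names ending in _demo are hypothetical and are not part of the exercise environment.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

rng = np.random.default_rng(1)

# Hypothetical data: three groups of orders that differ in shipping time
groups = []
for ship_mean in (2.0, 6.0, 12.0):
    requests = rng.normal(1.0, 0.1, size=100)
    packaging = rng.normal(2.0, 0.2, size=100)
    shipping = rng.normal(ship_mean, 0.3, size=100)
    groups.append(np.column_stack([requests, packaging, shipping]))
parts = np.vstack(groups)

# Same layout as the exercise data: three processes plus their total
demo = np.column_stack([parts, parts.sum(axis=1)])

# Whiten, then ask k-means for three clusters
whitened_demo = whiten(demo)
codebook_demo, distortion_demo = kmeans(whitened_demo, 3)

# Assign every observation to its nearest centroid
labels, dist_to_centroid = vq(whitened_demo, codebook_demo)

# Number of observations per cluster
counts = np.bincount(labels, minlength=len(codebook_demo))
print(counts)
```

Because kmeans starts from randomly chosen centroids, cluster indexes and counts can differ between runs, and the returned codebook may contain fewer than the requested number of centroids if a cluster ends up empty.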