
Logistics eCommerce model: k-means analysis

Now that you have gained your first insights into the model outputs, you can deepen your understanding of the patterns and relationships in the results using cluster analysis.

You will use the k-means algorithm to understand the main drivers of your model's behavior and to classify data points into groups with similar properties. This will help you identify bottlenecks in the real-world e-commerce/logistics operation that your model represents.

kmeans and whiten have been imported from scipy.cluster.vq, and matplotlib.pyplot has been imported as plt. The original and whitened datasets contain the columns listed below. The index variable p gives the column index of each process in the datasets.

  • column 1 (p=0): time_requests
  • column 2 (p=1): time_packaging
  • column 3 (p=2): time_shipping
  • column 4 (p=3): total time (sum of the three process times)
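
To make the idea of a whitened dataset concrete, here is a minimal sketch of what whiten does to an array with this column layout. The array sample is a hypothetical stand-in generated from random numbers, not the course's record_processes_np, and the value ranges are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical stand-in data: 200 orders, three process durations in days
# (requests, packaging, shipping). The ranges are made up for illustration.
rng = np.random.default_rng(seed=0)
durations = rng.uniform(low=[0.5, 1.0, 2.0], high=[1.5, 3.0, 10.0], size=(200, 3))

# Fourth column: total time, the sum of the three process durations
sample = np.column_stack([durations, durations.sum(axis=1)])

# whiten rescales each column by its standard deviation
whitened_sample = whiten(sample)

# Equivalent manual computation, shown for comparison
manual = sample / sample.std(axis=0)
print(np.allclose(whitened_sample, manual))

# After whitening, every column has unit standard deviation
print(whitened_sample.std(axis=0))
```

Because every column ends up with unit variance, a long-running process such as shipping no longer dominates the distance calculations that k-means relies on.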

This exercise is part of the course

Discrete Event Simulation in Python


Exercise instructions

  • Whiten the record_processes_np array to prepare it for k-means clustering.
  • Run the k-means method from the SciPy package on the whitened array, setting it to find three clusters.
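
As a rough illustration of the two steps above, the sketch below runs the same SciPy calls on a small synthetic two-column dataset rather than on the exercise data. The names points, scaled, demo_codebook, and demo_distortion are hypothetical, and the three blob centers are assumptions made up for this example. The call to vq is an optional extra that assigns each point to its nearest centroid.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Synthetic data: three well-separated blobs of 50 points each (illustrative only)
rng = np.random.default_rng(seed=1)
centers = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 1.0]])
points = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

# Step 1: rescale each column to unit variance
scaled = whiten(points)

# Step 2: ask k-means for three clusters.
# kmeans returns the centroid coordinates (the codebook) and a distortion
# value, the mean distance between the observations and their centroids.
demo_codebook, demo_distortion = kmeans(scaled, 3)

# Optional: assign every observation to its nearest centroid
labels, distances = vq(scaled, demo_codebook)

print(demo_codebook)
print(demo_distortion)
print(np.bincount(labels))
```

The centroids depend on a random initialization, so their order and exact values can change between runs. SciPy's kmeans may also return fewer centroids than requested if a cluster ends up empty.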

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Whiten the record_processes_np array
whitened = ____(record_processes_np)

# Run the k-means method on whitened, using three clusters
codebook, distortion = ____(whitened, ____)

fig, axs = plt.subplots(3)
for p in range(3):
    axs[p].scatter(whitened[:, 3], whitened[:, p], marker=".", label=process_names[p])
    axs[p].scatter(codebook[:, 3], codebook[:, p], label='Cluster Centroids')
    axs[p].legend(loc='center left', bbox_to_anchor=(1, 0.5))
    axs[p].set_ylabel('Process duration (days)')
    axs[p].set_xlabel('Total duration (days)')
plt.show()