Week 9 | Session 3: K-means Clustering — Python Implementation (Google Colab)
Course: Supply Chain Digitization — Module 3: Analytics in SCM
Session Agenda
1. Context & Session Goal
- Previous Sessions: Concepts (what K-means is, how the algorithm works, WCSS, the Elbow Method).
- This Session: Implementation (write Python code in Google Colab and reproduce the cluster output).
- Dataset: `customer_location.csv`, 811 rows and 3 columns (serial no., latitude, longitude).
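If `customer_location.csv` is not at hand, the rest of the session can still be rehearsed on a stand-in table with the same shape. The sketch below is only an assumption-laden substitute: the helper name `make_stand_in_customers`, the column name `serial_no`, the random spread, and the choice of hub coordinates (borrowed from the centroid table in Section 5) are all hypothetical and do not reproduce the real customer data.

```python
import numpy as np
import pandas as pd


def make_stand_in_customers(n_rows=811, seed=0):
    """Build a synthetic table shaped like customer_location.csv (hypothetical data)."""
    rng = np.random.default_rng(seed)
    # Hub coordinates (latitude, longitude) taken from the Section 5 centroid table
    hubs = np.array([
        [27.68, 80.90],
        [27.42, 81.15],
        [27.31, 80.83],
        [27.56, 80.57],
    ])
    # Assign each synthetic customer to a random hub and scatter it nearby
    hub_index = rng.integers(0, len(hubs), size=n_rows)
    coords = hubs[hub_index] + rng.normal(scale=0.05, size=(n_rows, 2))
    return pd.DataFrame({
        "serial_no": np.arange(1, n_rows + 1),  # assumed name for the serial column
        "latitude": coords[:, 0],
        "longitude": coords[:, 1],
    })


df = make_stand_in_customers()
print(df.shape)  # (811, 3)
```

Results obtained from this stand-in will not match the figures reported later in these notes; it only lets the code paths be exercised.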
2. Libraries Used
| Library / Alias | Purpose |
|---|---|
| pandas (pd) | Data manipulation and analysis library — read CSV, create DataFrames. |
| numpy (np) | Numerical Python — mathematical and logical operations on arrays. |
| matplotlib.pyplot (plt) | Plotting library for Python — line charts, scatter plots. |
| seaborn (sn) | High-level visualization library — attractive cluster plots with color coding. |
| sklearn.cluster.KMeans | K-means clustering implementation — fit clusters, get labels & centroids. |
3. Full Pipeline: 8 Steps
- Import Data: Load `customer_location.csv` using pandas.
- Plot Raw Data: Visualize all 811 points on a lat/long scatter plot (seaborn).
- Select Features: Drop the serial number column; keep only latitude and longitude.
- Find Optimal K: Loop K = 1 to 9, compute WCSS, plot the Elbow Diagram.
- Form Clusters: Run `KMeans(n_clusters=4).fit()` and assign cluster IDs.
- Plot Clusters: Color-coded scatter plot.
- Get Centroids: Extract `cluster_centers_` (proposed DC locations).
- Plot Centroids: Superimpose 'x' markers on the cluster plot.
4. Step-by-Step Code & Explanation
Step 1: Import Data

    import pandas as pd

    # Load the customer coordinates (serial no., latitude, longitude)
    df = pd.read_csv('customer_location.csv')
    df.head()

Step 2: Plot Raw Data

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sn

    # fit_reg=False suppresses the regression line, leaving a plain scatter plot
    sn.lmplot(x='latitude', y='longitude', data=df, fit_reg=False, height=4)
    plt.title('Customer Locations')
    plt.show()

Step 3: Select Features
Remove the serial number column; only latitude and longitude are used for clustering.

    new_df = df[['latitude', 'longitude']]

Step 4: Find Optimal K (Elbow Diagram)
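The loop in this step stores `inertia_` for each K and treats it as the WCSS from the earlier sessions. As a minimal sketch of what that number measures, the block below recomputes it by hand: each point's squared distance to the centroid of its own cluster, summed over all points. It uses synthetic points rather than the course dataset, and the variable names (`points`, `manual_wcss`) and the `n_init` and `random_state` settings are choices made for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic blobs of (latitude, longitude) points; not the course data
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(27.3, 80.8), scale=0.05, size=(50, 2)),
    rng.normal(loc=(27.7, 80.9), scale=0.05, size=(50, 2)),
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

# Look up the centroid each point was assigned to, then sum the squared distances
assigned_centers = model.cluster_centers_[model.labels_]
manual_wcss = ((points - assigned_centers) ** 2).sum()

print(manual_wcss, model.inertia_)  # the two values should agree up to rounding
```

Because WCSS can only fall or stay flat as K grows, the elbow diagram looks for the point where the drop levels off rather than for a minimum.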

    from sklearn.cluster import KMeans

    cluster_range = range(1, 10)  # K = 1 to 9
    cluster_errors = []           # WCSS value for each K

    for num_clusters in cluster_range:
        # n_init and random_state are set explicitly (an addition to the
        # original notebook) so that reruns give the same curve
        clusters = KMeans(n_clusters=num_clusters, n_init=10, random_state=42)
        clusters.fit(new_df)
        cluster_errors.append(clusters.inertia_)  # inertia_ = WCSS

    plt.figure(figsize=(6, 4))
    plt.plot(cluster_range, cluster_errors, marker='o')
    plt.title('Elbow Diagram')
    plt.xlabel('Number of Clusters')
    plt.ylabel('Sum of Squared Error')
    plt.show()

Step 5: Form Clusters (K = 4)
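K = 4 in this step comes from reading the elbow diagram by eye. As a rough cross-check, one common heuristic (not the method used in this course) picks the K at which the WCSS curve bends most sharply, measured by the largest second difference. The sketch below assumes that heuristic; the function name `elbow_by_second_difference` is hypothetical and the WCSS values are made up for illustration, not taken from the customer dataset.

```python
import numpy as np


def elbow_by_second_difference(k_values, wcss_values):
    """Return the K where the WCSS curve has its largest second difference."""
    wcss = np.asarray(wcss_values, dtype=float)
    # Entry j equals wcss[j] - 2 * wcss[j + 1] + wcss[j + 2], i.e. the bend at k_values[j + 1]
    curvature = np.diff(wcss, n=2)
    return list(k_values)[int(np.argmax(curvature)) + 1]


# Illustrative WCSS values for K = 1..9 (invented for this example)
illustrative_wcss = [100.0, 60.0, 35.0, 12.0, 10.0, 9.0, 8.0, 7.5, 7.0]
print(elbow_by_second_difference(range(1, 10), illustrative_wcss))  # → 4
```

With the real data, `cluster_range` and `cluster_errors` from Step 4 could be passed in directly. The heuristic can disagree with a visual reading when the curve is smooth, so it is best treated as a sanity check.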
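
    # n_init and random_state are set explicitly (an addition to the
    # original notebook) so that reruns give the same clusters
    clusters_new = KMeans(n_clusters=4, n_init=10, random_state=42)  # set K = 4
    clusters_new.fit(new_df[['latitude', 'longitude']])

    # Add the cluster ID as a new column; assign() can be re-run safely,
    # whereas insert() raises an error if the column already exists
    new_df = new_df.assign(cluster_id=clusters_new.labels_)

Step 6: Plot Clusters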

    # hue='cluster_id' colors each point by its assigned cluster
    sn.lmplot(x='latitude', y='longitude', data=new_df, hue='cluster_id',
              fit_reg=False, height=4)
    plt.show()

Step 7: Extract Centroid Coordinates

    # cluster_centers_ is an array of shape (4, 2): one [latitude, longitude] row per cluster
    centers = np.array(clusters_new.cluster_centers_)
    print(centers)
    # Output example (row for cluster 0):
    # [27.68, 80.90]

Step 8: Plot Centroids on Cluster Map

    sn.lmplot(x='latitude', y='longitude', data=new_df, hue='cluster_id',
              fit_reg=False, height=4)

    # Overlay the centroids: column 0 is latitude (x-axis), column 1 is longitude (y-axis)
    plt.scatter(centers[:, 0], centers[:, 1], marker='x', s=100, c='black')
    plt.show()

5. Final Output: Centroid (DC) Locations
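After running the full code, K-means returns 4 centroids, which are the proposed DC locations. Cluster numbering (and therefore plot colors) can differ between runs unless `random_state` is fixed, because K-means starts from randomly chosen initial centroids.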
| Cluster ID | Centroid Lat | Centroid Long | Proposed DC serves… |
|---|---|---|---|
| 0 | 27.68 | 80.90 | Blue cluster customers |
| 1 | 27.42 | 81.15 | Orange cluster customers |
| 2 | 27.31 | 80.83 | Green cluster customers |
| 3 | 27.56 | 80.57 | Red cluster customers |
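The table can also be used to look up which proposed DC would serve a given location: K-means assigns every point to its nearest centroid. The sketch below applies that rule to the four centroids listed above, using plain squared distance in latitude/longitude degrees, as K-means itself does on this data. The function name `nearest_dc` and the sample coordinates are hypothetical.

```python
import numpy as np

# Centroids from the table above, one [latitude, longitude] row per cluster ID
dc_locations = np.array([
    [27.68, 80.90],  # cluster 0
    [27.42, 81.15],  # cluster 1
    [27.31, 80.83],  # cluster 2
    [27.56, 80.57],  # cluster 3
])


def nearest_dc(latitude, longitude, centers=dc_locations):
    """Return the cluster ID of the centroid closest to the given point."""
    point = np.array([latitude, longitude])
    squared_distances = ((centers - point) ** 2).sum(axis=1)
    return int(np.argmin(squared_distances))


print(nearest_dc(27.40, 81.10))  # → 1
```

Distances in degrees are only a proxy for travel distance; a real DC assignment would more plausibly use road distance or at least great-circle distance.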
Session Summary
- Pipeline: Import → Plot raw → Select features → Elbow diagram → Fit K=4 → Plot clusters → Get centroids → Plot centroids.
- Key output: 4 centroid coordinates = proposed DC locations; 811 cluster IDs = customer-DC mapping.
- Tool: Google Colab.