K-Means and hierarchical clustering on Scaler's learner dataset to identify distinct learner segments based on salary, experience, and job profile.
Scaler wants to understand its learner base better. Learner profiles vary widely: some are freshers trying to break into tech, others are experienced engineers upskilling for senior roles. Identifying these segments helps Scaler personalise course recommendations, mentorship pairing, and placement support.
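from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Scale features so salary (ctc) and experience contribute equally to distances
# df is the learner DataFrame with 'ctc' and 'experience' columns
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[['ctc', 'experience']])

# Elbow method to find optimal k
inertias = []
for k in range(1, 11):
    # n_init set explicitly so results do not depend on the scikit-learn version default
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X_scaled)
    inertias.append(km.inertia_)
# Elbow at k=3 → 3 clusters optimal

# Hierarchical: dendrogram shows 3 natural groups
# Ward linkage on the first 2000 rows only (linkage memory grows quadratically with row count)
Z = linkage(X_scaled[:2000], method='ward')
dendrogram(Z, no_labels=True)
# Big vertical gap at top → 3 clusters confirmed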
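Once k=3 is chosen, a natural next step is to assign each learner to a segment and check whether the two methods agree. The following is a minimal sketch of that step, not the analysis actually run on Scaler's data: it uses a synthetic stand-in DataFrame (demo_df, with made-up ctc and experience values), cuts the Ward tree into three flat clusters with fcluster, and compares the result with the K-Means labels using the adjusted Rand index. The column names kmeans_segment and ward_segment are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the learner data (hypothetical values, not Scaler's).
rng = np.random.default_rng(42)
centers = [(5, 1), (15, 5), (40, 12)]  # made-up (ctc, experience) per segment
demo_df = pd.DataFrame(
    np.vstack([rng.normal(loc=c, scale=(1.0, 0.5), size=(100, 2)) for c in centers]),
    columns=["ctc", "experience"],
)

X_scaled = StandardScaler().fit_transform(demo_df[["ctc", "experience"]])

# Final K-Means fit with the k suggested by the elbow plot.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
demo_df["kmeans_segment"] = km.fit_predict(X_scaled)

# Cut the Ward dendrogram into at most 3 flat clusters (labels start at 1).
Z = linkage(X_scaled, method="ward")
demo_df["ward_segment"] = fcluster(Z, t=3, criterion="maxclust")

# Adjusted Rand index: 1.0 means identical groupings up to label renaming.
ari = adjusted_rand_score(demo_df["kmeans_segment"], demo_df["ward_segment"])

# Profile each segment in the original (unscaled) units.
profile = demo_df.groupby("kmeans_segment")[["ctc", "experience"]].mean().round(1)

print(f"Adjusted Rand index: {ari:.2f}")
print(profile)
```

Profiling on the unscaled columns keeps the segment averages in the original salary and experience units, so they are easier to interpret. On real data the two methods will usually agree less cleanly than on this well-separated synthetic example.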