Interactive K-means clustering with real-time visualization. See cluster assignments, centroids, the elbow method, silhouette scores, and generated Python sklearn code.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
# Data
X = np.array([
    [1, 2],
    [1.5, 1.8],
    [1.2, 2.5],
    [5, 8],
    [6, 9],
    [5.5, 7.5],
    [9, 1],
    [10, 2],
    [9.5, 1.5],
    [1.8, 2.2],
    [5.2, 8.3],
    [9.2, 1.8],
    [1.3, 1.5],
    [5.8, 7.8],
    [10.2, 1.2],
    [0.8, 2.8],
    [6.2, 8.8],
    [9.8, 2.2]
])
# Optional: Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# K-Means Clustering (k=3)
kmeans = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)
print(f"Centroids (scaled):\n{kmeans.cluster_centers_}")
print(f"Inertia: {kmeans.inertia_:.4f}")
print(f"Iterations: {kmeans.n_iter_}")
# Silhouette Score
sil = silhouette_score(X_scaled, labels)
print(f"Silhouette Score: {sil:.4f}")
# Cluster sizes
for i in range(3):
    print(f"Cluster {i}: {sum(labels == i)} points")
# Elbow Method
inertias = []
K_range = range(1, 9)
for k in K_range:
    km = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)
    km.fit(X_scaled)
    inertias.append(km.inertia_)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Elbow plot
axes[0].plot(K_range, inertias, 'bo-')
axes[0].set_xlabel('Number of Clusters (k)')
axes[0].set_ylabel('Inertia')
axes[0].set_title('Elbow Method')
axes[0].grid(True, alpha=0.3)
# Cluster visualization (first 2 dimensions)
colors = plt.cm.Set1(labels / max(labels.max(), 1))
axes[1].scatter(X_scaled[:, 0], X_scaled[:, 1], c=colors, s=50, alpha=0.7)
axes[1].scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
c='black', marker='X', s=200, edgecolors='white', linewidths=2)
axes[1].set_xlabel('Feature 1')
axes[1].set_ylabel('Feature 2')
axes[1].set_title('K-Means Clusters (k=3)')
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Use the Elbow Method: plot inertia against k and look for an "elbow" where adding more clusters gives diminishing returns. Complement it with the Silhouette Score, where values closer to 1 indicate better-defined clusters. Domain knowledge is equally important for choosing k.
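The elbow loop above only tracks inertia; the silhouette score can be scanned over the same range of k and peaks at the best-separated clustering. A minimal sketch on synthetic blobs mirroring the data above (the blob centers and k range are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Three tight blobs around roughly the same centers as the data above
X = np.vstack([rng.normal(c, 0.4, size=(10, 2)) for c in [(1, 2), (5, 8), (9, 1)]])

scores = {}
for k in range(2, 7):  # silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette={scores[k]:.3f}")

best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k}")
```

For well-separated blobs like these, the scan should peak at k=3, agreeing with the elbow plot.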
Inertia (also called WCSS - Within-Cluster Sum of Squares) measures the sum of squared distances between each data point and its assigned cluster centroid. Lower inertia means tighter clusters, but it always decreases with more clusters, which is why the elbow method is needed.
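Because the definition is so direct, it is easy to verify by hand: recompute the summed squared distances and compare against the fitted model's inertia_. A minimal sketch (the small data set is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1.5, 1.8], [5, 8], [6, 9], [9, 1], [10, 2]], dtype=float)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Inertia by hand: squared distance of each point to its assigned centroid, summed
manual = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
print(manual, km.inertia_)  # the two values agree up to floating-point precision
```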
K-means++ is a smart initialization that spreads initial centroids apart. Instead of random placement, it selects the first centroid randomly, then subsequent centroids are chosen with probability proportional to their squared distance from the nearest existing centroid. This leads to better convergence.
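The selection rule is short enough to sketch directly. The kmeanspp_init function below is a simplified, hypothetical version; sklearn's actual implementation additionally samples several candidate points at each step and keeps the best one:

```python
import numpy as np

def kmeanspp_init(X, k, rng):
    """Pick k initial centroids: first uniformly at random, the rest
    with probability proportional to squared distance from the nearest
    already-chosen centroid (the k-means++ rule)."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen centroid
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(-1).min(axis=1)
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)

rng = np.random.default_rng(0)
X = np.array([[1, 2], [1.2, 2.5], [5, 8], [5.5, 7.5], [9, 1], [9.5, 1.5]])
print(kmeanspp_init(X, 3, rng))
```

Points far from every existing centroid get high selection probability, which is what spreads the initial centroids apart.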
K-means struggles with: non-spherical clusters (elongated or irregular shapes), clusters of very different sizes, clusters of different densities, and high-dimensional data (curse of dimensionality). Consider DBSCAN, Gaussian Mixture Models, or spectral clustering for such cases.
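A concrete failure case is two interleaved half-moons: k-means cuts them with a straight boundary, while DBSCAN follows the density and recovers the true shapes. A sketch using sklearn's make_moons (the eps and min_samples values are illustrative choices for this noise level):

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import adjusted_rand_score

# Two interleaved, non-spherical clusters
X, y = make_moons(n_samples=300, noise=0.05, random_state=42)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand Index vs. the true labels: 1.0 is a perfect recovery
print("k-means ARI:", adjusted_rand_score(y, km_labels))
print("DBSCAN  ARI:", adjusted_rand_score(y, db_labels))
```

DBSCAN scores far higher here because it groups by connectivity rather than by distance to a central point.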
Yes, scaling matters. K-means uses Euclidean distance, so features with larger scales dominate the distance calculation. Standardizing to zero mean and unit variance ensures all features contribute equally; use StandardScaler from sklearn.
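To see the effect, give one feature a meaningless but huge scale: unscaled k-means then clusters on that noise, while the standardized version recovers the true groups. A synthetic sketch (the data and parameters are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
# True groups differ only in feature 1; feature 2 is pure noise at a huge scale
y = np.repeat([0, 1], 50)
X = np.column_stack([y * 5 + rng.normal(0, 0.3, 100),
                     rng.normal(0, 1000, 100)])

raw = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
scaled = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(
    StandardScaler().fit_transform(X))

print("raw ARI:   ", adjusted_rand_score(y, raw))   # near 0: clustered the noise
print("scaled ARI:", adjusted_rand_score(y, scaled))  # near 1: found the real groups
```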