ML Project – Customer Segmentation Using K-Means Clustering
Program 1
Customer Segmentation Dataset 1
# Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Step 1: Generate Synthetic Dataset
np.random.seed(42) # It ensures that every time you run the code, you get the same random numbers.
n_customers = 200 # n_customers → number of values (here, 200 customers)
data = {
    'CustomerID': np.arange(1, n_customers + 1),
    # 60 → mean (average annual income = $60k), 20 → standard deviation (typical spread of ±$20k)
    'Annual Income (k$)': np.random.normal(60, 20, n_customers).astype(int),
    # Random integer scores from 1 to 100 inclusive (the upper bound 101 is excluded)
    'Spending Score (1-100)': np.random.randint(1, 101, n_customers)
}
df = pd.DataFrame(data)
df.shape
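As a quick check of the seeding comment in Step 1, the sketch below reseeds NumPy's global generator and draws the same sample twice. The names `first` and `second` are illustrative only and are not part of the program above.

```python
import numpy as np

# Seed, then draw five values from the same distribution used for income.
np.random.seed(42)
first = np.random.normal(60, 20, 5)

# Reseeding with the same value restarts the random sequence.
np.random.seed(42)
second = np.random.normal(60, 20, 5)

# Both draws are identical because the generator state was reset.
print(np.array_equal(first, second))  # True
```

Without the second call to `np.random.seed(42)`, the two draws would generally differ, because the generator continues from where the first draw left off.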
# Step 2: Save Dataset (Optional)
df.to_csv("D://scikit_data/KMeans/customer_segmentation.csv", index=False)
# Step 3: Prepare Features
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_scaled
# Step 4: Apply K-Means Clustering
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42) # n_init is set explicitly because its default differs between scikit-learn versions
df['Cluster'] = kmeans.fit_predict(X_scaled)
df.to_csv("D://scikit_data/KMeans/customer_segmentation1.csv", index=False)
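To see what the cluster labels from Step 4 mean in practice, one option is to summarise each cluster by its size and average feature values. The following is a minimal sketch that rebuilds the same synthetic data, so it runs on its own. The names `demo`, `sizes`, and `means` are assumptions made for illustration, and `n_init=10` is set explicitly as an assumption about the intended behaviour.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Rebuild the synthetic features with the same seed and parameters as Step 1.
np.random.seed(42)
n = 200
demo = pd.DataFrame({
    'Annual Income (k$)': np.random.normal(60, 20, n).astype(int),
    'Spending Score (1-100)': np.random.randint(1, 101, n),
})

# Standardize, then cluster, mirroring Steps 3 and 4.
scaled = StandardScaler().fit_transform(demo)
model = KMeans(n_clusters=4, n_init=10, random_state=42)
demo['Cluster'] = model.fit_predict(scaled)

# Number of customers per cluster.
sizes = demo['Cluster'].value_counts().sort_index()

# Average income and spending score per cluster, in original units.
means = demo.groupby('Cluster')[['Annual Income (k$)',
                                 'Spending Score (1-100)']].mean().round(1)

print(sizes)
print(means)
```

Reading the two tables side by side gives a rough label for each segment, for example a cluster with high income and a low spending score.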
# Step 5: Plot Clusters with Centroids
plt.figure(figsize=(8, 6))
plt.scatter(df['Annual Income (k$)'], df['Spending Score (1-100)'],
            c=df['Cluster'], cmap='viridis', s=50)
# Convert the centroids back to original units once, then reuse the result
centroids = scaler.inverse_transform(kmeans.cluster_centers_)
plt.scatter(centroids[:, 0], centroids[:, 1],
            s=200, c='red', marker='X', label='Centroids')
plt.title('Customer Segmentation Based on Spending Habits')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1–100)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
# kmeans.cluster_centers_
# These are the coordinates of the cluster centers, in scaled units (because the features were standardized with StandardScaler earlier).
# It returns an array with one row per cluster and one column per feature, like:
# [[ 0.56, -1.02],
# [-0.83, 0.91],
# ... ]
# scaler.inverse_transform(...)
# This undoes the scaling, converting the centroids back to their original units (annual income in k$ and spending score).
# The centroids can then be plotted on the same axes as the original data.
# [:, 0] and [:, 1]
# [:, 0] → all rows, column 0 → x-values (Annual Income)
# [:, 1] → all rows, column 1 → y-values (Spending Score)
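The scaling round trip described in the comments above can be checked on a tiny example. This is a sketch on hypothetical toy values (`X_toy` and the other names are not part of the program above). It assumes the usual `StandardScaler` behaviour of subtracting the column mean and dividing by the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column 0 is income (k$), column 1 is spending score.
X_toy = np.array([[30.0, 80.0],
                  [60.0, 50.0],
                  [90.0, 20.0]])

scaler_toy = StandardScaler()
X_toy_scaled = scaler_toy.fit_transform(X_toy)

# Manual standardization: (x - mean) / std per column, with ddof=0.
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

# inverse_transform computes x_scaled * std + mean, restoring original units.
X_toy_restored = scaler_toy.inverse_transform(X_toy_scaled)

print(np.allclose(X_toy_scaled, manual))    # True
print(np.allclose(X_toy_restored, X_toy))   # True
```

The same reasoning applies to the centroids: K-Means was fitted on scaled data, so its centers must go through `inverse_transform` before they share axes with the raw income and spending values.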

