ML Project – Customer Segmentation Using K-Means Clustering

Machine Learning courses with 100+ Real-time projects Start Now!!

Program 1

Customer Segmentation Dataset

Customer Segmentation Dataset 1

# Librires
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Step 1: Generate Synthetic Dataset
np.random.seed(42) # It ensures that every time you run the code, you get the same random numbers.
n_customers = 200  # n_customers → number of values (here, 200 customers)
data = {
    'CustomerID': np.arange(1, n_customers + 1),
    'Annual Income (k$)': np.random.normal(60, 20, n_customers).astype(int), # 60 → mean (average annual income = $60k),20 → standard deviation (spread of incomes is +-$20k)
    'Spending Score (1-100)': np.random.randint(1, 101, n_customers)
}
df = pd.DataFrame(data)
df.shape

# Step 2: Save Dataset (Optional)
df.to_csv("D://scikit_data/KMeans/customer_segmentation.csv", index=False)

# Step 3: Prepare Features
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_scaled

# Step 4: Apply K-Means Clustering
kmeans = KMeans(n_clusters=4, random_state=42)
df['Cluster'] = kmeans.fit_predict(X_scaled)
df.to_csv("D://scikit_data/KMeans/customer_segmentation1.csv", index=False)

# Step 5: Plot Clusters with Centroids
plt.figure(figsize=(8, 6))
plt.scatter(df['Annual Income (k$)'], df['Spending Score (1-100)'],
            c=df['Cluster'], cmap='viridis', s=50)
plt.scatter(scaler.inverse_transform(kmeans.cluster_centers_)[:, 0],
            scaler.inverse_transform(kmeans.cluster_centers_)[:, 1],
            s=200, c='red', marker='X', label='Centroids')
plt.title('Customer Segmentation Based on Spending Habits')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1–100)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

# kmeans.cluster_centers_
# These are the coordinates of the cluster centers — but in scaled form (because we normalized the data earlier).
# It returns an array like:
# [[ 0.56, -1.02],
#  [-0.83,  0.91],
#  ... ]
# scaler.inverse_transform(...)
# This undoes the scaling, converting the centroids back to their original values (real income and spending score).
# Now you can plot them in the same scale as the original data.
# [:, 0] and [:, 1]
# [:, 0] → all rows, column 0 → x-values (Annual Income)
# [:, 1] → all rows, column 1 → y-values (Spending Score)

We work very hard to provide you quality material
Could you take 15 seconds and share your happy experience on Google

courses

DataFlair Team

DataFlair Team provides high-impact content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. We make complex concepts easy to grasp, helping learners of all levels succeed in their tech careers.

Leave a Reply

Your email address will not be published. Required fields are marked *