ML Project – Customer Segmentation Using K-Means Clustering


Program 1

Customer Segmentation Dataset

Customer Segmentation Dataset 1

# Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Step 1: Generate Synthetic Dataset
np.random.seed(42)  # Fix the seed so every run produces the same random numbers.
n_customers = 200   # Number of synthetic customers to generate.
data = {
    'CustomerID': np.arange(1, n_customers + 1),
    'Annual Income (k$)': np.random.normal(60, 20, n_customers).astype(int),  # Normal distribution: mean 60 (k$), standard deviation 20 (k$); astype(int) drops the fractional part
    'Spending Score (1-100)': np.random.randint(1, 101, n_customers)
}
df = pd.DataFrame(data)
print(df.shape)  # (200, 3): 200 customers, 3 columns

# Step 2: Save Dataset (Optional)
df.to_csv("D://scikit_data/KMeans/customer_segmentation.csv", index=False)

# Step 3: Prepare Features
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled[:5])  # Preview the first five standardized rows

# Step 4: Apply K-Means Clustering
# n_init is set explicitly because its default changed between scikit-learn versions.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df['Cluster'] = kmeans.fit_predict(X_scaled)
df.to_csv("D://scikit_data/KMeans/customer_segmentation1.csv", index=False)

# Step 5: Plot Clusters with Centroids
plt.figure(figsize=(8, 6))
plt.scatter(df['Annual Income (k$)'], df['Spending Score (1-100)'],
            c=df['Cluster'], cmap='viridis', s=50)
centroids = scaler.inverse_transform(kmeans.cluster_centers_)  # Centroids in original units
plt.scatter(centroids[:, 0], centroids[:, 1],
            s=200, c='red', marker='X', label='Centroids')
plt.title('Customer Segmentation Based on Spending Habits')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1–100)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

# How the centroid plotting works:
# kmeans.cluster_centers_
#   Holds the coordinates of the cluster centers in scaled form, because the
#   model was fitted on standardized data. Here it is an array of shape (4, 2),
#   for example (illustrative values):
#   [[ 0.56, -1.02],
#    [-0.83,  0.91],
#    ... ]
# scaler.inverse_transform(...)
#   Undoes the scaling, converting the centroids back to their original units
#   (annual income in k$ and spending score), so they can be plotted on the
#   same axes as the original data.
# [:, 0] and [:, 1]
#   [:, 0] → all rows, column 0 → x-values (Annual Income)
#   [:, 1] → all rows, column 1 → y-values (Spending Score)
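
To make the scaling round trip concrete, here is a minimal sketch on a tiny hand-made dataset. The four rows and the choice of two clusters are assumptions for illustration only and are not part of the program above. It checks that standardized columns have mean 0 and standard deviation 1, and that inverse-transformed centroids land on the per-cluster means in original units.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column 0 = annual income (k$), column 1 = spending score.
X = np.array([
    [20.0, 80.0],
    [25.0, 75.0],
    [90.0, 20.0],
    [95.0, 15.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After standardization each column has mean 0 and standard deviation 1.
print("column means:", X_scaled.mean(axis=0))
print("column stds: ", X_scaled.std(axis=0))

# Two clusters are assumed here because the toy data has two obvious groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Centroids are learned in scaled space; map them back to original units.
centroids_scaled = kmeans.cluster_centers_
centroids_original = scaler.inverse_transform(centroids_scaled)
print("centroids (original units):")
print(centroids_original)

# Standardization is a shift and rescale per column, so each inverse-transformed
# centroid equals the mean of the original points assigned to that cluster.
for k in range(2):
    assert np.allclose(centroids_original[k], X[labels == k].mean(axis=0))
```

The cluster numbering (which group is labelled 0 or 1) can vary, so compare centroids by their coordinates and not by label.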


DataFlair Team
