

{"id":65722,"date":"2019-07-31T09:51:49","date_gmt":"2019-07-31T04:21:49","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=65722"},"modified":"2024-08-02T15:41:29","modified_gmt":"2024-08-02T10:11:29","slug":"r-data-science-project-customer-segmentation","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/","title":{"rendered":"Data Science Project &#8211; Customer Segmentation using Machine Learning in R"},"content":{"rendered":"<div class='__iawmlf-post-loop-links' style='display:none;' data-iawmlf-post-links='[{&quot;id&quot;:1236,&quot;href&quot;:&quot;https:\\\/\\\/drive.google.com\\\/file\\\/d\\\/19BOhwz52NUY3dg8XErVYglctpr5sjTy4\\\/view&quot;,&quot;archived_href&quot;:&quot;&quot;,&quot;redirect_href&quot;:&quot;&quot;,&quot;checks&quot;:[],&quot;broken&quot;:false,&quot;last_checked&quot;:null,&quot;process&quot;:&quot;done&quot;}]'><\/div>\n<p><strong>Cluster <\/strong>In this Data Science R Project series, we will perform one of the most essential applications of machine learning &#8211; Customer Segmentation. In this project, we will implement customer segmentation in R. Whenever you need to find your best customer, customer segmentation is the ideal methodology.<\/p>\n<p>In this machine learning project, DataFlair will provide you the background of customer segmentation. Then we will explore the data upon which we will be building our segmentation model. Also, in this data science project, we will see the descriptive analysis of our data and then implement several versions of the K-means algorithm. 
So, follow the complete data science customer segmentation project using machine learning in R and become a pro in <a href=\"https:\/\/data-flair.training\/blogs\/data-science-tutorials-home\/\"><em><strong>Data Science<\/strong><\/em><\/a>.<\/p>\n<h2>Customer Segmentation Project in R<\/h2>\n<p>Customer segmentation is one of the most important applications of unsupervised learning. Using clustering techniques, companies can identify distinct segments of customers, allowing them to target the potential user base. In this machine learning project, we will make use of <a href=\"https:\/\/data-flair.training\/blogs\/k-means-clustering-tutorial\/\"><em><strong>K-means clustering<\/strong><\/em><\/a>, which is a fundamental algorithm for clustering an unlabeled dataset. Before moving ahead in this project, let us learn what customer segmentation actually is.<\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-66049\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation.png\" alt=\"Data science project - customer segmentation\" width=\"801\" height=\"419\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation.png 801w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation-150x78.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation-300x157.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation-768x402.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation-520x272.png 520w\" sizes=\"auto, (max-width: 801px) 100vw, 801px\" \/><\/a><\/p>\n<h3>What is 
Customer Segmentation?<\/h3>\n<p><em>Customer segmentation is the process of dividing a customer base into groups of individuals who are similar in ways that are relevant to marketing, such as gender, age, interests, and spending habits.<\/em><\/p>\n<p>Companies that deploy customer segmentation work on the premise that customers have different requirements and that a specific marketing effort is needed to address them appropriately. Companies aim to gain a deeper understanding of the customers they are targeting, so their messaging has to be specific and tailored to the requirements of their customers. Furthermore, through the data collected, companies can gain a deeper understanding of customer preferences and discover the valuable segments that would bring them the most profit. This way, they can plan their marketing more efficiently and reduce the risk to their investment.<\/p>\n<p>The technique of customer segmentation depends on several key differentiators that divide customers into groups to be targeted. Data related to demographics, geography, economic status, and behavioral patterns play a crucial role in determining the company&#8217;s direction towards addressing the various segments.<\/p>\n<p>Furthermore, customer segmentation helps firms avoid generic messaging to their clientele, since the goal is to take targeted approaches that are more appealing to the selected groups. 
Consequently, by identifying segments separately, companies can also tailor how they design and produce their products to cover the needs of each segment effectively, which in turn raises customer satisfaction and loyalty.<\/p>\n<p><strong><em>You can download the dataset for the customer segmentation project <a href=\"https:\/\/drive.google.com\/file\/d\/19BOhwz52NUY3dg8XErVYglctpr5sjTy4\/view\">here<\/a>.\u00a0<\/em><\/strong><\/p>\n<h2>How to Implement Customer Segmentation in R?<\/h2>\n<p>In the first step of this data science project, we will perform data exploration. We will import the essential packages required for this task and then read our data. Finally, we will go through the input data to gain the necessary insights about it.<\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">customer_data=read.csv(\"\/home\/dataflair\/Mall_Customers.csv\")\r\nstr(customer_data)\r\n\r\nnames(customer_data)<\/pre>\n<p><strong>Output Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/implementing-customer-segment-in-R.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65738\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/implementing-customer-segment-in-R.png\" alt=\"Ml project - create customer segmentation\" width=\"849\" height=\"423\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/implementing-customer-segment-in-R.png 849w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/implementing-customer-segment-in-R-150x75.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/implementing-customer-segment-in-R-300x149.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/implementing-customer-segment-in-R-768x383.png 
768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/implementing-customer-segment-in-R-520x259.png 520w\" sizes=\"auto, (max-width: 849px) 100vw, 849px\" \/><\/a><\/p>\n<p>We will now display the first six rows of our dataset using the head() function and use the summary() function to output summary of it.<\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">head(customer_data)\r\nsummary(customer_data$Age)<\/pre>\n<p><strong>Output Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/head-function-ml-project.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65740\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/head-function-ml-project.png\" alt=\"data science project - using head function\" width=\"689\" height=\"374\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/head-function-ml-project.png 689w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/head-function-ml-project-150x81.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/head-function-ml-project-300x163.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/head-function-ml-project-520x282.png 520w\" sizes=\"auto, (max-width: 689px) 100vw, 689px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">sd(customer_data$Age)\r\nsummary(customer_data$Annual.Income..k..)\r\nsd(customer_data$Annual.Income..k..)\r\nsummary(customer_data$Age)<\/pre>\n<p><strong>Output Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/summary-function-R-project.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full 
wp-image-65741\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/summary-function-R-project.png\" alt=\"R project - using summary function\" width=\"558\" height=\"474\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/summary-function-R-project.png 558w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/summary-function-R-project-150x127.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/summary-function-R-project-300x255.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/summary-function-R-project-520x442.png 520w\" sizes=\"auto, (max-width: 558px) 100vw, 558px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">sd(customer_data$Spending.Score..1.100.)<\/pre>\n<p><strong>Output Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/data-exploration-in-R.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65742\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/data-exploration-in-R.png\" alt=\"R data exploration\" width=\"415\" height=\"104\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/data-exploration-in-R.png 415w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/data-exploration-in-R-150x38.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/data-exploration-in-R-300x75.png 300w\" sizes=\"auto, (max-width: 415px) 100vw, 415px\" \/><\/a><\/p>\n<p><em><strong>Have you Checked DataFlair&#8217;s Trending Project on Data Science? 
Must Check &#8211; <a href=\"https:\/\/data-flair.training\/blogs\/data-science-r-sentiment-analysis-project\/\">Sentiment Analysis using R<\/a><\/strong><\/em><\/p>\n<h3>Customer Gender Visualization<\/h3>\n<p>In this section, we will create a barplot and a pie chart to show the gender distribution across our customer_data dataset.<\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">a=table(customer_data$Gender)\r\nbarplot(a,main=\"Using BarPlot to display Gender Comparison\",\r\n       ylab=\"Count\",\r\n       xlab=\"Gender\",\r\n       col=rainbow(2),\r\n       legend=rownames(a))<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualization-in-R.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65743\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualization-in-R.png\" alt=\"machine learning project - R visualization\" width=\"617\" height=\"158\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualization-in-R.png 617w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualization-in-R-150x38.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualization-in-R-300x77.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualization-in-R-520x133.png 520w\" sizes=\"auto, (max-width: 617px) 100vw, 617px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Gender-visualization-Output-Plot.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65744\" 
src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Gender-visualization-Output-Plot.png\" alt=\"data science project in R - visualization\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Gender-visualization-Output-Plot.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Gender-visualization-Output-Plot-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Gender-visualization-Output-Plot-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Gender-visualization-Output-Plot-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Gender-visualization-Output-Plot-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Gender-visualization-Output-Plot-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p>From the above barplot, we observe that the number of females is higher than the males. 
Now, let us visualize a pie chart to observe the ratio of male and female distribution.<\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">pct=round(a\/sum(a)*100)\r\nlbs=paste(c(\"Female\",\"Male\"),\" \",pct,\"%\",sep=\" \")\r\nlibrary(plotrix)\r\npie3D(a,labels=lbs,\r\n   main=\"Pie Chart Depicting Ratio of Female and Male\")<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-a-pie-chart-in-R.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65745\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-a-pie-chart-in-R.png\" alt=\"data visualization - ML project\" width=\"606\" height=\"132\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-a-pie-chart-in-R.png 606w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-a-pie-chart-in-R-150x33.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-a-pie-chart-in-R-300x65.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-a-pie-chart-in-R-520x113.png 520w\" sizes=\"auto, (max-width: 606px) 100vw, 606px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-pie-chart-visualization.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-65746\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-pie-chart-visualization.png\" alt=\"R pie chart visualization\" width=\"442\" height=\"315\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-pie-chart-visualization.png 1344w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-pie-chart-visualization-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-pie-chart-visualization-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-pie-chart-visualization-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-pie-chart-visualization-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-pie-chart-visualization-520x371.png 520w\" sizes=\"auto, (max-width: 442px) 100vw, 442px\" \/><\/a><\/p>\n<p>From the above graph, we conclude that the percentage of females is <strong>56%<\/strong>, whereas the percentage of males in the customer dataset is <strong>44%<\/strong>.<\/p>\n<h3>Visualization of Age Distribution<\/h3>\n<p>Let us plot a histogram to view the distribution of customer ages. We will first take a summary of the Age variable.<\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">summary(customer_data$Age)<\/pre>\n<p><strong>Output Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-a-histogram.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65747\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-a-histogram.png\" alt=\"R project - histogram visualization\" width=\"519\" height=\"120\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-a-histogram.png 519w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-a-histogram-150x35.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-a-histogram-300x69.png 
300w\" sizes=\"auto, (max-width: 519px) 100vw, 519px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">hist(customer_data$Age,\r\n    col=\"blue\",\r\n    main=\"Histogram to Show Count of Age Class\",\r\n    xlab=\"Age Class\",\r\n    ylab=\"Frequency\",\r\n    labels=TRUE)<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualization-of-age-distribution.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65751\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualization-of-age-distribution.png\" alt=\"R project - data visualization\" width=\"485\" height=\"150\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualization-of-age-distribution.png 485w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualization-of-age-distribution-150x46.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualization-of-age-distribution-300x93.png 300w\" sizes=\"auto, (max-width: 485px) 100vw, 485px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram-plot-in-ML.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65752\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram-plot-in-ML.png\" alt=\"histogram plot in ML\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram-plot-in-ML.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram-plot-in-ML-150x107.png 150w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram-plot-in-ML-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram-plot-in-ML-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram-plot-in-ML-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram-plot-in-ML-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">boxplot(customer_data$Age,\r\n       col=\"#ff0066\",\r\n       main=\"Boxplot for Descriptive Analysis of Age\")<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/boxplot-in-R.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65753\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/boxplot-in-R.png\" alt=\"R project - boxplot visualization\" width=\"558\" height=\"89\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/boxplot-in-R.png 558w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/boxplot-in-R-150x24.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/boxplot-in-R-300x48.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/boxplot-in-R-520x83.png 520w\" sizes=\"auto, (max-width: 558px) 100vw, 558px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ML-boxplot-of-descriptive-analysis.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65754\" 
src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ML-boxplot-of-descriptive-analysis.png\" alt=\"boxplot in ML\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ML-boxplot-of-descriptive-analysis.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ML-boxplot-of-descriptive-analysis-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ML-boxplot-of-descriptive-analysis-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ML-boxplot-of-descriptive-analysis-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ML-boxplot-of-descriptive-analysis-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ML-boxplot-of-descriptive-analysis-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p>From the above two visualizations, we conclude that the largest group of customers is between 30 and 35 years old. The minimum age of customers is 18, whereas the maximum age is 70.<\/p>\n<p><em><strong>Don&#8217;t forget to practice the\u00a0<a href=\"https:\/\/data-flair.training\/blogs\/data-science-machine-learning-project-credit-card-fraud-detection\/\">Credit Card Fraud Detection Project<\/a> of Machine Learning<\/strong><\/em><\/p>\n<h3>Analysis of the Annual Income of the Customers<\/h3>\n<p>In this section of the R project, we will create visualizations to analyze the annual income of the customers. 
We will plot a histogram and then we will proceed to examine this data using a density plot.<\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">summary(customer_data$Annual.Income..k..)\r\nhist(customer_data$Annual.Income..k..,\r\n  col=\"#660033\",\r\n  main=\"Histogram for Annual Income\",\r\n  xlab=\"Annual Income Class\",\r\n  ylab=\"Frequency\",\r\n  labels=TRUE)<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-annual-income.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65766\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-annual-income.png\" alt=\"visualizing annual income in R\" width=\"537\" height=\"282\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-annual-income.png 537w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-annual-income-150x79.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-annual-income-300x158.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-annual-income-520x273.png 520w\" sizes=\"auto, (max-width: 537px) 100vw, 537px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ml-histogram-income-plot.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65767\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ml-histogram-income-plot.png\" alt=\"ml histogram income plot\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ml-histogram-income-plot.png 1344w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ml-histogram-income-plot-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ml-histogram-income-plot-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ml-histogram-income-plot-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ml-histogram-income-plot-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/ml-histogram-income-plot-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">plot(density(customer_data$Annual.Income..k..),\r\n    col=\"yellow\",\r\n    main=\"Density Plot for Annual Income\",\r\n    xlab=\"Annual Income Class\",\r\n    ylab=\"Density\")\r\npolygon(density(customer_data$Annual.Income..k..),\r\n        col=\"#ccff66\")<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-customer-income-in-R.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65768\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-customer-income-in-R.png\" alt=\"data visualization in R\" width=\"504\" height=\"169\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-customer-income-in-R.png 504w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-customer-income-in-R-150x50.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/visualizing-customer-income-in-R-300x101.png 300w\" sizes=\"auto, (max-width: 504px) 100vw, 504px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a 
href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-plot-in-R.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65769\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-plot-in-R.png\" alt=\"R project - income plot \" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-plot-in-R.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-plot-in-R-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-plot-in-R-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-plot-in-R-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-plot-in-R-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-plot-in-R-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p>From the above descriptive analysis, we conclude that the minimum annual income of the customers is 15 and the maximum income is 137 (values are in thousands). Customers with an annual income of around 70 have the highest frequency count in the histogram. The average annual income of all the customers is 60.56. In the Kernel Density Plot that we displayed above, we observe that the annual income roughly follows a <em><strong><a href=\"https:\/\/data-flair.training\/blogs\/normal-distribution-in-r\/\">normal distribution<\/a><\/strong><\/em>.<\/p>\n<h2>Analyzing Spending Score of the Customers<\/h2>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">summary(customer_data$Spending.Score..1.100.)\r\n\r\n## Min. 1st Qu. Median Mean 3rd Qu. Max. 
\r\n## 1.00 34.75 50.00 50.20 73.00 99.00\r\n\r\nboxplot(customer_data$Spending.Score..1.100.,\r\n   horizontal=TRUE,\r\n   col=\"#990000\",\r\n   main=\"BoxPlot for Descriptive Analysis of Spending Score\")<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/analyzing-customer-income-spending-score.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65770\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/analyzing-customer-income-spending-score.png\" alt=\"data science customer segmentation project\" width=\"647\" height=\"246\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/analyzing-customer-income-spending-score.png 647w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/analyzing-customer-income-spending-score-150x57.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/analyzing-customer-income-spending-score-300x114.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/analyzing-customer-income-spending-score-520x198.png 520w\" sizes=\"auto, (max-width: 647px) 100vw, 647px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-plot.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65771\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-plot.png\" alt=\"data science project - income spending plot\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-plot.png 1344w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-plot-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-plot-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-plot-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-plot-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-plot-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">hist(customer_data$Spending.Score..1.100.,\r\n    main=\"HistoGram for Spending Score\",\r\n    xlab=\"Spending Score Class\",\r\n    ylab=\"Frequency\",\r\n    col=\"#6600cc\",\r\n    labels=TRUE)<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-boxplot-input.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65772\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-boxplot-input.png\" alt=\"income spending boxplot input\" width=\"432\" height=\"151\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-boxplot-input.png 432w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-boxplot-input-150x52.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/income-spending-boxplot-input-300x105.png 300w\" sizes=\"auto, (max-width: 432px) 100vw, 432px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a 
href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram_output.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65773\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram_output.png\" alt=\"histogram_output\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram_output.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram_output-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram_output-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram_output-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram_output-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/histogram_output-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p>The descriptive analysis of the spending score shows a minimum of 1, a maximum of 99 and a mean of 50.20. From the histogram, we conclude that the class between 40 and 50 contains the most customers, that is, it has the highest frequency among all the classes.<\/p>\n<h2>K-means Algorithm<\/h2>\n<p>When using the k-means clustering algorithm, the first step is to specify the number of clusters (k) that we wish to produce in the final output. The algorithm starts by randomly selecting k objects from the dataset to serve as the initial cluster centers. These selected objects are the initial cluster means, also known as centroids. Each remaining object is then assigned to its closest centroid, where closeness is measured by the Euclidean distance between the object and the cluster mean. 
We refer to this step as \u201ccluster assignment\u201d. When the assignment is complete, the algorithm calculates the new mean of each cluster. After the centers are recalculated, every observation is checked again to see whether it is now closer to a different cluster, and objects are reassigned using the updated cluster means. This repeats over several iterations until the cluster assignments stop changing, that is, until the clusters in the current iteration are the same as those obtained in the previous iteration.<\/p>\n<p><em><strong>If you want to work on one of the major challenges, then knowledge of Big Data is crucial. Therefore, I recommend checking out <a href=\"https:\/\/data-flair.training\/blogs\/hadoop-for-data-science\/\">Hadoop for Data Science<\/a>.<\/strong><\/em><\/p>\n<p>Summing up the K-means clustering &#8211;<\/p>\n<ul>\n<li>We specify the number of clusters (k) that we need to create.<\/li>\n<li>The algorithm selects k objects at random from the dataset. These objects are the initial cluster centers, or means.<\/li>\n<li>Each observation is assigned to its closest centroid. We base this assignment on the Euclidean distance between the object and the centroid.<\/li>\n<li>For each of the k clusters, the centroid is updated by calculating the new mean values of all the data points in the cluster. The k-th cluster\u2019s centroid is a vector of length p that contains the means of all variables for the observations in the k-th cluster, where p denotes the number of variables.<\/li>\n<li>The assignment and update steps are repeated to iteratively minimize the total within-cluster sum of squares, until the cluster assignments stop changing or the maximum number of iterations is reached. By default, R uses 10 as the maximum number of iterations.<\/li>\n<\/ul>\n<h3>Determining Optimal Clusters<\/h3>\n<p>While working with clusters, you need to specify the number of clusters to use. 
You would like to use the optimal number of clusters. To help you determine it, there are three popular methods &#8211;<\/p>\n<ul>\n<li>Elbow method<\/li>\n<li>Silhouette method<\/li>\n<li>Gap statistic<\/li>\n<\/ul>\n<h4>Elbow Method<\/h4>\n<p>The main goal behind cluster partitioning methods like k-means is to define the clusters such that the total intra-cluster variation is minimized.<\/p>\n<p style=\"text-align: center\"><strong>minimize(sum W(Ck)), k=1&#8230;K<\/strong><\/p>\n<p>Here, Ck represents the k-th cluster, W(Ck) denotes its intra-cluster variation, and K is the total number of clusters. The total intra-cluster variation measures how compact the clusters are. We can then proceed to define the optimal clusters as follows &#8211;<\/p>\n<p>First, we run the clustering algorithm for several values of k, for example by varying k from 1 to 10. For each k, we calculate the total intra-cluster sum of squares (iss). Then, we plot iss against the number of clusters k. In the plot, the location of a bend, or knee, is generally taken as an indicator of the appropriate number of clusters for our model. 
Let us implement this in R as follows &#8211;<\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">library(purrr)\r\nset.seed(123)\r\n# function to calculate total intra-cluster sum of square \r\niss &lt;- function(k) {\r\n  kmeans(customer_data[,3:5],k,iter.max=100,nstart=100,algorithm=\"Lloyd\" )$tot.withinss\r\n}\r\n\r\nk.values &lt;- 1:10\r\n\r\n\r\niss_values &lt;- map_dbl(k.values, iss)\r\n\r\nplot(k.values, iss_values,\r\n    type=\"b\", pch = 19, frame = FALSE, \r\n    xlab=\"Number of clusters K\",\r\n    ylab=\"Total intra-clusters sum of squares\")<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-in-R.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65776\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-in-R.png\" alt=\"K-Means Elbow in R\" width=\"868\" height=\"402\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-in-R.png 868w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-in-R-150x69.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-in-R-300x139.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-in-R-768x356.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-in-R-520x241.png 520w\" sizes=\"auto, (max-width: 868px) 100vw, 868px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-graph-in-R.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65775\" 
src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-graph-in-R.png\" alt=\"K-Means Elbow graph in R\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-graph-in-R.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-graph-in-R-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-graph-in-R-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-graph-in-R-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-graph-in-R-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-Elbow-graph-in-R-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p>From the above graph, we conclude that 4 is the appropriate number of clusters since it appears at the bend in the elbow plot.<\/p>\n<p><em><strong>Want to be the next Data Scientist? Follow DataFlair&#8217;s guide designed by industry experts to <a href=\"https:\/\/data-flair.training\/blogs\/steps-to-become-a-data-scientist\/\">become a Data Scientist easily<\/a><\/strong><\/em><\/p>\n<h4>Average Silhouette Method<\/h4>\n<p>With the help of the average silhouette method, we can measure the quality of our clustering. It determines how well each data object lies within its cluster. A high average silhouette width indicates good clustering. The average silhouette method calculates the mean silhouette width of the observations for different values of k. 
The optimal number of clusters k is the one that maximizes the average silhouette over the range of candidate values for k.<\/p>\n<p>Using the silhouette function in the cluster package, we can compute the average silhouette width for the clusterings produced by the kmeans function. Here, the optimal number of clusters has the highest average silhouette width.<\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">library(cluster) \r\nlibrary(gridExtra)\r\nlibrary(grid)\r\n\r\n\r\nk2&lt;-kmeans(customer_data[,3:5],2,iter.max=100,nstart=50,algorithm=\"Lloyd\")\r\ns2&lt;-plot(silhouette(k2$cluster,dist(customer_data[,3:5],\"euclidean\")))<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-in-machine-learning.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-65774 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-in-machine-learning.png\" alt=\"Average Silhouette Method in k-means clustering\" width=\"745\" height=\"176\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-in-machine-learning.png 745w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-in-machine-learning-150x35.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-in-machine-learning-300x71.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-in-machine-learning-520x123.png 520w\" sizes=\"auto, (max-width: 745px) 100vw, 745px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-ML.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65777\" 
src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-ML.png\" alt=\"K-Means silhoutte graph in ML\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-ML.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-ML-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-ML-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-ML-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-ML-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-ML-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">k3&lt;-kmeans(customer_data[,3:5],3,iter.max=100,nstart=50,algorithm=\"Lloyd\")\r\ns3&lt;-plot(silhouette(k3$cluster,dist(customer_data[,3:5],\"euclidean\")))<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-in-R-input.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65778\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-in-R-input.png\" alt=\"K-Means silhoutte in R input\" width=\"723\" height=\"66\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-in-R-input.png 723w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-in-R-input-150x14.png 150w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-in-R-input-300x27.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-in-R-input-720x66.png 720w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-in-R-input-520x47.png 520w\" sizes=\"auto, (max-width: 723px) 100vw, 723px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-R.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65779\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-R.png\" alt=\"K-Means silhoutte graph in R\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-R.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-R-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-R-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-R-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-R-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-in-R-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" 
data-enlighter-language=\"null\">k4&lt;-kmeans(customer_data[,3:5],4,iter.max=100,nstart=50,algorithm=\"Lloyd\")\r\ns4&lt;-plot(silhouette(k4$cluster,dist(customer_data[,3:5],\"euclidean\")))<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65782\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-3.png\" alt=\"data science project - k-means\" width=\"768\" height=\"62\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-3.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-3-150x12.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-3-300x24.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-3-520x42.png 520w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65785\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-3.png\" alt=\"K-Means silhoutte graph\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-3.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-3-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-3-300x214.png 300w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-3-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-3-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-3-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">k5&lt;-kmeans(customer_data[,3:5],5,iter.max=100,nstart=50,algorithm=\"Lloyd\")\r\ns5&lt;-plot(silhouette(k5$cluster,dist(customer_data[,3:5],\"euclidean\")))<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65788\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-4.png\" alt=\"K-Means silhoutte method in R\" width=\"760\" height=\"62\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-4.png 760w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-4-150x12.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-4-300x24.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-4-520x42.png 520w\" sizes=\"auto, (max-width: 760px) 100vw, 760px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-clustering.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65789\" 
src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-clustering.png\" alt=\"K-Means silhoutte graph clustering\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-clustering.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-clustering-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-clustering-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-clustering-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-clustering-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-clustering-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">k6&lt;-kmeans(customer_data[,3:5],6,iter.max=100,nstart=50,algorithm=\"Lloyd\")\r\ns6&lt;-plot(silhouette(k6$cluster,dist(customer_data[,3:5],\"euclidean\")))<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65792\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-5.png\" alt=\"K-Means silhoutte input 5\" width=\"765\" height=\"66\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-5.png 765w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-5-150x13.png 150w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-5-300x26.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-5-520x45.png 520w\" sizes=\"auto, (max-width: 765px) 100vw, 765px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65793\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-5.png\" alt=\"K-Means silhoutte\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-5.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-5-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-5-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-5-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-5-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-5-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">k7&lt;-kmeans(customer_data[,3:5],7,iter.max=100,nstart=50,algorithm=\"Lloyd\")\r\ns7&lt;-plot(silhouette(k7$cluster,dist(customer_data[,3:5],\"euclidean\")))<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-6.png\"><img loading=\"lazy\" 
decoding=\"async\" class=\"aligncenter size-full wp-image-65795\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-6.png\" alt=\"R project\" width=\"727\" height=\"61\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-6.png 727w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-6-150x13.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-6-300x25.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-6-720x61.png 720w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-6-520x44.png 520w\" sizes=\"auto, (max-width: 727px) 100vw, 727px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-6.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65797\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-6.png\" alt=\"K-Means silhoutte graph\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-6.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-6-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-6-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-6-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-6-1024x731.png 1024w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-6-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">k8&lt;-kmeans(customer_data[,3:5],8,iter.max=100,nstart=50,algorithm=\"Lloyd\")\r\ns8&lt;-plot(silhouette(k8$cluster,dist(customer_data[,3:5],\"euclidean\")))<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-7.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65798\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-7.png\" alt=\"machine learning project\" width=\"735\" height=\"60\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-7.png 735w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-7-150x12.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-7-300x24.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-7-720x60.png 720w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-7-520x42.png 520w\" sizes=\"auto, (max-width: 735px) 100vw, 735px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-7.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65799\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-7.png\" alt=\"K-Means silhoutte in R\" width=\"1344\" height=\"960\" 
srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-7.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-7-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-7-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-7-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-7-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-7-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">k9&lt;-kmeans(customer_data[,3:5],9,iter.max=100,nstart=50,algorithm=\"Lloyd\")\r\ns9&lt;-plot(silhouette(k9$cluster,dist(customer_data[,3:5],\"euclidean\")))<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-8.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65804\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-8.png\" alt=\"customer segmentation project\" width=\"740\" height=\"66\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-8.png 740w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-8-150x13.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-8-300x27.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-8-520x46.png 520w\" 
sizes=\"auto, (max-width: 740px) 100vw, 740px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-8.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65805\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-8.png\" alt=\"K-Means silhoutte\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-8.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-8-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-8-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-8-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-8-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-8-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">k10&lt;-kmeans(customer_data[,3:5],10,iter.max=100,nstart=50,algorithm=\"Lloyd\")\r\ns10&lt;-plot(silhouette(k10$cluster,dist(customer_data[,3:5],\"euclidean\")))<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-9.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65806\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-9.png\" alt=\"customer segmentation project\" width=\"772\" height=\"62\" 
srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-9.png 772w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-9-150x12.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-9-300x24.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-9-768x62.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-input-9-520x42.png 520w\" sizes=\"auto, (max-width: 772px) 100vw, 772px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-9.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65807\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-9.png\" alt=\"K-Means silhoutte graph\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-9.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-9-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-9-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-9-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-9-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/K-Means-silhoutte-graph-9-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p>Now, we make use of the fviz_nbclust() function to determine and visualize the optimal 
number of clusters as follows &#8211;<\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">library(NbClust)\r\nlibrary(factoextra)\r\n\r\nfviz_nbclust(customer_data[,3:5], kmeans, method = \"silhouette\")<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-in-R-clustering.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65808\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-in-R-clustering.png\" alt=\"np function in R clustering\" width=\"861\" height=\"252\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-in-R-clustering.png 861w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-in-R-clustering-150x44.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-in-R-clustering-300x88.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-in-R-clustering-768x225.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-in-R-clustering-520x152.png 520w\" sizes=\"auto, (max-width: 861px) 100vw, 861px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-graph-in-data-science-clustering.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65809\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-graph-in-data-science-clustering.png\" alt=\"np function graph in data science clustering\" width=\"1344\" height=\"960\" 
srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-graph-in-data-science-clustering.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-graph-in-data-science-clustering-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-graph-in-data-science-clustering-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-graph-in-data-science-clustering-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-graph-in-data-science-clustering-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/np-function-graph-in-data-science-clustering-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<h4>Gap Statistic Method<\/h4>\n<p>In 2001, researchers at Stanford University &#8211; <strong>R. Tibshirani, G. Walther and T. Hastie<\/strong> &#8211; published the Gap Statistic Method. This method can be applied to any clustering method, such as K-means or hierarchical clustering. The gap statistic compares the total intracluster variation for different values of k with its expected value under a null reference distribution of the data. The reference datasets are generated with <strong>Monte Carlo simulations<\/strong>. 
For each variable in the dataset, we compute its range [min(xi), max(xi)] and draw values uniformly from the lower bound to the upper bound of that interval.<\/p>\n<p>To compute the gap statistic, we can use the clusGap() function from the cluster package, which returns the gap statistic along with its standard error for each number of clusters.<\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">set.seed(125)\r\nstat_gap &lt;- clusGap(customer_data[,3:5], FUN = kmeans, nstart = 25,\r\n            K.max = 10, B = 50)\r\nfviz_gap_stat(stat_gap)<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-code-in-R-project.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65814\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-code-in-R-project.png\" alt=\"data science customer segmentation project\" width=\"672\" height=\"128\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-code-in-R-project.png 672w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-code-in-R-project-150x29.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-code-in-R-project-300x57.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-code-in-R-project-520x99.png 520w\" sizes=\"auto, (max-width: 672px) 100vw, 672px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-graph-in-ml.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65815\" 
src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-graph-in-ml.png\" alt=\"R project\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-graph-in-ml.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-graph-in-ml-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-graph-in-ml-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-graph-in-ml-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-graph-in-ml-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-graph-in-ml-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><em><strong>Learn everything about Machine Learning for Free &#8211; Check <a href=\"https:\/\/data-flair.training\/blogs\/category\/machine-learning\/\">90+ Free Machine Learning Tutorials<\/a>\u00a0<\/strong><\/em><\/p>\n<p>Now, let us take k = 6 as our optimal number of clusters &#8211;<\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">k6&lt;-kmeans(customer_data[,3:5],6,iter.max=100,nstart=50,algorithm=\"Lloyd\")\r\nk6<\/pre>\n<p><strong>Output Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-input-k-6.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65820\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-input-k-6.png\" alt=\"data science project\" width=\"832\" height=\"442\" 
srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-input-k-6.png 832w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-input-k-6-150x80.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-input-k-6-300x159.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-input-k-6-768x408.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/fviz_gap_stat-function-input-k-6-520x276.png 520w\" sizes=\"auto, (max-width: 832px) 100vw, 832px\" \/><\/a><\/p>\n<p>The kmeans() call returns a list with several components. The most useful ones are &#8211;<\/p>\n<ul>\n<li><strong>cluster &#8211;<\/strong> A vector of integers giving the cluster to which each point is allocated.<\/li>\n<li><strong>totss &#8211;<\/strong> The total sum of squares.<\/li>\n<li><strong>centers &#8211;<\/strong> A matrix of cluster centers.<\/li>\n<li><strong>withinss &#8211;<\/strong> A vector of within-cluster sums of squares, with one component per cluster.<\/li>\n<li><strong>tot.withinss &#8211;<\/strong> The total within-cluster sum of squares, i.e. sum(withinss).<\/li>\n<li><strong>betweenss &#8211;<\/strong> The between-cluster sum of squares, i.e. totss - tot.withinss.<\/li>\n<li><strong>size &#8211;<\/strong> The number of points in each cluster.<\/li>\n<\/ul>\n<h2>Visualizing the Clustering Results using the First Two Principal Components<\/h2>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">pcclust=prcomp(customer_data[,3:5],scale=FALSE) #principal component analysis\r\nsummary(pcclust)\r\n\r\npcclust$rotation[,1:2]<\/pre>\n<p><strong>Output 
Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/clustering-in-R.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65821\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/clustering-in-R.png\" alt=\"clustering in R\" width=\"807\" height=\"393\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/clustering-in-R.png 807w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/clustering-in-R-150x73.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/clustering-in-R-300x146.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/clustering-in-R-768x374.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/clustering-in-R-520x253.png 520w\" sizes=\"auto, (max-width: 807px) 100vw, 807px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">set.seed(1)\r\nggplot(customer_data, aes(x =Annual.Income..k.., y = Spending.Score..1.100.)) + \r\n  geom_point(stat = \"identity\", aes(color = as.factor(k6$cluster))) +\r\n  scale_color_discrete(name=\" \",\r\n              breaks=c(\"1\", \"2\", \"3\", \"4\", \"5\",\"6\"),\r\n              labels=c(\"Cluster 1\", \"Cluster 2\", \"Cluster 3\", \"Cluster 4\", \"Cluster 5\",\"Cluster 6\")) +\r\n  ggtitle(\"Segments of Mall Customers\", subtitle = \"Using K-means Clustering\")<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-R-.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65822\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-R-.png\" alt=\"PCA Cluster in R\" width=\"841\" 
height=\"222\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-R-.png 841w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-R--150x40.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-R--300x79.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-R--768x203.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-R--520x137.png 520w\" sizes=\"auto, (max-width: 841px) 100vw, 841px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65823\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML.png\" alt=\"PCA Cluster Graph in ML\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p>From the above visualization, we observe that there is a distribution of 6 clusters as follows 
&#8211;<\/p>\n<p><strong>Clusters 6 and 4 &#8211;<\/strong> These clusters represent customers with a medium annual income and a medium spending score.<\/p>\n<p><strong>Cluster 1 &#8211;<\/strong> This cluster represents customers with a high annual income and a high spending score.<\/p>\n<p><strong>Cluster 3 &#8211;<\/strong> This cluster represents customers with a low annual income and a low spending score.<\/p>\n<p><strong>Cluster 2 &#8211;<\/strong> This cluster represents customers with a high annual income and a low spending score.<\/p>\n<p><strong>Cluster 5 &#8211;<\/strong> This cluster represents customers with a low annual income but a high spending score.<\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">ggplot(customer_data, aes(x =Spending.Score..1.100., y =Age)) + \r\n  geom_point(stat = \"identity\", aes(color = as.factor(k6$cluster))) +\r\n  scale_color_discrete(name=\" \",\r\n                      breaks=c(\"1\", \"2\", \"3\", \"4\", \"5\",\"6\"),\r\n                      labels=c(\"Cluster 1\", \"Cluster 2\", \"Cluster 3\", \"Cluster 4\", \"Cluster 5\",\"Cluster 6\")) +\r\n  ggtitle(\"Segments of Mall Customers\", subtitle = \"Using K-means Clustering\")<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-ML-Input.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65824\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-ML-Input.png\" alt=\"clustering in machine learning\" width=\"834\" height=\"172\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-ML-Input.png 834w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-ML-Input-150x31.png 150w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-ML-Input-300x62.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-ML-Input-768x158.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-in-ML-Input-520x107.png 520w\" sizes=\"auto, (max-width: 834px) 100vw, 834px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65825\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-1.png\" alt=\"customer segmentation in R\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-1.png 1344w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-1-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-1-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-1-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-1-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-ML-1-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">kCols=function(vec){cols=rainbow (length (unique (vec)))\r\nreturn (cols[as.numeric(as.factor(vec))])}\r\n\r\ndigCluster&lt;-k6$cluster; dignm&lt;-as.character(digCluster); # K-means clusters\r\n\r\nplot(pcclust$x[,1:2], col 
=kCols(digCluster),pch =19,xlab =\"K-means\",ylab=\"classes\")\r\nlegend(\"bottomleft\",unique(dignm),fill=unique(kCols(digCluster)))<\/pre>\n<p><strong>Screenshot:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Input-in-data-science.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65826\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Input-in-data-science.png\" alt=\"PCA Cluster Input in data science\" width=\"831\" height=\"171\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Input-in-data-science.png 831w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Input-in-data-science-150x31.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Input-in-data-science-300x62.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Input-in-data-science-768x158.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Input-in-data-science-520x107.png 520w\" sizes=\"auto, (max-width: 831px) 100vw, 831px\" \/><\/a><\/p>\n<p><strong>Output:<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-data-science.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-65827\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-data-science.png\" alt=\"customer segmentation data science project\" width=\"1344\" height=\"960\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-data-science.png 1344w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-data-science-150x107.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-data-science-300x214.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-data-science-768x549.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-data-science-1024x731.png 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/PCA-Cluster-Graph-in-data-science-520x371.png 520w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><\/p>\n<p><strong>Clusters 4 and 1 &#8211;<\/strong> These two clusters consist of customers with medium PCA1 and medium PCA2 scores.<\/p>\n<p><strong>Cluster 6 &#8211;<\/strong> This cluster represents customers with a high PCA2 score and a low PCA1 score.<\/p>\n<p><strong>Cluster 5 &#8211;<\/strong> This cluster contains customers with a medium PCA1 score and a low PCA2 score.<\/p>\n<p><strong>Cluster 3 &#8211;<\/strong> This cluster comprises customers with a high PCA1 score and a high PCA2 score.<\/p>\n<p><strong>Cluster 2 &#8211;<\/strong> This cluster comprises customers with a high PCA2 score and a medium annual spend.<\/p>\n<p>Clustering helps us understand the variables much better and supports more careful decisions. Once customer segments are identified, companies can release products and services targeted at customers based on parameters such as income, age and spending patterns. More complex signals such as product reviews can also be taken into consideration for better segmentation.<\/p>\n<h2>Summary<\/h2>\n<p>In this data science project, we built a customer segmentation model using a class of machine learning known as unsupervised learning. 
Specifically, we used the K-means clustering algorithm. We analyzed and visualized the data and then proceeded to implement our algorithm. We hope you enjoyed this customer segmentation project of machine learning using R.<\/p>\n<p><em><strong>Are there any other data science projects that you have worked on? Do share your experience with us in the comments. Here is DataFlair&#8217;s next project for data science enthusiasts &#8211; <a href=\"https:\/\/data-flair.training\/blogs\/r-data-science-project-uber-data-analysis\/\">Uber Data Analysis Project<\/a>.\u00a0<\/strong><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Cluster In this Data Science R Project series, we will perform one of the most essential applications of machine learning &#8211; Customer Segmentation. In this project, we will implement customer segmentation in R. Whenever&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":66049,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[51],"tags":[20752,20584,20697,20541],"class_list":["post-65722","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-r","tag-customer-segmentation-project","tag-data-science-project","tag-machine-learning-project","tag-r-project"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Science Project - Customer Segmentation using Machine Learning in R - DataFlair<\/title>\n<meta name=\"description\" content=\"This machine learning project of customer segmentation in R will help find your potential customers &amp; learn important data science concepts\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Science Project - Customer Segmentation using Machine Learning in R - DataFlair\" \/>\n<meta property=\"og:description\" content=\"This machine learning project of customer segmentation in R will help find your potential customers &amp; learn important data science concepts\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2019-07-31T04:21:49+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-08-02T10:11:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation.png\" \/>\n\t<meta property=\"og:image:width\" content=\"801\" \/>\n\t<meta property=\"og:image:height\" content=\"419\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"22 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Data Science Project - Customer Segmentation using Machine Learning in R - DataFlair","description":"This machine learning project of customer segmentation in R will help find your potential customers & learn important data science concepts","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/","og_locale":"en_US","og_type":"article","og_title":"Data Science Project - Customer Segmentation using Machine Learning in R - DataFlair","og_description":"This machine learning project of customer segmentation in R will help find your potential customers & learn important data science concepts","og_url":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2019-07-31T04:21:49+00:00","article_modified_time":"2024-08-02T10:11:29+00:00","og_image":[{"width":801,"height":419,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation.png","type":"image\/png"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"22 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"Data Science Project &#8211; Customer Segmentation using Machine Learning in R","datePublished":"2019-07-31T04:21:49+00:00","dateModified":"2024-08-02T10:11:29+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/"},"wordCount":2318,"commentCount":6,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation.png","keywords":["customer segmentation project","data science project","machine learning project","R project"],"articleSection":["R Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/","url":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/","name":"Data Science Project - Customer Segmentation using Machine Learning in R - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation.png","datePublished":"2019-07-31T04:21:49+00:00","dateModified":"2024-08-02T10:11:29+00:00","description":"This machine learning project of customer segmentation in R will help find your potential customers & learn important data science concepts","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/R-project-customer-segmentation.png","width":801,"height":419,"caption":"Data science project - customer segmentation"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/r-data-science-project-customer-segmentation\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"R Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/r\/"},{"@type":"ListItem","position":3,"name":"Data Science Project &#8211; Customer Segmentation using Machine Learning in 
R"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The DataFlair 
Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/65722","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=65722"}],"version-history":[{"count":12,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/65722\/revisions"}],"predecessor-version":[{"id":143141,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/65722\/revisions\/143141"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/66049"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=65722"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=65722"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=65722"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}