{"id":5790,"date":"2018-01-16T14:03:11","date_gmt":"2018-01-16T14:03:11","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=5790"},"modified":"2018-09-18T10:59:05","modified_gmt":"2018-09-18T05:29:05","slug":"spark-machine-learning-algorithm","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/","title":{"rendered":"Apache Spark Machine Learning Algorithm &#8211; Example &amp; Clustering"},"content":{"rendered":"<h2>1. Objective &#8211; Spark Machine Learning<\/h2>\n<p>Today, in this Spark tutorial, we will explore machine learning in Apache Spark. We will go through the algorithms provided by MLlib, Spark\u2019s machine learning library, starting with its basic statistics tools. Along with this, we will cover classification, regression, and collaborative filtering, and at last we will discuss clustering.<\/p>\n<p><span style=\"font-weight: 400\">MLlib is Spark\u2019s scalable machine learning library. It consists of common machine learning algorithms, such as basic statistics, classification, regression, clustering, and collaborative filtering.<\/span><\/p>\n<p>So, let&#8217;s start the Spark Machine Learning tutorial.<\/p>\n<div id=\"attachment_5797\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5797\" class=\"wp-image-5797 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01.jpg\" alt=\"Introduction of Machine Learning algorithms in Apache Spark\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-5797\" class=\"wp-caption-text\">Introduction of Machine Learning algorithms in Apache Spark<\/p><\/div>\n<h2>2. 
Machine Learning Algorithm (MLlib)<\/h2>\n<p><span style=\"font-weight: 400\">MLlib is the <a href=\"https:\/\/data-flair.training\/blogs\/machine-learning-tutorial\/\">machine learning<\/a> (ML) library of <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-for-beginners\/\"><strong>Apache Spark<\/strong><\/a>.<\/span> Its goal is to make practical machine learning scalable and easy. It provides the following families of ML algorithms:<\/p>\n<ol>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Basic statistics<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Classification and Regression<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Clustering<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Collaborative filtering<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400\">Let\u2019s discuss each of them one by one.<\/span><\/p>\n<h2>3. Spark Machine Learning Algorithm &#8211; Basic Statistics<\/h2>\n<p><span style=\"font-weight: 400\">For basic statistics, MLlib provides the following tools:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Summary statistics<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Correlations<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Stratified sampling<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Hypothesis testing<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Random data generation<\/span><\/li>\n<\/ul>\n<h3>a. 
Spark Machine Learning Algorithm &#8211;\u00a0Summary Statistics<\/h3>\n<p><span style=\"font-weight: 400\">For an <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-rdd-tutorial\/\">RDD<\/a>[Vector], MLlib provides column summary statistics through the function colStats, available in Statistics.<\/span><br \/>\n<span style=\"font-weight: 400\">The function colStats() returns an instance of MultivariateStatisticalSummary. This contains the column-wise max, min, mean, variance, and number of nonzeros, as well as the total count.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import org.apache.spark.mllib.linalg.Vector\r\nimport org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}\r\nimport org.apache.spark.rdd.RDD\r\nval observations: RDD[Vector] = ... \/\/ an RDD of Vectors\r\n\/\/ Compute column summary statistics.\r\nval summary: MultivariateStatisticalSummary = Statistics.colStats(observations)\r\nprintln(summary.mean) \/\/ a dense vector containing the mean value for each column\r\nprintln(summary.variance) \/\/ column-wise variance\r\nprintln(summary.numNonzeros) \/\/ number of nonzeros in each column\r\nprintln(summary.count) \/\/ total number of rows<\/pre>\n<h3>b. Spark Machine Learning\u00a0Algorithm &#8211;\u00a0Correlations<\/h3>\n<p><span style=\"font-weight: 400\">In statistics, calculating the correlation between two series of data is a common operation. MLlib can also calculate pairwise correlations among many series. It currently supports two correlation methods: Pearson\u2019s and Spearman\u2019s correlation.<\/span><br \/>\n<span style=\"font-weight: 400\">Statistics provides methods to calculate correlations between series. The output depends on the type of input. For two <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-rdd-features\/\">RDD<\/a>[Double]s, the output is a Double. For an RDD[Vector], the output is the correlation Matrix.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import org.apache.spark.SparkContext\r\nimport org.apache.spark.mllib.linalg._\r\nimport org.apache.spark.mllib.stat.Statistics\r\nimport org.apache.spark.rdd.RDD\r\nval sc: SparkContext = ...\r\nval seriesX: RDD[Double] = ... \/\/ a series\r\nval seriesY: RDD[Double] = ... \/\/ must have the same number of partitions and cardinality as seriesX\r\n\/\/ Compute the correlation using Pearson's method. Enter \"spearman\" for Spearman's method.\r\n\/\/ If a method is not specified, Pearson's method is used by default.\r\nval correlation: Double = Statistics.corr(seriesX, seriesY, \"pearson\")\r\nval data: RDD[Vector] = ... \/\/ note that each Vector is a row and not a column\r\n\/\/ Calculate the correlation matrix using Pearson's method. Use \"spearman\" for Spearman's method.\r\n\/\/ If a method is not specified, Pearson's method is used by default.\r\nval correlMatrix: Matrix = Statistics.corr(data, \"pearson\")<\/pre>\n<h3>c. Spark Machine Learning\u00a0Algorithm &#8211;\u00a0Stratified Sampling<\/h3>\n<p><span style=\"font-weight: 400\">The stratified sampling methods, sampleByKey and sampleByKeyExact, can be performed on <a href=\"https:\/\/data-flair.training\/blogs\/create-rdds-in-apache-spark\/\">RDDs<\/a> of key-value pairs. For stratified sampling, the keys can be thought of as a label and the value as a specific attribute.<\/span><br \/>\n<span style=\"font-weight: 400\">For example, the key can be man or woman, or document ids. The respective values can be the list of ages of the people in the population or the list of words in the documents. The sampleByKey method flips a coin to decide whether an observation will be sampled or not. Therefore, it requires only one pass over the data and provides an expected sample size. The sampleByKeyExact method requires more resources, but returns the exact sample size for each stratum with 99.99% confidence.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import org.apache.spark.SparkContext\r\nimport org.apache.spark.SparkContext._\r\nimport org.apache.spark.rdd.PairRDDFunctions\r\nval sc: SparkContext = ...\r\nval data = ... \/\/ an RDD[(K, V)] of any key value pairs\r\nval fractions: Map[K, Double] = ... \/\/ specify the exact fraction desired from each key\r\n\/\/ Get an approximate sample from each stratum (one pass over the data)\r\nval approxSample = data.sampleByKey(withReplacement = false, fractions)\r\n\/\/ Get an exact sample from each stratum\r\nval exactSample = data.sampleByKeyExact(withReplacement = false, fractions)<\/pre>\n<h3>d. Spark Machine Learning\u00a0Algorithm &#8211;\u00a0Hypothesis Testing<\/h3>\n<p><span style=\"font-weight: 400\">Hypothesis testing is a powerful tool in statistics for determining whether a result is statistically significant, that is, whether or not it occurred by chance. MLlib currently supports Pearson\u2019s chi-squared (\u03c7\u00b2) tests for goodness of fit and independence. The input data type determines which test is conducted. The goodness of fit test requires an input of type Vector, whereas the independence test requires a Matrix as input.<\/span><br \/>\n<span style=\"font-weight: 400\">MLlib also supports the input type RDD[LabeledPoint]. This enables feature selection via chi-squared independence tests.<\/span><br \/>\n<span style=\"font-weight: 400\">Statistics provides methods to run Pearson\u2019s chi-squared tests. The example below demonstrates how to run and interpret hypothesis tests.<\/span><br \/>\n<span style=\"font-weight: 400\">For Example<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import org.apache.spark.SparkContext\r\nimport org.apache.spark.mllib.linalg._\r\nimport org.apache.spark.mllib.regression.LabeledPoint\r\nimport org.apache.spark.mllib.stat.Statistics\r\nimport org.apache.spark.mllib.stat.test.ChiSqTestResult\r\nimport org.apache.spark.rdd.RDD\r\nval sc: SparkContext = ...\r\nval vec: Vector = ... \/\/ a vector composed of the frequencies of events\r\n\/\/ Compute the goodness of fit. If a second vector to test against is not supplied as a parameter,\r\n\/\/ the test runs against a uniform distribution.\r\nval goodnessOfFitTestResult = Statistics.chiSqTest(vec)\r\n\/\/ Summary of the test including the p-value, degrees of freedom,\r\n\/\/ test statistic, the method used, and the null hypothesis.\r\nprintln(goodnessOfFitTestResult)\r\nval mat: Matrix = ... \/\/ a contingency matrix\r\n\/\/ Conduct Pearson's independence test on the input contingency matrix\r\nval independenceTestResult = Statistics.chiSqTest(mat)\r\nprintln(independenceTestResult) \/\/ summary of the test including the p-value, degrees of freedom...\r\nval obs: RDD[LabeledPoint] = ... \/\/ (feature, label) pairs.\r\n\/\/ The contingency table is constructed from the raw (feature, label) pairs and used to conduct\r\n\/\/ the independence test. Returns an array containing the ChiSqTestResult for every feature\r\n\/\/ against the label.\r\nval featureTestResults: Array[ChiSqTestResult] = Statistics.chiSqTest(obs)\r\n\/\/ Print a summary of the test for each feature column (numbered from 1)\r\nfeatureTestResults.zipWithIndex.foreach { case (result, idx) =&gt;\r\n  println(s\"Column ${idx + 1}:\\n$result\")\r\n}<\/pre>\n<h3>e. 
Spark Machine Learning\u00a0Algorithm &#8211;\u00a0Random Data Generation<\/h3>\n<p><span style=\"font-weight: 400\">Random data generation is useful for randomized algorithms, prototyping, and performance testing. MLlib supports generating random RDDs with i.i.d. values drawn from a given distribution: uniform, standard normal, or Poisson.<\/span><br \/>\n<span style=\"font-weight: 400\">RandomRDDs provides factory methods to generate random double RDDs or vector RDDs. The following example generates a random double RDD whose values follow the standard normal distribution N(0, 1), and then maps it to N(1, 4).<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import org.apache.spark.SparkContext\r\nimport org.apache.spark.mllib.random.RandomRDDs._\r\nval sc: SparkContext = ...\r\n\/\/ Generate a random double RDD that contains 1 million i.i.d. values drawn from the\r\n\/\/ standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.\r\nval u = normalRDD(sc, 1000000L, 10)\r\n\/\/ Apply a transform to get a random double RDD following `N(1, 4)`: multiplying by 2.0\r\n\/\/ scales the variance by 4, and adding 1.0 shifts the mean to 1.\r\nval v = u.map(x =&gt; 1.0 + 2.0 * x)<\/pre>\n<h2>4. Spark Machine Learning Algorithm &#8211;\u00a0Classification and Regression<\/h2>\n<h3>a. Classification\u00a0in Spark Machine Learning algorithm<\/h3>\n<h4>i. Logistic regression<\/h4>\n<p><span style=\"font-weight: 400\">Logistic regression is a popular method to predict a categorical response. It is a special case of generalized linear models that predicts the probability of the outcomes. In spark.ml, logistic regression can predict a binary outcome by using binomial logistic regression. It can also predict a multiclass outcome by using multinomial logistic regression.<\/span><\/p>\n<h4>ii. 
Decision tree classifier<\/h4>\n<p><span style=\"font-weight: 400\">Decision trees are a popular family of classification and regression methods.<\/span><br \/>\n<span style=\"font-weight: 400\">For Example<\/span><br \/>\n<span style=\"font-weight: 400\">The example below loads a dataset in LibSVM format and splits it into training and test sets. It trains on the first dataset and then evaluates on the held-out test set. We use two feature transformers to prepare the data. They index categories for the label and for the categorical features, and they add metadata to the DataFrame that the decision tree algorithm can recognize.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import org.apache.spark.ml.Pipeline\r\nimport org.apache.spark.ml.classification.DecisionTreeClassificationModel\r\nimport org.apache.spark.ml.classification.DecisionTreeClassifier\r\nimport org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator\r\nimport org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorIndexer}\r\n\/\/ Assumes `spark` is an active SparkSession (predefined in spark-shell).\r\n\/\/ Load the data stored in LIBSVM format as a DataFrame.\r\nval data = spark.read.format(\"libsvm\").load(\"data\/mllib\/sample_libsvm_data.txt\")\r\n\/\/ Index labels, adding metadata to the label column.\r\n\/\/ Fit on whole dataset to include all labels in index.\r\nval labelIndexer = new StringIndexer()\r\n  .setInputCol(\"label\")\r\n  .setOutputCol(\"indexedLabel\")\r\n  .fit(data)\r\n\/\/ Automatically identify categorical features, and index them.\r\nval featureIndexer = new VectorIndexer()\r\n  .setInputCol(\"features\")\r\n  .setOutputCol(\"indexedFeatures\")\r\n  .setMaxCategories(4) \/\/ features with &gt; 4 distinct values are treated as continuous.\r\n  .fit(data)\r\n\/\/ Split the data into training and test sets (30% held out for testing).\r\nval Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))\r\n\/\/ Train a DecisionTree model.\r\nval dt = new DecisionTreeClassifier()\r\n  .setLabelCol(\"indexedLabel\")\r\n  .setFeaturesCol(\"indexedFeatures\")\r\n\/\/ Convert indexed labels back to original labels.\r\nval labelConverter = new IndexToString()\r\n  .setInputCol(\"prediction\")\r\n  .setOutputCol(\"predictedLabel\")\r\n  .setLabels(labelIndexer.labels)\r\n\/\/ Chain indexers and tree in a Pipeline.\r\nval pipeline = new Pipeline()\r\n  .setStages(Array(labelIndexer, featureIndexer, dt, labelConverter))\r\n\/\/ Train model. This also runs the indexers.\r\nval model = pipeline.fit(trainingData)\r\n\/\/ Make predictions.\r\nval predictions = model.transform(testData)\r\n\/\/ Select example rows to display.\r\npredictions.select(\"predictedLabel\", \"label\", \"features\").show(5)\r\n\/\/ Select (prediction, true label) and compute test error.\r\nval evaluator = new MulticlassClassificationEvaluator()\r\n  .setLabelCol(\"indexedLabel\")\r\n  .setPredictionCol(\"prediction\")\r\n  .setMetricName(\"accuracy\")\r\nval accuracy = evaluator.evaluate(predictions)\r\nprintln(\"Test Error = \" + (1.0 - accuracy))\r\n\/\/ Stage 2 of the pipeline is the trained decision tree.\r\nval treeModel = model.stages(2).asInstanceOf[DecisionTreeClassificationModel]\r\nprintln(\"Learned classification tree model:\\n\" + treeModel.toDebugString)<\/pre>\n<h3>b. Regression\u00a0in Spark Machine Learning algorithm<\/h3>\n<h4>i. 
Linear regression<\/h4>\n<p><span style=\"font-weight: 400\">The interface for working with linear regression models and model summaries is similar to the logistic regression case.<\/span><br \/>\n<span style=\"font-weight: 400\">For Example<\/span><br \/>\n<span style=\"font-weight: 400\">The example below trains an elastic net regularized linear regression model and extracts model summary statistics.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import org.apache.spark.ml.regression.LinearRegression\r\n\/\/ Assumes `spark` is an active SparkSession (predefined in spark-shell).\r\n\/\/ Load training data\r\nval training = spark.read.format(\"libsvm\")\r\n  .load(\"data\/mllib\/sample_linear_regression_data.txt\")\r\n\/\/ regParam sets the overall regularization strength; elasticNetParam = 0.8 mixes\r\n\/\/ 80% L1 (lasso) and 20% L2 (ridge) penalty.\r\nval lr = new LinearRegression()\r\n  .setMaxIter(10)\r\n  .setRegParam(0.3)\r\n  .setElasticNetParam(0.8)\r\n\/\/ Fit the model\r\nval lrModel = lr.fit(training)\r\n\/\/ Print the coefficients and intercept for linear regression\r\nprintln(s\"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}\")\r\n\/\/ Summarize the model over the training set and print out some metrics\r\nval trainingSummary = lrModel.summary\r\nprintln(s\"numIterations: ${trainingSummary.totalIterations}\")\r\nprintln(s\"objectiveHistory: [${trainingSummary.objectiveHistory.mkString(\",\")}]\")\r\ntrainingSummary.residuals.show()\r\nprintln(s\"RMSE: ${trainingSummary.rootMeanSquaredError}\")\r\nprintln(s\"r2: ${trainingSummary.r2}\")<\/pre>\n<h2>5. Collaborative filtering\u00a0in Spark Machine Learning algorithm<\/h2>\n<p><span style=\"font-weight: 400\">Collaborative filtering is commonly used for recommender systems. These techniques aim to fill in the missing entries of a user-item association matrix. MLlib currently supports model-based collaborative filtering. 
<\/span><br \/>\n<span style=\"font-weight: 400\">In this approach, users and products are described by a small set of latent factors that can be used to predict the missing entries. MLlib uses the alternating least squares (ALS) algorithm to learn these latent factors. The implementation in MLlib has the following parameters:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">numBlocks is the number of blocks used to parallelize computation (set to -1 to auto-configure).<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">rank is the number of latent factors in the model.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">iterations is the number of iterations of ALS to run.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">lambda specifies the regularization parameter in ALS.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">alpha is a parameter applicable to the implicit feedback variant of ALS that governs the baseline confidence in preference observations.<\/span><\/li>\n<\/ol>\n<h3>a. Explicit vs. Implicit Feedback<\/h3>\n<p><span style=\"font-weight: 400\">The standard approach to matrix factorization-based collaborative filtering treats the entries in the user-item matrix as explicit preferences given by the user to the item, for example, movie ratings.<\/span><br \/>\n<span style=\"font-weight: 400\">However, in many real-world use cases it is common to only have access to implicit feedback, such as views, clicks, purchases, likes, and shares. To deal with such data, MLlib takes its approach from the paper \u201cCollaborative Filtering for Implicit Feedback Datasets\u201d. Instead of modeling the ratings directly, it treats the data as numbers representing the strength of observed user actions. These numbers are related to the level of confidence in the observed user preferences.<\/span><\/p>\n<h3>b. Scaling of the regularization parameter<\/h3>\n<p><span style=\"font-weight: 400\">When solving each least squares problem, MLlib scales the regularization parameter lambda by the number of ratings the user generated (when updating user factors) or by the number of ratings the product received (when updating product factors). This approach is named \u201cALS-WR\u201d. It makes lambda less dependent on the scale of the dataset. Hence, the best parameter learned from a sampled subset can be applied to the full dataset with similar performance expected.<\/span><\/p>\n<h3>c. Example for\u00a0Collaborative Filtering\u00a0in\u00a0Machine Learning Algorithm in Spark<\/h3>\n<p><span style=\"font-weight: 400\">For Example<\/span><br \/>\n<span style=\"font-weight: 400\">In the example below, we load rating data in which each row consists of a user, a product, and a rating. We use the default ALS.train() method, which assumes the ratings are explicit. 
We evaluate the recommendation model by measuring the Mean Squared Error (MSE) of rating prediction.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import org.apache.spark.mllib.recommendation.ALS\r\nimport org.apache.spark.mllib.recommendation.Rating\r\n\/\/ Assumes `sc` is an active SparkContext (predefined in spark-shell).\r\n\/\/ Load and parse the data: each line is \"user,item,rating\"\r\nval data = sc.textFile(\"data\/mllib\/als\/test.data\")\r\nval ratings = data.map(_.split(',') match { case Array(user, item, rate) =&gt;\r\n  Rating(user.toInt, item.toInt, rate.toDouble)\r\n})\r\n\/\/ Build the recommendation model using ALS (lambda = 0.01)\r\nval rank = 10\r\nval numIterations = 20\r\nval model = ALS.train(ratings, rank, numIterations, 0.01)\r\n\/\/ Evaluate the model on rating data\r\nval usersProducts = ratings.map { case Rating(user, product, rate) =&gt;\r\n  (user, product)\r\n}\r\nval predictions =\r\n  model.predict(usersProducts).map { case Rating(user, product, rate) =&gt;\r\n    ((user, product), rate)\r\n  }\r\n\/\/ Join the true ratings with the predicted ones on (user, product)\r\nval ratesAndPreds = ratings.map { case Rating(user, product, rate) =&gt;\r\n  ((user, product), rate)\r\n}.join(predictions)\r\nval MSE = ratesAndPreds.map { case ((user, product), (r1, r2)) =&gt;\r\n  val err = (r1 - r2)\r\n  err * err\r\n}.mean()\r\nprintln(\"Mean Squared Error = \" + MSE)<\/pre>\n<p><span style=\"font-weight: 400\">In addition, we can use the trainImplicit method to get better results if the rating matrix is derived from another source of information (that is, inferred from other signals).<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">val lambda = 0.01 \/\/ regularization parameter\r\nval alpha = 0.01 \/\/ baseline confidence in preference observations\r\nval implicitModel = ALS.trainImplicit(ratings, rank, numIterations, lambda, alpha)<\/pre>\n<h2>6. Clustering\u00a0in\u00a0Machine Learning algorithm in Spark<\/h2>\n<p><span style=\"font-weight: 400\">Clustering is an unsupervised learning problem in which we aim to group subsets of entities with one another based on some notion of similarity. Clustering is often used for exploratory analysis, or as a component of a hierarchical supervised learning pipeline. 
In such a pipeline, distinct classifiers or regression models are trained for each cluster.<\/span><br \/>\n<span style=\"font-weight: 400\">MLlib supports k-means, one of the most commonly used clustering algorithms. It clusters the data points into a predefined number of clusters. The MLlib implementation includes a parallelized variant of the k-means++ method called k-means||. The implementation in MLlib has the following parameters:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">k is the number of desired clusters.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">maxIterations is the maximum number of iterations to run.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">initializationMode specifies either random initialization or initialization via k-means||.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">runs is the number of times to run the k-means algorithm. Since k-means is not guaranteed to find a globally optimal solution, the best clustering result is returned. This parameter has no effect since Spark 2.0.0.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">initializationSteps determines the number of steps in the k-means|| algorithm.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">epsilon determines the distance threshold within which we consider k-means to have converged.<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400\">The following code snippets can be executed in spark-shell.<\/span><\/p>\n<h3>a. Example for Clustering\u00a0in\u00a0Machine Learning Algorithm in Spark<\/h3>\n<p><span style=\"font-weight: 400\">For Example<\/span><br \/>\nAfter loading and parsing the data, the example uses the KMeans object to cluster the data into two clusters. The number of desired clusters is passed to the algorithm. 
Afterwards, we compute the Within Set Sum of Squared Errors (WSSSE). Increasing k reduces this error measure, but a larger k does not automatically give a better clustering. The optimal k is usually the one at the \u201celbow\u201d of the WSSSE curve.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import org.apache.spark.mllib.clustering.KMeans\r\nimport org.apache.spark.mllib.linalg.Vectors\r\n\/\/ Assumes `sc` is an active SparkContext (predefined in spark-shell).\r\n\/\/ Load and parse the data: each line holds space-separated feature values\r\nval data = sc.textFile(\"data\/mllib\/kmeans_data.txt\")\r\nval parsedData = data.map(s =&gt; Vectors.dense(s.split(' ').map(_.toDouble))).cache()\r\n\/\/ Cluster the data into two classes using KMeans\r\nval numClusters = 2\r\nval numIterations = 20\r\nval clusters = KMeans.train(parsedData, numClusters, numIterations)\r\n\/\/ Evaluate clustering by computing Within Set Sum of Squared Errors\r\nval WSSSE = clusters.computeCost(parsedData)\r\nprintln(\"Within Set Sum of Squared Errors = \" + WSSSE)<\/pre>\n<p>So, this was all about the machine learning algorithms in Apache Spark. We hope you liked our explanation.<\/p>\n<h2>7. Conclusion<\/h2>\n<p>Hence, in this Spark Machine Learning tutorial, we have seen the main families of <a href=\"https:\/\/data-flair.training\/blogs\/machine-learning-applications\/\">Machine Learning<\/a>\u00a0algorithms in Spark\u2019s MLlib: basic statistics, classification, regression, collaborative filtering, and clustering, along with several examples. Still, if you\u00a0have any query, feel free to ask in the comment section. 
We will definitely get back to you.<br \/>\nBest books to <a href=\"https:\/\/data-flair.training\/blogs\/best-apache-spark-scala-books\/\">learn\u00a0Spark<\/a><br \/>\n<a href=\"https:\/\/spark.apache.org\/\">For reference<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. 
Objective &#8211; Spark Machine Learning Today, in this Spark Tutorial, we will see the concept of Spark Machine Learning. Moreover, we will discuss each and every detail in the algorithms of Apache Spark&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":6452,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[16602,2610,8432,8433,8748,16603,13087],"class_list":["post-5790","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-spark","tag-apache-spark-machine-learning","tag-clustering-in-machine-learning-algorithm","tag-machine-learning-algorithm","tag-machine-learning-algorithm-mllib","tag-mllib","tag-spark-machine-learning","tag-spark-mllib"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Apache Spark Machine Learning Algorithm - Example &amp; Clustering - DataFlair<\/title>\n<meta name=\"description\" content=\"Spark Machine Learning algorithm,Statistics,Classification &amp; Regression in Machine Learning,Collaborative filtering &amp; Clustering in Spark ML algorithm,MLlib\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Spark Machine Learning Algorithm - Example &amp; Clustering - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Spark Machine Learning algorithm,Statistics,Classification &amp; Regression in Machine Learning,Collaborative filtering &amp; Clustering in Spark ML algorithm,MLlib\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-16T14:03:11+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-09-18T05:29:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Apache Spark Machine Learning Algorithm - Example &amp; Clustering - DataFlair","description":"Spark Machine Learning algorithm,Statistics,Classification & Regression in Machine Learning,Collaborative filtering & Clustering in Spark ML algorithm,MLlib","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/","og_locale":"en_US","og_type":"article","og_title":"Apache Spark Machine Learning Algorithm - Example &amp; Clustering - DataFlair","og_description":"Spark Machine Learning algorithm,Statistics,Classification & Regression in Machine Learning,Collaborative filtering & Clustering in Spark ML algorithm,MLlib","og_url":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-01-16T14:03:11+00:00","article_modified_time":"2018-09-18T05:29:05+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01-1.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"Apache Spark Machine Learning Algorithm &#8211; Example &amp; Clustering","datePublished":"2018-01-16T14:03:11+00:00","dateModified":"2018-09-18T05:29:05+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/"},"wordCount":1652,"commentCount":1,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01-1.jpg","keywords":["Apache Spark Machine Learning","Clustering in Machine Learning algorithm","Machine Learning algorithm","Machine Learning algorithm (MLlib)","mllib","Spark Machine Learning","Spark MLlib"],"articleSection":["Apache Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/","url":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/","name":"Apache Spark Machine Learning Algorithm - Example &amp; Clustering - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01-1.jpg","datePublished":"2018-01-16T14:03:11+00:00","dateModified":"2018-09-18T05:29:05+00:00","description":"Spark Machine Learning algorithm,Statistics,Classification & Regression in Machine Learning,Collaborative filtering & Clustering in Spark ML algorithm,MLlib","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01-1.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Machine-Learning-Algorithms-in-Spark-01-1.jpg","width":1200,"height":628,"caption":"Spark Machine Learning algorithm"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/spark-machine-learning-algorithm\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Apache Spark Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/spark\/"},{"@type":"ListItem","position":3,"name":"Apache Spark Machine Learning Algorithm &#8211; Example &amp; 
Clustering"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The 
DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/5790","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=5790"}],"version-history":[{"count":5,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/5790\/revisions"}],"predecessor-version":[{"id":34501,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/5790\/revisions\/34501"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/6452"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=5790"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=5790"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=5790"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}