# Spark Machine Learning with R: An Introductory Guide

## 1. Objective

In this Spark tutorial, we will look at the machine learning algorithms that SparkR currently supports: classification, regression, tree-based methods, clustering, collaborative filtering, frequent pattern mining, and statistics, along with model persistence. Along the way, we will work through example code to understand Spark machine learning with R better.

So, let's start with Spark machine learning in R.

## 2. Spark Machine Learning with R

SparkR currently supports the following machine learning algorithms:

### a. Machine Learning Classification

- `spark.logit`: Logistic Regression
- `spark.mlp`: Multilayer Perceptron (MLP)
- `spark.naiveBayes`: Naive Bayes
- `spark.svmLinear`: Linear Support Vector Machine
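
As a minimal sketch of how one of these classifiers is called, the snippet below fits `spark.logit` on R's built-in `Titanic` dataset. The dataset, the `regParam` value, and the variable names are illustrative assumptions rather than recommendations, and the code assumes a working local Spark installation with the SparkR package available.

```r
library(SparkR)
sparkR.session()  # assumes Spark is installed and reachable locally

# Titanic is a built-in R contingency table; flatten it to a data.frame
# with the columns Class, Sex, Age, Survived, and Freq.
titanic <- as.data.frame(Titanic)
training <- createDataFrame(titanic)

# Binomial logistic regression: predict Survived from all other columns.
# regParam = 0.5 is an arbitrary illustrative value.
logitModel <- spark.logit(training, Survived ~ ., regParam = 0.5)
summary(logitModel)

# predict() returns a SparkDataFrame with an added "prediction" column.
logitPredictions <- predict(logitModel, training)
head(select(logitPredictions, "Survived", "prediction"))
```

The other classifiers in this list are also called with a SparkDataFrame and an R formula, although some need extra arguments (for example, `spark.mlp` requires a `layers` vector).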

### b. Machine Learning Regression

- `spark.survreg`: Accelerated Failure Time (AFT) Survival Model
- `spark.glm` or `glm`: Generalized Linear Model (GLM)
- `spark.isoreg`: Isotonic Regression

### c. Machine Learning Tree

- `spark.gbt`: Gradient Boosted Trees for Regression and Classification
- `spark.randomForest`: Random Forest for Regression and Classification

### d. Machine Learning Clustering

- `spark.bisectingKmeans`: Bisecting k-means
- `spark.gaussianMixture`: Gaussian Mixture Model (GMM)
- `spark.kmeans`: K-Means
- `spark.lda`: Latent Dirichlet Allocation (LDA)
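
The clustering functions take a one-sided formula, since there is no response variable. The sketch below runs `spark.kmeans` on the built-in `Titanic` data; the dataset and the choice of `k = 3` are assumptions made only for illustration, and a working SparkR setup is assumed.

```r
library(SparkR)
sparkR.session()  # assumes Spark is installed and reachable locally

# Flatten the built-in Titanic table and hand it to Spark.
df <- createDataFrame(as.data.frame(Titanic))

# One-sided formula: only the feature columns are listed.
# k = 3 is an arbitrary illustrative number of clusters.
kmeansModel <- spark.kmeans(df, ~ Class + Sex + Age + Freq, k = 3)
summary(kmeansModel)

# predict() appends a "prediction" column holding the assigned cluster.
kmeansPredictions <- predict(kmeansModel, df)
head(select(kmeansPredictions, "Class", "Sex", "Age", "Freq", "prediction"))
```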

### e. Machine Learning Collaborative Filtering

- `spark.als`: Alternating Least Squares (ALS)

Frequent Pattern Mining:

- `spark.fpGrowth`: FP-growth
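
For frequent pattern mining, the input is a column of item arrays rather than a formula. The following sketch, which assumes a working SparkR setup, mines three toy transactions with `spark.fpGrowth`; the transactions and the `minSupport` and `minConfidence` thresholds are made-up values for illustration.

```r
library(SparkR)
sparkR.session()  # assumes Spark is installed and reachable locally

# Three toy transactions, written as comma-separated strings
# and then split into array columns with a SQL expression.
raw <- createDataFrame(
  data.frame(rawItems = c("1,2,5", "1,2,3,5", "1,2"), stringsAsFactors = FALSE)
)
transactions <- selectExpr(raw, "split(rawItems, ',') AS items")

# Thresholds are illustrative, not tuned.
fpm <- spark.fpGrowth(transactions, itemsCol = "items",
                      minSupport = 0.5, minConfidence = 0.6)

# Frequent itemsets and the association rules derived from them.
freqItemsets <- spark.freqItemsets(fpm)
head(freqItemsets)
rules <- spark.associationRules(fpm)
head(rules)
```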

### f. Statistical Machine Learning

- `spark.kstest`: Kolmogorov-Smirnov Test

Under the hood, SparkR uses MLlib to train the model. It also supports a subset of the available R formula operators for model fitting, including `~`, `.`, `:`, `+`, and `-`.
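
To make the formula operators concrete, here is a small sketch that uses each of them with `spark.glm` on R's built-in `iris` data. The dataset and the model names `m1` to `m3` are illustrative assumptions, and the code assumes a working SparkR setup.

```r
library(SparkR)
sparkR.session()  # assumes Spark is installed and reachable locally

# createDataFrame() replaces the dots in the iris column names with
# underscores (Sepal.Length becomes Sepal_Length) and warns about it.
df <- createDataFrame(iris)

# '~' and '+': response on the left, additive predictors on the right.
m1 <- spark.glm(df, Sepal_Length ~ Sepal_Width + Species, family = "gaussian")

# '.' and '-': all remaining columns, minus Species.
m2 <- spark.glm(df, Sepal_Length ~ . - Species, family = "gaussian")

# ':': interaction between two numeric columns.
m3 <- spark.glm(df,
                Sepal_Length ~ Sepal_Width + Petal_Length + Sepal_Width:Petal_Length,
                family = "gaussian")

summary(m3)
```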

### g. Model persistence in Machine Learning

The following example shows how to save and load an MLlib model with SparkR:

```r
library(SparkR)
sparkR.session()

# Sample data file from the Spark distribution; the relative path
# assumes the working directory is the Spark home directory.
training <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm")

# Fit a generalized linear model of family "gaussian" with spark.glm
df_list <- randomSplit(training, c(7, 3), 2)
gaussianDF <- df_list[[1]]
gaussianTestDF <- df_list[[2]]
gaussianGLM <- spark.glm(gaussianDF, label ~ features, family = "gaussian")

# Save and then load a fitted MLlib model
modelPath <- tempfile(pattern = "ml", fileext = ".tmp")
write.ml(gaussianGLM, modelPath)
gaussianGLM2 <- read.ml(modelPath)

# Check model summary
summary(gaussianGLM2)

# Check model prediction
gaussianPredictions <- predict(gaussianGLM2, gaussianTestDF)
head(gaussianPredictions)

# Remove the temporary model directory
unlink(modelPath, recursive = TRUE)
```