Apache Spark Tutorials


RDD Lineage in Spark: toDebugString Method

1. Objective In Spark, all the dependencies between RDDs are recorded in a graph, independently of the actual data. This graph is what we call the lineage graph. This document covers RDD lineage as part of Spark's logical execution plan. Moreover, we will see in detail how to obtain the RDD lineage graph with the toDebugString method. Before that, let's review Spark RDDs. 2. Introduction to Spark RDD Spark RDD is nothing but an acronym […]
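To make the idea in this excerpt concrete, here is a minimal sketch in plain Python of a lineage graph that records dependencies without holding any data. It is only a toy model and not Spark's API: the class ToyRDD, its map and filter methods, and to_debug_string are hypothetical names chosen for illustration, and the text printed by Spark's real toDebugString is formatted differently.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: it records its parents and holds no data."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)

    def map(self, label):
        # A transformation only adds a node to the graph; nothing is computed.
        return ToyRDD(f"map({label})", [self])

    def filter(self, label):
        # Same idea: the new node simply remembers which node it depends on.
        return ToyRDD(f"filter({label})", [self])

    def to_debug_string(self, depth=0):
        # Walk from this node back through its parents, one indented line per node.
        lines = ["  " * depth + self.name]
        for parent in self.parents:
            lines.append(parent.to_debug_string(depth + 1))
        return "\n".join(lines)


source = ToyRDD("textFile(input)")
result = source.map("split").filter("nonEmpty")
lineage = result.to_debug_string()
print(lineage)
```

Reading the printed tree from top to bottom goes from the final dataset back to its source, which is the direction a lineage graph is followed when a lost partition has to be recomputed.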

Top 100 Apache Spark Interview Questions and Answers

1. Spark Interview Questions Apache Spark is a fast-growing technology, so it is important to know every aspect of it, including the common Spark interview questions. This blog will help you with both. We will cover each aspect of Spark that is likely to appear as a frequently asked interview question. Moreover, we will try our best to provide each question so that from now onwards your […]


Spark GraphX Features – An Introductory Guide

1. Objective Spark GraphX has several features that add to its strengths. In this blog, we will learn the GraphX features in Apache Spark. We will start with a brief introduction to GraphX and then cover each feature in detail. 2. What is Spark GraphX? GraphX is Spark's API for graphs and graph-parallel computation. It offers a growing collection of graph algorithms. It also includes graph builders to simplify graph analytics […]


GraphX API in Apache Spark: An Introductory Guide

1. Objective Apache Spark has an additional API, GraphX, for graphs and graph-parallel computation. In this blog, we will learn the whole concept of the GraphX API in Spark. We will learn how to import Spark and GraphX into a project and understand the concept of the property graph. We will also cover graph operators and the Pregel API in detail. In addition, we will look at the features of GraphX and the use cases of the GraphX API. […]


Spark Tutorial – Learn Spark Programming

1. Objective In this Spark tutorial, we will give an overview of Spark for big data. We will start with an introduction to Apache Spark programming, then look at the history of Spark and why Spark is needed. Afterwards, we will cover the fundamentals of the Spark components, Spark's core abstraction, and Spark RDD. For more detailed insights, we will also cover Spark features, Spark limitations, and Spark use cases. 2. Introduction to Spark Programming […]


SparkR DataFrame and DataFrame Operations

1. Objective In this article, we will learn the whole concept of the SparkR DataFrame. We will also learn SparkR DataFrame operations and how to run SQL queries from SparkR. 2. SparkR DataFrame A SparkDataFrame is a distributed collection of data organized into named columns. Conceptually, it is the same as a table in a relational database or a data frame in R. Moreover, we can construct a DataFrame from a wide array of sources. […]


Featurization in Apache Spark MLlib Algorithms

1. Objective In this blog, we will learn about featurization in Apache Spark MLlib, along with the related Spark machine learning algorithms. 2. Featurization in Apache Spark MLlib Apache Spark MLlib includes algorithms for working with features. They are divided into these groups: Extraction: extracting features from “raw” data. Transformation: scaling, converting, or modifying features. Selection: selecting a subset from a larger set of features. Locality Sensitive Hashing (LSH): this class of algorithms combines aspects of feature transformation […]

Apache Spark Executor for Executing Spark Tasks

1. Objective In Apache Spark, a distributed agent is responsible for executing tasks. This agent is what we call the Spark executor. This document covers the whole concept of the Apache Spark executor. We will also see how to create an executor instance in Spark and, to go deeper, the launch task method of the Spark executor. 2. Introduction to Spark Executor Basically, executors in Spark are worker nodes' processes in charge of running […]


Spark Stage – An Introduction to Physical Execution Plan

1. Objective A stage is a step in a physical execution plan; in other words, it is a physical unit of the execution plan. This document covers the whole concept of the Apache Spark stage. We will also learn the two types of stages in Spark: ShuffleMapStage and ResultStage. There is also a method to create a Spark stage, which we will learn in detail. 2. Stages in Spark […]

Introduction to Apache Spark Paired RDD

1. Objective In Apache Spark, RDDs of key-value pairs are what we call paired RDDs. This Spark paired RDD tutorial explains what paired RDDs are. We will also learn the methods of creating paired RDDs and the operations on them, namely transformations and actions. The transformations include groupByKey, reduceByKey, join, and leftOuterJoin/rightOuterJoin, whereas the actions include countByKey. First, however, we will give a brief introduction to Spark RDDs. 2. What is Spark […]
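The per-key operations named in this excerpt can be sketched in plain Python on an ordinary list of (key, value) tuples. This is an illustration of the semantics only, assuming a single machine and no partitioning; the helper functions reduce_by_key, group_by_key, and count_by_key are hypothetical names written for this sketch and are not the Spark methods themselves.

```python
from collections import Counter, defaultdict
from operator import add


def reduce_by_key(pairs, func):
    # Merge all values that share a key with func, as reduceByKey does per key.
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def group_by_key(pairs):
    # Collect all values that share a key into one list, as groupByKey does.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def count_by_key(pairs):
    # Count how many elements exist for each key, as the countByKey action does.
    return dict(Counter(key for key, _ in pairs))


pairs = [("a", 1), ("b", 2), ("a", 3)]
sums = reduce_by_key(pairs, add)
groups = group_by_key(pairs)
counts = count_by_key(pairs)
print(sums, groups, counts)
```

In Spark, the first two are transformations that produce a new paired RDD lazily, while countByKey is an action that returns its result to the driver; the toy functions above return plain dictionaries immediately.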