Apache Spark


Spark GraphX Features – An Introductory Guide

1. Objective Several features of Spark GraphX make it well suited to graph processing. In this blog, we will learn the GraphX features in Apache Spark. We begin with a brief introduction to GraphX and then cover each feature in detail. 2. What is Spark GraphX? GraphX is the Spark API for graphs and graph-parallel computation. It draws on a growing collection of graph algorithms and also includes graph builders to simplify graph analytics […]


GraphX API in Apache Spark: An Introductory Guide

1. Objective GraphX is Apache Spark's API for graphs and graph-parallel computation. In this blog, we will learn the GraphX API in Spark, including how to import Spark and GraphX into a project. We will also cover the property graph, graph operators, and the Pregel API in detail, along with the features and use cases of GraphX. […]


SparkR DataFrame and DataFrame Operations

1. Objective In this article, we will learn about the SparkR DataFrame, SparkR DataFrame operations, and how to run SQL queries from SparkR. 2. SparkR DataFrame A SparkDataFrame is a distributed collection of data organized into named columns. It is conceptually the same as a table in a relational database or a data frame in R. We can construct a DataFrame from a wide array of sources. […]


Spark Stage – An Introduction to Physical Execution Plan

1. Objective A stage is a step in a physical execution plan; in other words, it is a physical unit of execution. This document covers the concept of the Apache Spark stage. Spark stages are of two types: ShuffleMapStage and ResultStage. We will also learn in detail how a Spark stage is created. 2. Stages in Spark […]


Introduction to Apache Spark Paired RDD

1. Objective In Apache Spark, an RDD of key-value pairs is called a paired RDD. This Spark paired RDD tutorial explains what paired RDDs are, how to create them, and which operations they support. The transformations covered include groupByKey, reduceByKey, join, and leftOuterJoin/rightOuterJoin; the actions include countByKey. We begin with a brief introduction to Spark RDDs. 2. What is Spark […]
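
The operations named in this excerpt can be sketched in a few lines. The example below is a minimal model in plain Python, not Spark code: it works on ordinary lists of (key, value) tuples, and the function names (group_by_key, reduce_by_key, count_by_key, left_outer_join) are hypothetical helpers that only mirror the meaning of the Spark operations. It assumes a missing match in a left outer join is represented as None.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def group_by_key(pairs):
    """Collect all values that share a key (models groupByKey)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_by_key(pairs, func):
    """Fold the values of each key with func (models reduceByKey)."""
    return {key: reduce(func, values) for key, values in group_by_key(pairs).items()}


def count_by_key(pairs):
    """Count the elements per key (models the countByKey action)."""
    return {key: len(values) for key, values in group_by_key(pairs).items()}


def left_outer_join(left, right):
    """Pair each left value with every matching right value, or None if no key matches."""
    right_values = group_by_key(right)
    return [
        (key, (left_value, right_value))
        for key, left_value in left
        for right_value in right_values.get(key, [None])
    ]


if __name__ == "__main__":
    # Illustrative data only.
    sales = [("apple", 3), ("pear", 1), ("apple", 2)]
    prices = [("apple", 0.5)]
    print(group_by_key(sales))
    print(reduce_by_key(sales, add))
    print(count_by_key(sales))
    print(left_outer_join(sales, prices))
```

In Spark itself these operations run on distributed partitions and the transformations are lazy; the sketch only shows what each one computes per key.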


Ways to Create SparkDataFrames in SparkR

1. Objective A SparkDataFrame in SparkR is a distributed collection of data organized into named columns. There are several ways to create one, and this article covers them. 2. What is a SparkDataFrame? A SparkDataFrame is a distributed collection of data organized into named columns. It is conceptually the same as a table in a relational database or […]


Introduction to Structured Streaming in SparkR

1. Objective SparkR supports the Structured Streaming API. In this article, we will learn about Structured Streaming in SparkR, including its programming model. 2. Introduction to Structured Streaming in SparkR SparkR supports the Structured Streaming API, which is built on the scalable and fault-tolerant Spark SQL engine. Just as we express a batch computation on static data, we […]


SparkR in Apache Spark: An Introductory Guide

1. Objective SparkR is an R package that provides a lightweight frontend for using Apache Spark from R. In this article, we will start with an introduction to SparkR, then learn how to create SparkR DataFrames and perform some SparkR DataFrame operations. We will also learn about the MLlib algorithms exposed by SparkR. 2. Introduction to SparkR SparkR is an R package. […]


Spark SQL Features You Must Know

1. Objective In this document, we will look at the main Spark SQL features, such as unified data access and high compatibility, and cover each in detail. Before that, we will briefly introduce Spark SQL. 2. Introduction to Spark SQL Spark SQL is the Apache Spark module for working with structured data. It supports distributed in-memory computation at large scale. It divulges […]


Apache Hive vs Spark SQL: Feature wise comparison

1. Objective Although Apache Hive and Spark SQL perform the same action, retrieving data, each does the task in a different way. Hive is designed as an interface or convenience for querying data stored in HDFS, whereas MySQL is designed for online operations requiring many reads and writes. In this blog, we will compare Apache Hive and Spark SQL feature by feature, focusing on the differences between the two. We will […]