Daily Archives: July 26, 2016

Apache Storm vs Spark Streaming – Feature-wise Comparison

1. Objective
This tutorial compares Apache Storm and Spark Streaming. Apache Storm is a stream processing engine for processing real-time streaming data, while Apache Spark is a general-purpose computing engine that provides Spark Streaming to handle streaming data. Spark Streaming processes data in near real time. Let's find out which is better in the battle of Spark vs Storm.
2. Apache Storm vs Spark Streaming Comparison
The following description shows the detailed feature-wise differences between Apache Storm and Spark […]
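To make the "near real time" point concrete, here is a minimal pure-Python sketch (not the Storm or Spark API) contrasting the two processing styles: a Storm-like loop that handles each event as it arrives, and a Spark-Streaming-like loop that buffers events into small batches and processes each batch as a unit. Real Spark Streaming cuts batches by a time interval; the function names and the count-based `batch_size` parameter are hypothetical, chosen only to keep the sketch deterministic.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_per_record(stream: Iterable[T], fn: Callable[[T], R]) -> Iterator[R]:
    """Storm-style: handle every event individually, as soon as it arrives."""
    for event in stream:
        yield fn(event)


def process_micro_batches(
    stream: Iterable[T], fn: Callable[[List[T]], R], batch_size: int
) -> Iterator[R]:
    """Spark-Streaming-style: buffer events into small batches and process
    each batch as a unit (count-based here; Spark uses a time interval)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield fn(batch)
            batch = []
    if batch:  # flush the last, partially filled batch
        yield fn(batch)


if __name__ == "__main__":
    events = [3, 1, 4, 1, 5]
    print(list(process_per_record(events, lambda x: x * 2)))       # [6, 2, 8, 2, 10]
    print(list(process_micro_batches(events, sum, batch_size=2)))  # [4, 5, 5]
```

The per-record version produces an output as soon as each event arrives, while the batched version waits until a batch fills (or the stream ends) before producing anything, which is where the extra latency of "near real time" comes from.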

Apache Flink – Run Wordcount program in Eclipse

1. Objective
In our previous guides, we discussed how to install Apache Flink on Ubuntu. In this tutorial, we will learn how to develop and run an Apache Flink wordcount program in Java in Eclipse. The wordcount program can also be written in Scala.
2. Platform
Operating system: Windows / Mac / Linux
Java 7.x or higher
Eclipse – latest version
3. Steps to […]
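The core logic of a wordcount program is the same whichever framework runs it: split lines into words, emit a count of 1 per word, then group by word and sum. As a framework-free reference for checking your Flink job's output, here is a sketch in plain Python (not Flink's Java API). The tokenization rule, lowercasing and splitting on non-word characters, is an assumption and may differ from the tokenizer in your own program.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Count word occurrences across all input lines."""
    counts: Counter = Counter()
    for line in lines:
        # "flatMap" step: one line becomes many lowercase words.
        for word in re.findall(r"\w+", line.lower()):
            # "group by word and sum" step: add 1 for each occurrence.
            counts[word] += 1
    return dict(counts)


if __name__ == "__main__":
    text = ["To be, or not to be", "that is the question"]
    for word, n in sorted(word_count(text).items()):
        print(word, n)
```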

Spark RDD – Introduction, Features & Operations of RDD

1. Spark RDD
RDD (Resilient Distributed Dataset) is the fundamental data structure of Apache Spark: an immutable collection of objects that is computed on different nodes of the cluster. Each dataset in a Spark RDD is logically partitioned across many servers so that the partitions can be computed on different nodes of the cluster. In this blog, we will learn what an RDD is in Apache Spark, what the features of RDD are, what […]
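The two properties named above, immutability and logical partitioning, can be modelled in a few lines. The sketch below is a toy, single-process stand-in for an RDD; the class `PartitionedDataset` and its methods are hypothetical, not Spark's API. Data is split into partitions, `map` works partition by partition, and it returns a new dataset instead of modifying the existing one.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class PartitionedDataset:
    """Toy, single-process model of an RDD: an immutable tuple of partitions.
    In a real cluster each partition could be computed on a different node."""

    partitions: Tuple[Tuple[int, ...], ...]

    @staticmethod
    def from_list(data: List[int], num_partitions: int) -> "PartitionedDataset":
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        n = len(data)
        # Split the input into contiguous, roughly equal slices.
        parts = tuple(
            tuple(data[i * n // num_partitions:(i + 1) * n // num_partitions])
            for i in range(num_partitions)
        )
        return PartitionedDataset(parts)

    def map(self, fn: Callable[[int], int]) -> "PartitionedDataset":
        # Each partition is transformed independently (so partitions could run
        # in parallel), and a NEW dataset is returned; self is left unchanged.
        return PartitionedDataset(
            tuple(tuple(fn(x) for x in part) for part in self.partitions)
        )

    def collect(self) -> List[int]:
        # Gather all partitions back into one list, preserving order.
        return [x for part in self.partitions for x in part]


if __name__ == "__main__":
    nums = PartitionedDataset.from_list([1, 2, 3, 4, 5, 6], num_partitions=3)
    doubled = nums.map(lambda x: x * 2)
    print(nums.partitions)    # ((1, 2), (3, 4), (5, 6))
    print(doubled.collect())  # [2, 4, 6, 8, 10, 12]
    try:
        nums.partitions = ()  # immutability: reassignment is rejected
    except FrozenInstanceError:
        print("nums is immutable")
```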

Limitations of Apache Spark RDD

Install and Run Apache Flink on Windows

1. Objective
In this tutorial, we will learn how to install Apache Flink on Windows. Apache Flink can run on Windows as well as Linux. In this blog, we will see how to install Apache Flink on Windows in single-node cluster mode and how to run the wordcount program. You can also refer to the guide on installing Apache Flink on Ubuntu.
2. Apache Flink Installation on Windows
2.1. Platform
I. Platform Requirements
Operating […]

Important Linux Commands Tutorial Part-II

1. Objective
This tutorial lists the 34 most frequently used Linux commands, which are useful for beginners getting familiar with Linux, its commands, and the terminal. To install Ubuntu Linux, refer to the Ubuntu installation guide.
2. Important Linux Commands Tutorial
The top 34 frequently used Linux commands are given below, along with their usage.
2.1. gzip
a. Usage: $ gzip filename
It compresses the content of files, gives them the extension .gz, and needs to […]
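For readers who script file handling in Python rather than the shell, the standard-library `gzip` module can reproduce what `gzip filename` does by default: write a compressed `filename.gz` and remove the original. The helper name `gzip_file` is hypothetical, and this sketch does not try to copy file metadata such as permissions or timestamps.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(path: str) -> str:
    """Compress `path` into `path + ".gz"` and delete the original,
    similar to running `gzip filename` with default options."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the data, so large files are fine
    os.remove(path)  # the original is replaced, as with the gzip command
    return gz_path


if __name__ == "__main__":
    # Demo inside a throwaway directory so no real files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "notes.txt")
        with open(original, "w") as f:
            f.write("hello gzip\n" * 100)
        compressed = gzip_file(original)
        print(os.path.basename(compressed), os.path.exists(original))  # notes.txt.gz False
```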

Spark RDD Operations-Transformation & Action with Example

1. Spark RDD Operations
There are two types of Apache Spark RDD operations: transformations and actions. A transformation is a function that produces a new RDD from existing RDDs, but when we want to work with the actual dataset, an action is performed. Transformations are lazy: they are computed only when an action needs their result. Unlike a transformation, a triggered action returns a result instead of forming a new RDD. In this Apache Spark RDD operations tutorial, we will get a detailed view of what a Spark RDD is, what the transformation […]
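The difference between the two kinds of operations is easiest to see in a toy model of lazy evaluation. The `LazyDataset` class below is hypothetical, written in plain Python rather than with PySpark: `map` and `filter` (transformations) only record a step and return a new object, while `collect` and `count` (actions) run the recorded steps and return ordinary values. Real Spark additionally builds a lineage graph and distributes the work across the cluster, which this sketch leaves out.

```python
from typing import Any, Callable, Iterable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class LazyDataset:
    """Toy model of lazy evaluation: transformations record steps, actions run them."""

    def __init__(self, source: Iterable[Any], steps: Tuple[Step, ...] = ()) -> None:
        self._source = tuple(source)
        self._steps = steps

    # Transformations: return a NEW dataset with one more recorded step; nothing runs.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Actions: replay every recorded step over the source and return a plain value.
    def collect(self) -> List[Any]:
        data = list(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:  # "filter"
                data = [x for x in data if fn(x)]
        return data

    def count(self) -> int:
        return len(self.collect())


if __name__ == "__main__":
    calls = []

    def square(x: int) -> int:
        calls.append(x)  # record when the function actually runs
        return x * x

    pipeline = LazyDataset([1, 2, 3, 4]).map(square).filter(lambda x: x > 4)
    print(calls)               # [] (no transformation has run yet)
    print(pipeline.collect())  # [9, 16]
    print(calls)               # [1, 2, 3, 4]
```

Calling another action such as `count()` re-runs the whole recorded pipeline, much as Spark recomputes an RDD's lineage for each action unless the RDD is persisted.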