Daily Archives: June 13, 2016


What is Apache Spark – A Quick Guide to Drift in Spark

1. Objective: In this Apache Spark tutorial, we take a brief look at what Apache Spark is and at its history. Apache Spark is an advanced analytics engine that can process real-time data. It is an in-memory processing framework that is typically much faster than disk-based frameworks such as MapReduce. This tutorial also covers the Spark ecosystem, the features of Apache Spark, and the industries that use Apache Spark for day-to-day data operations. […]


Interact with HDFS using CLI & Perform Various Operations Part-IV

1. Objective: In this HDFS tutorial, we learn the remaining important and frequently used HDFS commands from the CLI. With them we can perform HDFS file operations such as copying a file, changing file permissions, viewing file contents, changing file ownership, and creating directories. To learn more about the world’s most reliable storage layer, follow this HDFS introductory guide. Before interacting with HDFS you need to deploy Hadoop; follow this detailed tutorial to install and configure […]


Hadoop Commands with Examples and Usage Part-III

1. Hadoop Commands – Objective: In this HDFS Hadoop commands tutorial, we learn the remaining important and frequently used HDFS commands. With them we can perform HDFS file operations such as copying a file, changing file permissions, viewing file contents, changing file ownership, and creating directories. To learn more about the world’s most reliable storage layer, follow this HDFS introductory guide. To learn HDFS in depth, follow these detailed tutorials: HDFS High Availability, HDFS Fault […]


Hadoop HDFS Commands with Examples and Usage – Part II

1. HDFS Commands – Objective: In this Hadoop HDFS Commands tutorial, we learn the remaining important and frequently used Hadoop commands. With them we can perform HDFS file operations such as copying a file, changing file permissions, viewing file contents, changing file ownership, and creating directories. To learn more about the world’s most reliable storage layer, follow this HDFS introductory guide. 2. Hadoop HDFS Commands: Hadoop file system shell commands are used to […]


Top 10 Hadoop HDFS Commands with Examples and Usage

1. Hadoop HDFS Commands: In this tutorial, we learn the most important and frequently used Hadoop HDFS commands. With them we can perform HDFS file operations such as copying a file, changing file permissions, viewing file contents, changing file ownership, and creating directories. 2. Hadoop HDFS Commands – Introduction: Hadoop HDFS is a distributed file system which provides redundant […]
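
The operations listed in this excerpt map onto `hdfs dfs` shell subcommands. As a rough illustration only, the Python sketch below builds the argument lists for five of them and prints the resulting command lines without executing anything. The paths, the file name, and the `demo:hadoop` owner are hypothetical placeholders, not values taken from the tutorial.

```python
import shlex


def hdfs_dfs(*args):
    """Build the argv list for one `hdfs dfs` shell invocation."""
    return ["hdfs", "dfs", *args]


# All paths, user names, and group names below are hypothetical placeholders.
commands = [
    hdfs_dfs("-mkdir", "-p", "/user/demo/input"),                     # create a directory
    hdfs_dfs("-put", "local.txt", "/user/demo/input/"),               # copy a local file into HDFS
    hdfs_dfs("-cat", "/user/demo/input/local.txt"),                   # view the file contents
    hdfs_dfs("-chmod", "644", "/user/demo/input/local.txt"),          # change file permissions
    hdfs_dfs("-chown", "demo:hadoop", "/user/demo/input/local.txt"),  # change file ownership
]

# Print the commands instead of running them, so the sketch works without Hadoop.
for argv in commands:
    print(shlex.join(argv))
```

On a machine where Hadoop is installed and configured, each list could be passed to `subprocess.run(argv, check=True)` to execute it.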



Install Hadoop 2 on Ubuntu 16.04 | Apache Hadoop Installation

1. Install Hadoop 2 on Ubuntu 16.04 – Objective: This document describes how to install Hadoop 2 on Ubuntu 16.04. A single-machine Hadoop cluster is also called Hadoop Pseudo-Distributed Mode. The steps given in this document to install Hadoop 2 on Ubuntu 16.04 and to set up a Hadoop cluster are simple and to the point, so you can install Hadoop on Ubuntu 16.04 within a few minutes. Once the installation is done you can play […]


Understand HDFS Feature – Fault Tolerance

1. Definition: Fault tolerance in HDFS refers to how well the system keeps working under unfavourable conditions and how it handles such situations. HDFS is highly fault tolerant. It handles faults by creating replicas: copies of the user's data are stored on different machines in the HDFS cluster. If any machine in the cluster goes down, the data can still be accessed from another machine that holds a copy of it. HDFS […]
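
The replica idea can be illustrated with a toy simulation. The sketch below is not how HDFS actually places replicas (real placement is rack-aware); it assumes a simple round-robin policy, a replication factor of 3, and hypothetical block and machine names, purely to show why losing one machine does not lose data.

```python
def place_replicas(blocks, machines, replication=3):
    """Assign each block to `replication` distinct machines, round-robin.

    Simplified stand-in for HDFS replica placement (assumption, not the real policy).
    """
    if replication > len(machines):
        raise ValueError("replication factor exceeds number of machines")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = {
            machines[(i + k) % len(machines)] for k in range(replication)
        }
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live machine."""
    failed = set(failed)
    return {block for block, hosts in placement.items() if hosts - failed}


# Hypothetical cluster of four machines holding four blocks.
machines = ["node1", "node2", "node3", "node4"]
blocks = ["blk_1", "blk_2", "blk_3", "blk_4"]
placement = place_replicas(blocks, machines)

# One machine down: every block is still readable from another replica.
print(sorted(readable_blocks(placement, failed=["node2"])))
```

With three replicas per block, a block becomes unreadable in this model only when all three machines holding it fail at the same time.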


Hadoop High Availability – HDFS Feature

1. Overview: In this Hadoop tutorial, we discuss the Hadoop High Availability feature. The tutorial covers an introduction to Hadoop High Availability, how high availability is achieved in Hadoop, the issues that existed in legacy systems, and examples of High Availability in Hadoop. Learn how to install and configure Hadoop on a single machine and on a multi-node cluster. 2. Hadoop High Availability – Introduction: HDFS is a distributed file system. It distributes data among the nodes in the cluster by creating a replica […]


Hadoop MapReduce Flow – How data flows in MapReduce?

1. Objective: Hadoop MapReduce processes a huge amount of data in parallel by dividing the job into a set of independent tasks (sub-jobs). In Hadoop, MapReduce works by breaking the processing into two phases: Map and Reduce. This tutorial explains the complete, end-to-end Hadoop MapReduce flow. We hope this blog answers how Hadoop MapReduce works and how data is processed when a MapReduce job is submitted. […]
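
To make the Map and Reduce phases concrete, here is a minimal single-process word-count sketch in plain Python. It does not use Hadoop's API; the function names are hypothetical, and the intermediate shuffle step that groups values by key between the two phases is included because the reduce phase depends on it.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    # Each line could be mapped independently, which is what allows parallelism.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


counts = run_job(["Hadoop stores data", "hadoop processes data"])
print(counts)
```

In a real cluster the map calls run as separate tasks on different nodes and the shuffle moves data across the network; here everything runs in one process to keep the flow visible.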



Hadoop HDFS Data Read and Write Operations

1. Objective: HDFS follows a write-once, read-many model. We cannot edit files already stored in HDFS, but we can append data by reopening the file. In a read or write operation, the client first interacts with the NameNode. The NameNode grants the required privileges so that the client can read data blocks from, and write them to, the respective DataNodes. In this blog, we discuss the internals of Hadoop HDFS data read and write operations. We will also cover how the client reads and writes data from […]
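
The write-once, read-many behaviour and the NameNode-first lookup can be sketched with a toy in-memory model. This is an illustration under heavy simplifying assumptions, not the HDFS client API: the `ToyCluster` class and its methods are hypothetical, each write becomes one whole block, blocks are not replicated, and permission checks are omitted.

```python
class ToyCluster:
    """Toy model: NameNode metadata plus DataNode block storage (all hypothetical)."""

    def __init__(self, datanode_names):
        # NameNode side: path -> ordered list of (datanode, block_id).
        self.block_locations = {}
        # DataNode side: datanode -> {block_id: bytes}.
        self.datanodes = {name: {} for name in datanode_names}
        self._next_block = 0

    def _store_block(self, path, data):
        """Pick a DataNode round-robin, store the block, and record its location."""
        names = sorted(self.datanodes)
        node = names[self._next_block % len(names)]
        block_id = f"blk_{self._next_block}"
        self._next_block += 1
        self.datanodes[node][block_id] = data
        self.block_locations[path].append((node, block_id))

    def create(self, path, data):
        """Write once: creating a path that already exists is rejected."""
        if path in self.block_locations:
            raise FileExistsError(path)
        self.block_locations[path] = []
        self._store_block(path, data)

    def append(self, path, data):
        """Appending adds data at the end; existing blocks are never modified."""
        if path not in self.block_locations:
            raise FileNotFoundError(path)
        self._store_block(path, data)

    def read(self, path):
        """Read many: ask the NameNode for locations, then fetch from DataNodes."""
        locations = self.block_locations[path]
        return b"".join(self.datanodes[node][blk] for node, blk in locations)


cluster = ToyCluster(["dn1", "dn2"])
cluster.create("/logs/app.log", b"first ")
cluster.append("/logs/app.log", b"second")
print(cluster.read("/logs/app.log"))
```

The model offers no method to overwrite bytes in the middle of a file, which mirrors the restriction described in the excerpt: existing data stays immutable and only appends are allowed.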