Hadoop 2 Installation on Ubuntu – Setup of Hadoop CDH5


1. Hadoop 2 Installation Tutorial: Objective

This Hadoop 2 installation tutorial describes how to install and configure a single-node Hadoop cluster on Ubuntu. A single-node Hadoop cluster is also called "Hadoop Pseudo-Distributed Mode". The Hadoop 2 installation is explained here simply and to the point, so that you can learn Hadoop CDH5 installation in 10 minutes. Once the Hadoop 2 installation is done, you can perform Hadoop Distributed File System (HDFS) and Hadoop MapReduce operations.

Looking to boost your career in the exciting field of Big Data? Learn Big Data and Hadoop from experts.

Learn step by step Apache Hadoop 2 Installation on Ubuntu OS


2. Hadoop 2 Installation: Video Tutorial

This video tutorial covers Apache Hadoop 2 (Cloudera CDH5) installation on Ubuntu. It will help you learn Hadoop CDH5 installation in an easy manner.

3. Install Hadoop 2 on Ubuntu

Follow the steps given below to install and configure a Hadoop 2 cluster on Ubuntu OS:

3.1. Recommended Platform

  • OS – Linux is supported as a development and production platform. You can use Ubuntu 14.04 or later (you can also use other Linux flavors like CentOS, Redhat, etc.)
  • Hadoop – Cloudera Distribution for Apache Hadoop CDH5.x (you can use Apache Hadoop 2.x)

I. Setup Platform

If you are using Windows or Mac OS, you can create a virtual machine and install Ubuntu using VMware Player; alternatively, you can create a virtual machine and install Ubuntu using Oracle VirtualBox.

3.2. Prerequisites

I. Install Java 8 (Recommended Oracle Java)

a. Install Python Software Properties

sudo apt-get install python-software-properties

b. Add Repository

sudo add-apt-repository ppa:webupd8team/java

c. Update the source list

sudo apt-get update

d. Install Java

sudo apt-get install oracle-java8-installer
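
You can verify the installation afterwards; the exact version string printed will vary with the installer:

java -version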

II. Configure SSH

a. Install Open SSH Server-Client

sudo apt-get install openssh-server openssh-client

b. Generate Key Pairs

ssh-keygen -t rsa -P ""

c. Configure password-less SSH

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

d. Check by SSH to localhost

ssh localhost
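
The first connection will ask you to confirm the host fingerprint; type yes, and you should get a shell without a password prompt (type exit to return). If SSH still asks for a password, the usual cause is the permissions on the key file; a common fix, assuming the default key location, is:

chmod 0600 $HOME/.ssh/authorized_keys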

3.3. Install Hadoop

I. Download Hadoop 2

http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz
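
You can fetch the tarball from the shell, for example with wget (assuming it is installed):

wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz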

II. Untar Tar ball

tar xzf hadoop-2.5.0-cdh5.3.2.tar.gz

Note: All the required jars, scripts, configuration files, etc. are available in the HADOOP_HOME directory (hadoop-2.5.0-cdh5.3.2).
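
The .bashrc entries in the next step assume the extracted directory lives in the user's home directory; if you extracted it elsewhere, move it there first (a sketch, the destination path is only an assumption; adjust HADOOP_PREFIX below to wherever you keep it):

mv hadoop-2.5.0-cdh5.3.2 $HOME/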

III. Hadoop 2 Setup Configuration

a. Edit .bashrc

Now, edit the .bashrc file located in the user's home directory and add the following parameters:

export HADOOP_PREFIX="/home/hdadmin/hadoop-2.5.0-cdh5.3.2"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}

Note: After the above step, restart the terminal so that all the environment variables come into effect.
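
Alternatively, to apply the changes in the current shell and confirm the setup, you can run (hadoop version assumes the bin directory is now on the PATH):

source ~/.bashrc
echo $HADOOP_PREFIX
hadoop version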

b. Edit hadoop-env.sh

Now, edit the configuration file hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:

export JAVA_HOME=<path-to-the-root-of-your-Java-installation> (eg: /usr/lib/jvm/java-8-oracle/)
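
If you are not sure where Java is installed, one way to locate it (assuming java is on the PATH) is:

readlink -f $(which java)

This prints the full path of the java binary; JAVA_HOME is that path minus the trailing /bin/java (e.g. /usr/lib/jvm/java-8-oracle).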

c. Edit core-site.xml

Now, edit the configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/dataflair/hdata</value>
</property>
</configuration>

Note: /home/dataflair/hdata is a sample location; please specify a location where you have read/write privileges.
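
Create this directory before starting the cluster so Hadoop can write to it (using the sample location above):

mkdir -p /home/dataflair/hdata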

d. Edit hdfs-site.xml

Now, edit the configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

e. Edit mapred-site.xml

Now, edit the configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
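
Note: if mapred-site.xml does not exist in HADOOP_HOME/etc/hadoop (many Hadoop 2 tarballs ship only a template), create it from the template before editing; a sketch, assuming the usual template name and running from HADOOP_HOME:

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml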

f. Edit yarn-site.xml

Now, edit the configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

3.4. Start the Cluster

I. Format the name node

bin/hdfs namenode -format

NOTE: This should be done only once, when you install Hadoop; formatting the NameNode again will delete all your data from HDFS.

II. Start HDFS Services

sbin/start-dfs.sh

III. Start YARN Services

sbin/start-yarn.sh

Follow this link to learn What is YARN?

IV. Check whether services have been started

jps
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
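
You can also check the web interfaces (the default ports for Hadoop 2; adjust if you changed them):

  • NameNode UI – http://localhost:50070
  • ResourceManager UI – http://localhost:8088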

3.5. Run Map-Reduce Jobs

I. Run word count example

bin/hdfs dfs -mkdir /inputwords
bin/hdfs dfs -put <data-file> /inputwords
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.2.jar wordcount /inputwords /outputwords
bin/hdfs dfs -cat /outputwords/*
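
If you do not have a data file handy, you can make one (words.txt is just an illustrative name):

echo "hello hadoop hello hdfs" > words.txt
bin/hdfs dfs -put words.txt /inputwords

Note: the job will fail if the output directory already exists; remove it first with bin/hdfs dfs -rm -r /outputwords.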

Follow the HDFS Command Guide to play with HDFS commands and perform various operations.

3.6. Stop The Cluster

I. Stop HDFS Services

sbin/stop-dfs.sh

II. Stop YARN Services

sbin/stop-yarn.sh

I hope you are now clear on how to do a Hadoop 2 installation.

As you install Hadoop 2, you can start playing with HDFS. For any query about Hadoop 2 installation, feel free to share it with us; we will be happy to resolve your queries.


6 thoughts on "Hadoop 2 Installation on Ubuntu – Setup of Hadoop CDH5"

  • Ankan

I have tried the same, and after starting (start-dfs.sh) I am just able to see 3 services running.
    NameNode
    DataNode
    SecondaryNameNode

    I am not able to see the below services:
    ResourceManager
    NodeManager

Please help, as I am new to Hadoop and the Linux system too.
    Thanks.

  • Jay

    tar (child): hadoop-2.5.0-cdh5.3.2.tar.gz: Cannot open: No such file or directory
    tar (child): Error is not recoverable: exiting now
    tar: Child returned status 2
    tar: Error is not recoverable: exiting now
