Hadoop 2 Installation on Ubuntu – Setup of Hadoop CDH5


1. Hadoop 2 Installation Tutorial: Objective

This Hadoop 2 installation tutorial describes how to install and configure a single-node Hadoop cluster on Ubuntu OS. A single-node Hadoop cluster is also called “Hadoop Pseudo-Distributed Mode”. The Hadoop 2 installation is explained here very simply and to the point, so that you can complete the Hadoop CDH5 installation in 10 minutes. Once the Hadoop 2 installation is done, you can perform Hadoop Distributed File System (HDFS) and Hadoop MapReduce operations.

Hadoop 2 Installation on Ubuntu – Setup of Hadoop CDH5

2. Hadoop 2 Installation: Video Tutorial

This video tutorial covers the Apache Hadoop 2 installation, i.e., Cloudera CDH5 installation, on Ubuntu, and will help you follow the Hadoop CDH5 installation steps easily.

3. Install Hadoop 2 on Ubuntu

Follow the steps given below to install and configure a Hadoop 2 cluster on Ubuntu OS:

3.1. Recommended Platform

  • OS – Linux is supported as a development and production platform. You can use Ubuntu 14.04 or later (you can also use other Linux flavors like CentOS, Redhat, etc.)
  • Hadoop – Cloudera Distribution for Apache Hadoop CDH5.x (you can use Apache Hadoop 2.x)

I. Setup Platform

If you are using Windows or Mac OS, you can create a virtual machine and install Ubuntu on it using VMware Player or, alternatively, Oracle VirtualBox.

3.2. Prerequisites

I. Install Java 8 (Recommended Oracle Java)

a. Install Python Software Properties
[php]sudo apt-get install python-software-properties[/php]
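Note: on newer Ubuntu releases this package has been replaced by software-properties-common, so if the command above fails you can install that instead:
[php]sudo apt-get install software-properties-common[/php]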
b. Add Repository
[php]sudo add-apt-repository ppa:webupd8team/java[/php]
c. Update the source list
[php]sudo apt-get update[/php]
d. Install Java
[php]sudo apt-get install oracle-java8-installer[/php]
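Once the installer finishes, you can verify the Java installation by checking the version:
[php]java -version[/php]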

II. Configure SSH

a. Install Open SSH Server-Client
[php]sudo apt-get install openssh-server openssh-client[/php]
b. Generate Key Pairs
[php]ssh-keygen -t rsa -P ""[/php]
c. Configure password-less SSH
[php]cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys[/php]
d. Check by SSH to localhost
[php]ssh localhost[/php]
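The first connection will ask you to confirm the host key; type yes. If SSH still prompts for a password after that, one likely cause is loose permissions on the key file, which you can tighten as follows:
[php]chmod 0600 ~/.ssh/authorized_keys[/php]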

3.3. Install Hadoop

I. Download Hadoop 2

http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz
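You can download the tarball directly from the terminal, for example with wget:
[php]wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz[/php]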

II. Untar the Tarball

[php]tar xzf hadoop-2.5.0-cdh5.3.2.tar.gz[/php]
Note: All the required jars, scripts, configuration files, etc. are available in HADOOP_HOME directory (hadoop-2.5.0-cdh5.3.2).
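Since the commands in the following steps are run relative to HADOOP_HOME, move into the extracted directory now:
[php]cd hadoop-2.5.0-cdh5.3.2[/php]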

III. Hadoop 2 Setup Configuration

a. Edit .bashrc
Now, edit .bashrc file located in user’s home directory and add following parameters:

[php]export HADOOP_PREFIX="/home/hdadmin/hadoop-2.5.0-cdh5.3.2"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}[/php]

Note: After the above step, reload the environment so that all the environment variables come into effect.
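One way to reload the variables without restarting the terminal, and to verify that the Hadoop binaries are on the PATH:
[php]source ~/.bashrc
hadoop version[/php]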

b. Edit hadoop-env.sh
Now, edit configuration file hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:

[php]export JAVA_HOME=<path-to-the-root-of-your-Java-installation> (eg: /usr/lib/jvm/java-8-oracle/)[/php]
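If you are unsure where Java is installed, one way to find the root of your Java installation is to resolve the java binary's symlink:
[php]readlink -f /usr/bin/java[/php]
Strip the trailing /jre/bin/java (or /bin/java) from the output to get the JAVA_HOME path.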

c. Edit core-site.xml
Now, edit configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:

[php]<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdadmin/hdata</value>
</property>
</configuration>[/php]

Note: /home/hdadmin/hdata is a sample location; please specify a location where you have read-write privileges.
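To be safe, create this directory beforehand so the daemons can write to it:
[php]mkdir -p /home/hdadmin/hdata[/php]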

d. Edit hdfs-site.xml
Now, edit configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:

[php]<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>[/php]

e. Edit mapred-site.xml
Now, edit configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:

[php]<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>[/php]
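Note: some Hadoop 2 tarballs ship only a template for this file. If HADOOP_HOME/etc/hadoop/mapred-site.xml does not exist, create it from the template first:
[php]cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml[/php]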

f. Edit yarn-site.xml
Now, edit configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:

[php]<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>[/php]

3.4. Start the Cluster

I. Format the name node

[php]bin/hdfs namenode -format[/php]
NOTE: Format the NameNode only once, when you first install Hadoop; formatting it again will delete all your data from HDFS.

II. Start HDFS Services

[php]sbin/start-dfs.sh[/php]

III. Start YARN Services

[php]sbin/start-yarn.sh[/php]

IV. Check whether services have been started

[php]jps
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager[/php]
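You can also verify the cluster through the web UIs (Hadoop 2 default ports): the NameNode UI at http://localhost:50070 and the ResourceManager UI at http://localhost:8088.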

3.5. Run Map-Reduce Jobs

I. Run word count example

[php]bin/hdfs dfs -mkdir /inputwords
bin/hdfs dfs -put <data-file> /inputwords
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.2.jar wordcount /inputwords /outputwords
bin/hdfs dfs -cat /outputwords/*[/php]
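If you do not have a data file handy, you can create a small one for the test (words.txt below is just an illustrative name) and use it in place of <data-file>:
[php]echo "hadoop yarn hadoop mapreduce hdfs yarn hadoop" > words.txt
bin/hdfs dfs -put words.txt /inputwords[/php]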

Follow the HDFS Commands Guide to play with HDFS commands and perform various operations.

3.6. Stop the Cluster

I. Stop HDFS Services

[php]sbin/stop-dfs.sh[/php]

II. Stop YARN Services

[php]sbin/stop-yarn.sh[/php]
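Running jps again should now show that the Hadoop daemons have exited (only the Jps process itself remains):
[php]jps[/php]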
I hope you are now clear on how to perform the Hadoop 2 installation.

Once you install Hadoop 2, you can play with HDFS. For any query about the Hadoop 2 installation, feel free to share it with us; we will be happy to solve your queries.


