Hadoop 2 Installation on Ubuntu – Setup of Hadoop CDH5


1. Hadoop 2 Installation Tutorial: Objective

This Hadoop 2 installation tutorial describes how to install and configure a single-node Hadoop cluster on Ubuntu OS. A single-node Hadoop cluster is also called “Hadoop Pseudo-Distributed Mode”. The Hadoop 2 installation is explained here very simply and to the point, so that you can complete the Hadoop CDH5 installation in 10 minutes. Once the Hadoop 2 installation is done, you can perform Hadoop Distributed File System (HDFS) and Hadoop MapReduce operations.

Hadoop 2 Installation on Ubuntu – Setup of Hadoop CDH5

2. Hadoop 2 Installation: Video Tutorial

This video tutorial covers the Apache Hadoop 2 installation, i.e., Cloudera CDH5 installation, on Ubuntu, and will help you follow the Hadoop CDH5 installation steps easily.

3. Install Hadoop 2 on Ubuntu

Follow the steps given below to install and configure a Hadoop 2 cluster on Ubuntu OS:

3.1. Recommended Platform

  • OS – Linux is supported as a development and production platform. You can use Ubuntu 14.04 or later (you can also use other Linux flavors like CentOS, Redhat, etc.)
  • Hadoop – Cloudera Distribution for Apache Hadoop CDH5.x (you can use Apache Hadoop 2.x)

I. Setup Platform

If you are using Windows or Mac OS, you can create a virtual machine and install Ubuntu on it using VMware Player or, alternatively, Oracle VirtualBox.

3.2. Prerequisites

I. Install Java 8 (Recommended Oracle Java)

a. Install Python Software Properties
[php]sudo apt-get install python-software-properties[/php]
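Note: on newer Ubuntu releases this package has been replaced by software-properties-common, so if the command above fails you can install that instead:
[php]sudo apt-get install software-properties-common[/php]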
b. Add Repository
[php]sudo add-apt-repository ppa:webupd8team/java[/php]
c. Update the source list
[php]sudo apt-get update[/php]
d. Install Java
[php]sudo apt-get install oracle-java8-installer[/php]
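Once the installer finishes, you can verify the Java installation by checking the version:
[php]java -version[/php]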

II. Configure SSH

a. Install Open SSH Server-Client
[php]sudo apt-get install openssh-server openssh-client[/php]
b. Generate Key Pairs
[php]ssh-keygen -t rsa -P ""[/php]
c. Configure password-less SSH
[php]cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys[/php]
d. Check by SSH to localhost
[php]ssh localhost[/php]
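The first connection will ask you to confirm the host key; type yes. If SSH still prompts for a password after that, one likely cause is loose permissions on the key file, which you can tighten as follows:
[php]chmod 0600 ~/.ssh/authorized_keys[/php]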

3.3. Install Hadoop

I. Download Hadoop 2

http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz
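You can download the tarball directly from the terminal, for example with wget:
[php]wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz[/php]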

II. Untar the Tarball

[php]tar xzf hadoop-2.5.0-cdh5.3.2.tar.gz[/php]
Note: All the required jars, scripts, configuration files, etc. are available in HADOOP_HOME directory (hadoop-2.5.0-cdh5.3.2).
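Since the commands in the following steps are run relative to HADOOP_HOME, move into the extracted directory now:
[php]cd hadoop-2.5.0-cdh5.3.2[/php]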

III. Hadoop 2 Setup Configuration

a. Edit .bashrc
Now, edit .bashrc file located in user’s home directory and add following parameters:

[php]export HADOOP_PREFIX="/home/hdadmin/hadoop-2.5.0-cdh5.3.2"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}[/php]

Note: After the above step, reload the environment so that all the environment variables come into effect.
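One way to reload the variables without restarting the terminal, and to verify that the Hadoop binaries are on the PATH:
[php]source ~/.bashrc
hadoop version[/php]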

b. Edit hadoop-env.sh
Now, edit configuration file hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:

[php]export JAVA_HOME=<path-to-the-root-of-your-Java-installation> (eg: /usr/lib/jvm/java-8-oracle/)[/php]
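If you are unsure where Java is installed, one way to find the root of your Java installation is to resolve the java binary's symlink:
[php]readlink -f /usr/bin/java[/php]
Strip the trailing /jre/bin/java (or /bin/java) from the output to get the JAVA_HOME path.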

c. Edit core-site.xml
Now, edit configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:

[php]<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdadmin/hdata</value>
</property>
</configuration>[/php]

Note: /home/hdadmin/hdata is a sample location; please specify a location where you have read-write privileges.
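To be safe, create this directory beforehand so the daemons can write to it:
[php]mkdir -p /home/hdadmin/hdata[/php]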

d. Edit hdfs-site.xml
Now, edit configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:

[php]<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>[/php]

e. Edit mapred-site.xml
Now, edit configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:

[php]<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>[/php]
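Note: some Hadoop 2 tarballs ship only a template for this file. If HADOOP_HOME/etc/hadoop/mapred-site.xml does not exist, create it from the template first:
[php]cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml[/php]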

f. Edit yarn-site.xml
Now, edit configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:

[php]<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>[/php]

3.4. Start the Cluster

I. Format the name node

[php]bin/hdfs namenode -format[/php]
NOTE: Format the NameNode only once, when you first install Hadoop; formatting it again will delete all your data from HDFS.

II. Start HDFS Services

[php]sbin/start-dfs.sh[/php]

III. Start YARN Services

[php]sbin/start-yarn.sh[/php]

IV. Check whether services have been started

[php]jps
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager[/php]
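You can also verify the cluster through the web UIs (Hadoop 2 default ports): the NameNode UI at http://localhost:50070 and the ResourceManager UI at http://localhost:8088.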

3.5. Run Map-Reduce Jobs

I. Run word count example

[php]bin/hdfs dfs -mkdir /inputwords
bin/hdfs dfs -put <data-file> /inputwords
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.2.jar wordcount /inputwords /outputwords
bin/hdfs dfs -cat /outputwords/*[/php]
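If you do not have a data file handy, you can create a small one for the test (words.txt below is just an illustrative name) and use it in place of <data-file>:
[php]echo "hadoop yarn hadoop mapreduce hdfs yarn hadoop" > words.txt
bin/hdfs dfs -put words.txt /inputwords[/php]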

Follow the HDFS Commands Guide to play with HDFS commands and perform various operations.

3.6. Stop the Cluster

I. Stop HDFS Services

[php]sbin/stop-dfs.sh[/php]

II. Stop YARN Services

[php]sbin/stop-yarn.sh[/php]
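Running jps again should now show that the Hadoop daemons have exited (only the Jps process itself remains):
[php]jps[/php]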
I hope you are now clear on how to perform the Hadoop 2 installation.

Once you install Hadoop 2, you can play with HDFS. For any query about the Hadoop 2 installation, feel free to share it with us; we will be happy to solve your queries.


