Hadoop 2 Installation on Ubuntu – Setup of Hadoop CDH5


1. Hadoop 2 Installation Tutorial: Objective

This Hadoop 2 installation tutorial describes how to install and configure a Hadoop cluster on a single node on Ubuntu OS. A single-node Hadoop cluster is also called “Hadoop Pseudo-Distributed Mode”. The Hadoop 2 installation is explained here simply and to the point, so that you can complete the Hadoop CDH5 installation in about 10 minutes. Once the Hadoop 2 installation is done, you can perform Hadoop Distributed File System (HDFS) and Hadoop MapReduce operations.


2. Hadoop 2 Installation: Video Tutorial

This video tutorial covers the Apache Hadoop 2 installation, i.e. the Cloudera CDH5 installation, on Ubuntu. It will help you learn the Hadoop CDH5 installation in an easy manner.

3. Install Hadoop 2 on Ubuntu

Follow the steps given below to install and configure a Hadoop 2 cluster on Ubuntu OS:

3.1. Recommended Platform

  • OS – Linux is supported as a development and production platform. You can use Ubuntu 14.04 or later (you can also use other Linux flavors like CentOS, Red Hat, etc.)
  • Hadoop – Cloudera Distribution for Apache Hadoop CDH5.x (you can use Apache Hadoop 2.x)

I. Setup Platform

If you are using Windows or Mac OS, you can create a virtual machine and install Ubuntu on it using either VMware Player or Oracle VirtualBox.

3.2. Prerequisites

I. Install Java 8 (Recommended Oracle Java)

a. Install Python Software Properties
[php]sudo apt-get install python-software-properties[/php]
b. Add Repository
[php]sudo add-apt-repository ppa:webupd8team/java[/php]
c. Update the source list
[php]sudo apt-get update[/php]
d. Install Java
[php]sudo apt-get install oracle-java8-installer[/php]
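To confirm that Java was installed correctly, you can check the installed version (the exact output depends on the build):
[php]java -version[/php]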

II. Configure SSH

a. Install Open SSH Server-Client
[php]sudo apt-get install openssh-server openssh-client[/php]
b. Generate Key Pairs
[php]ssh-keygen -t rsa -P ""[/php]
c. Configure password-less SSH
[php]cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys[/php]
d. Check by SSH to localhost
[php]ssh localhost[/php]
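If ssh still prompts for a password, a common fix (not part of the original steps, so treat it as a hedged suggestion) is to tighten the permissions on the .ssh directory and the authorized_keys file:
[php]chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys[/php]
Type exit to return to your own shell before continuing.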

3.3. Install Hadoop

I. Download Hadoop 2

http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz
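You can fetch the tarball from the command line, for example with wget:
[php]wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz[/php]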

II. Untar the Tarball

[php]tar xzf hadoop-2.5.0-cdh5.3.2.tar.gz[/php]
Note: All the required jars, scripts, configuration files, etc. are available in the HADOOP_HOME directory (hadoop-2.5.0-cdh5.3.2).
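The commands in the remaining steps are run relative to this directory, so change into it first:
[php]cd hadoop-2.5.0-cdh5.3.2[/php]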

III. Hadoop 2 Setup Configuration

a. Edit .bashrc
Now, edit the .bashrc file located in the user’s home directory and add the following parameters:

[php]export HADOOP_PREFIX="/home/hdadmin/hadoop-2.5.0-cdh5.3.2"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}[/php]

Note: After the above step, restart the terminal so that all the environment variables come into effect.
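Alternatively, you can reload the file in the current session instead of restarting the terminal:
[php]source ~/.bashrc[/php]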

b. Edit hadoop-env.sh
Now, edit the configuration file hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:

[php]export JAVA_HOME=<path-to-the-root-of-your-Java-installation>[/php]
(e.g. /usr/lib/jvm/java-8-oracle/)
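If you are not sure where Java is installed, one way to locate it (assuming java is on your PATH) is to resolve the binary’s real path; JAVA_HOME is then the directory above bin:
[php]readlink -f $(which java)[/php]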

c. Edit core-site.xml
Now, edit the configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

[php]<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/dataflair/hdata</value>
</property>
</configuration>[/php]

Note: /home/dataflair/hdata is a sample location; please specify a location where you have read and write privileges.
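You can create this directory up front so Hadoop can write to it (the path below matches the sample value in core-site.xml; adjust it to your own location):
[php]mkdir -p /home/dataflair/hdata[/php]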

d. Edit hdfs-site.xml
Now, edit the configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

[php]<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>[/php]

e. Edit mapred-site.xml
Now, edit the configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

[php]<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>[/php]
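Note that in plain Apache Hadoop 2.x tarballs mapred-site.xml may not exist out of the box; if that is the case for you, create it from the bundled template before editing:
[php]cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml[/php]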

f. Edit yarn-site.xml
Now, edit the configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

[php]<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>[/php]

3.4. Start the Cluster

I. Format the NameNode

[php]bin/hdfs namenode -format[/php]
NOTE: This should be done only once, when you install Hadoop; running the format command again will delete all your data from HDFS.

II. Start HDFS Services

[php]sbin/start-dfs.sh[/php]

III. Start YARN Services

[php]sbin/start-yarn.sh[/php]
Follow this link to learn what YARN is.

IV. Check whether services have been started

[php]jps
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps[/php]
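You can also verify the daemons through their web interfaces: in Hadoop 2 the NameNode UI is typically served at http://localhost:50070 and the ResourceManager UI at http://localhost:8088.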

3.5. Run Map-Reduce Jobs

I. Run word count example

[php] bin/hdfs dfs -mkdir /inputwords
bin/hdfs dfs -put <data-file> /inputwords
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.2.jar wordcount /inputwords /outputwords
bin/hdfs dfs -cat /outputwords/*[/php]
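If you do not have a data file handy, you can create a small sample file for the test (words.txt is just an illustrative name) and use it in place of <data-file> above:
[php]echo "hello hadoop hello world" > words.txt[/php]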

Follow the HDFS Command Guide to play with HDFS commands and perform various operations.

3.6. Stop The Cluster

I. Stop HDFS Services

[php]sbin/stop-dfs.sh[/php]

II. Stop YARN Services

[php]sbin/stop-yarn.sh[/php]
I hope you are now clear on how to do the Hadoop 2 installation.

Once you install Hadoop 2, you can play with HDFS. For any query about the Hadoop 2 installation, feel free to share it with us; we will be happy to solve your queries.


6 Responses

  1. Mindi says:

    I successfully installed Hadoop CDH5 on Ubuntu with your blog. Thanks for the help!!

  2. Ankan says:

    I have tried the same and after starting(start-dfs.sh) i am just able to see 3 services running.
    NameNode
    DataNode
    SecondaryNameNode
    I am not able to see the below services:
    ResourceManager
    NodeManager
    Please help as i am new to hadoop and Linux system too.
    Thanks.

  3. Jay says:

    tar (child): hadoop-2.5.0-cdh5.3.2.tar.gz: Cannot open: No such file or directory
    tar (child): Error is not recoverable: exiting now
    tar: Child returned status 2
    tar: Error is not recoverable: exiting now

  4. Nikita says:

    Great post with lots of efforts.

  5. Nilabhra Patra says:

    Just wanted to know if this method can be used in Ubuntu 17.
