Install Hadoop 2 on Ubuntu 16.04 | Apache Hadoop Installation


1. Install Hadoop 2 on Ubuntu 16.04: Objective

This document describes how to install Hadoop 2 on the Ubuntu 16.04 OS. A single-machine Hadoop cluster is also called Hadoop Pseudo-Distributed Mode. The steps given in this document are simple and to the point, so you can install Hadoop on Ubuntu 16.04 easily and within minutes. Once the installation is done, you can play with Hadoop and its components, such as MapReduce for data processing and HDFS for data storage.


2. Steps to Install Hadoop 2 on Ubuntu 16.04

2.1 Recommended Platform to install Hadoop 2

I. Platform Requirements

  • Operating system: Ubuntu 16.04 or later; other Linux flavors such as CentOS and Red Hat also work.
  • Hadoop: Cloudera Distribution for Apache Hadoop CDH 5.x (plain Apache Hadoop 2.x works as well)

II. Configure & Setup Platform

If you are using Windows or macOS, you can create a virtual machine and install Ubuntu on it using VMware Player or, alternatively, Oracle VirtualBox.

2.2. Prerequisites to install Hadoop 2 on Ubuntu

Complete the following steps before you install Hadoop 2 on Ubuntu:

I. Install Java 8

a. Install Python Software Properties
To add the Java PPA repository we need the python-software-properties package, which provides the add-apt-repository command. To download and install it, run the below command in the terminal (note: on newer Ubuntu releases this package has been replaced by software-properties-common):

sudo apt-get install python-software-properties

NOTE: After you press “Enter”, it will ask for your password, since we are using the “sudo” command to run the installation with root privileges. Any installation or system-wide configuration needs root privileges.

b. Add Repository
Now we will manually add the repository from which Ubuntu will install Java. To add the repository, type the below command in the terminal:

sudo add-apt-repository ppa:webupd8team/java

It will then ask you to press [ENTER] to continue. Press “Enter”.

c. Update the source list
It is recommended to update the source list before installing a new package. The source list is the set of locations from which Ubuntu downloads and installs software. To update the source list, type the below command in the terminal:

sudo apt-get update

When you run the above command Ubuntu updates its source list.

d. Install Java
Now we will download and install Java. To do so, type the below command in the terminal:

sudo apt-get install oracle-java8-installer

Press “Enter” and it will start downloading and installing Java.

To confirm that the Java installation completed successfully and to check the installed version, type the below command in the terminal:

java -version

II. Configure SSH

SSH stands for Secure Shell, and it is used for remote login: with SSH we can log in to a remote machine. Now we need to configure passwordless SSH, i.e. SSH that lets us log in to a remote machine without entering a password. Passwordless SSH is required for remote script invocation, so that the master can automatically start the daemons on the slaves.

a. Install OpenSSH Server and Client
These are the SSH tools:

sudo apt-get install openssh-server openssh-client

b. Generate Key Pairs
ssh-keygen -t rsa -P ""

It will ask “Enter file in which to save the key (/home/dataflair/.ssh/id_rsa):”. Leave it as the default; don’t specify any path, just press “Enter”. The keys will then be created in the default path, i.e. “.ssh”. To check, run “ls ~/.ssh/” and you will see that two files were created: “id_rsa”, the private key, and “id_rsa.pub”, the public key.

c. Configure passwordless SSH
We will append the contents of “id_rsa.pub” to the “authorized_keys” file by using the below command:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

d. Check by SSH to localhost
ssh localhost

It will not ask for any password and you will be logged in to localhost, since we have configured passwordless SSH.
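If `ssh localhost` still prompts for a password, the usual culprit is file permissions: OpenSSH ignores key files that other users can read or write. A small sketch to tighten them (assuming the default ~/.ssh location used above):

```shell
# Tighten permissions on the SSH directory and the authorized_keys file;
# sshd refuses authorized_keys that are accessible to anyone but the owner.
SSH_DIR="$HOME/.ssh"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"
if [ -f "$SSH_DIR/authorized_keys" ]; then
    chmod 600 "$SSH_DIR/authorized_keys"
fi
```

After this, `ssh localhost` should log in without prompting.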

2.3. Install Hadoop 2 on Ubuntu 16.04

I. Download Hadoop

Download Hadoop from the below link:
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz

After downloading Hadoop, move it from the download location (here, the desktop) to your home directory by using the following command:

mv Desktop/hadoop-2.5.0-cdh5.3.2.tar.gz /home/dataflair/

Note: /home/dataflair/ is my home directory path.
To find the path of your own home directory, run: echo $HOME
Use that path in the above command and the setup file will be moved to your home directory.

II. Untar Tarball

tar xzf hadoop-2.5.0-cdh5.3.2.tar.gz

Note: All the necessary files (jars, scripts, configuration files, and so on) are already available in the HADOOP_HOME directory (hadoop-2.5.0-cdh5.3.2).

III. Setup Configuration to Install Hadoop 2 on Ubuntu

a. Edit configuration .bashrc file
Edit the “.bashrc” file present in your home directory by using the following command: $ nano ~/.bashrc. Now add the below lines at the end of this file:

export HADOOP_PREFIX="/home/dataflair/hadoop-2.5.0-cdh5.3.2"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}

Note: Make sure that you enter the correct path. “/home/dataflair/hadoop-2.5.0-cdh5.3.2” is the extracted directory inside my home directory. To find the path of your own home directory, run: echo $HOME

After adding the above lines we need to save the file. To save, press “Ctrl+X”, then “Y”, then “Enter”.
Note: After the above step, restart the terminal so that all the new environment variables take effect.
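Instead of restarting the terminal, you can also reload the file into the current shell and check that the variable is visible. A sketch (it prints a placeholder if the variable is not set yet):

```shell
# Re-read .bashrc in the current shell so the new variables take effect
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
# Verify the variable is visible to this shell
echo "HADOOP_PREFIX=${HADOOP_PREFIX:-(not set)}"
```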

b. Edit hadoop-env.sh file

Edit configuration file “hadoop-env.sh” located in configuration directory (HADOOP_HOME/etc/hadoop) and set JAVA_HOME:

dataflair@ubuntu:~$cd hadoop-2.5.0-cdh5.3.2/
dataflair@ubuntu:/hadoop-2.5.0-cdh5.3.2$ cd etc/hadoop
dataflair@ubuntu:/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ nano hadoop-env.sh

In this file set JAVA_HOME as:

export JAVA_HOME=<path-to-the-root-of-your-Java-installation> (eg: /usr/lib/jvm/java-8-oracle/)
After adding the above line we need to save the file. To save, press “Ctrl+X”, then “Y”, then “Enter”.
Note: “/usr/lib/jvm/java-8-oracle/” is the default path for this Java installer. If your Java is installed elsewhere, enter that path here.

c. Edit core-site.xml file
Edit configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) by using the following command:

dataflair@ubuntu:/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ nano core-site.xml

And add the below entries between <configuration> and </configuration> at the end of this file:

<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/dataflair/hdata</value>
</property>

Note: “/home/dataflair/hdata” is my location; please use a location where you have read and write privileges.
After adding the above parameters we need to save the file. To save, press “Ctrl+X”, then “Y”, then “Enter”.
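The directory named in hadoop.tmp.dir is not created automatically in all setups, so it is worth creating it up front. A sketch; $HOME/hdata here mirrors the example path above, so substitute whatever you put in core-site.xml:

```shell
# Create the Hadoop temp/data directory under your own user, so the
# daemons have read and write access to it.
HDATA_DIR="$HOME/hdata"
mkdir -p "$HDATA_DIR"
```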

d. Edit hdfs-site.xml file
Edit configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) by using the following command:

dataflair@ubuntu:/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ nano hdfs-site.xml

And add the below entries between <configuration> and </configuration> at the end of this file:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

After adding the above parameters we need to save the file. To save, press “Ctrl+X”, then “Y”, then “Enter”.

e. Edit mapred-site.xml file
Edit configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop) by using the following command:
Note: There is no file named mapred-site.xml present in the configuration directory; only a template, mapred-site.xml.template, is available. So before editing mapred-site.xml you must first create it as a copy of mapred-site.xml.template.
To make a copy of this file use the following command:

dataflair@ubuntu:/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ cp mapred-site.xml.template mapred-site.xml

Now edit the file mapred-site.xml by using the following command:

dataflair@ubuntu:/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ nano mapred-site.xml

Now add the below entries between <configuration> and </configuration> at the end of this file:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

f. Edit yarn-site.xml file
Edit configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) by using the following command:

dataflair@ubuntu:/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ nano yarn-site.xml

And add the below entries between <configuration> and </configuration> at the end of this file:

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

IV. Start the Cluster

a. Format the NameNode

dataflair@ubuntu:~$ hdfs namenode -format

NOTE: Do this activity only once, when you first install Hadoop; formatting an existing cluster again will delete all your data from HDFS.
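To make an accidental re-format harder, you can guard the command so it only runs when no NameNode metadata exists yet. A sketch, assuming the hadoop.tmp.dir from core-site.xml above (here $HOME/hdata), under which HDFS keeps its name directory in dfs/name:

```shell
# Format only if the NameNode metadata directory has not been initialized
# yet; an existing "current" subdirectory means a format would wipe HDFS.
NAME_DIR="$HOME/hdata/dfs/name"
if command -v hdfs >/dev/null 2>&1 && [ ! -d "$NAME_DIR/current" ]; then
    hdfs namenode -format
else
    echo "Skipping format: hdfs not on PATH or $NAME_DIR already initialized"
fi
```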

b. Start HDFS Services
dataflair@ubuntu:~$ start-dfs.sh

c. Start YARN Services
dataflair@ubuntu:~$ start-yarn.sh

d. Check running Hadoop services

dataflair@ubuntu:~$ jps

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager

(jps also prints the process id of each daemon, plus a Jps entry for itself.)
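Besides jps, the daemons expose web UIs that you can open in a browser to inspect the cluster; the addresses below are the Hadoop 2.x defaults for a pseudo-distributed setup:

```shell
# Default web UI addresses for a pseudo-distributed Hadoop 2.x cluster
NAMENODE_UI="http://localhost:50070"        # HDFS overview and DataNode list
RESOURCEMANAGER_UI="http://localhost:8088"  # YARN applications view
echo "NameNode UI:        $NAMENODE_UI"
echo "ResourceManager UI: $RESOURCEMANAGER_UI"
```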


Hope this tutorial on installing Hadoop 2 on Ubuntu 16.04 was helpful. If you face any difficulties while installing Hadoop 2 on Ubuntu, just drop a comment and our support team will help you out.
