How to Install Hadoop 3 on Ubuntu – A Step-by-Step Installation Process

In this tutorial, we will learn the complete process to install Hadoop 3 on Ubuntu. The process involves some easy-to-follow steps, including the commands and instructions you need at each stage of the Hadoop installation.

Let’s begin the process.

Steps to Install Hadoop 3 on Ubuntu

Prerequisites

First, download Hadoop 3.1.2 from the link below:

Hadoop 3.1.2

Here are the steps for installing Hadoop 3 on Ubuntu:

Step 1: Install ssh on your system using the below command:

sudo apt-get install ssh

Type the password for the sudo user and then press Enter.

Type ‘Y’ and then press Enter to continue with the installation process.

Step 2: Install pdsh on your system using the below command:

sudo apt-get install pdsh

Type ‘Y’ and then press Enter to continue with the installation process.

Step 3: Open the .bashrc file in the nano editor using the following command:

nano .bashrc

Now set the PDSH_RCMD_TYPE environment variable to ssh by adding the following line at the end of the file:

export PDSH_RCMD_TYPE=ssh

To save the changes you’ve made, press Ctrl+O and then Enter. To exit the nano editor, press Ctrl+X.
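
Note that the export line only takes effect in new shells. To apply it to your current session and verify it, you can run, for example:

source ~/.bashrc
echo $PDSH_RCMD_TYPE

The second command should print ssh.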

Step 4: Now configure passwordless SSH. To do so, generate a new key pair with the following command:

ssh-keygen -t rsa -P ""

Press Enter when asked for the file name to accept the default location.
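
You can optionally confirm that the key pair was created by listing the .ssh directory:

ls ~/.ssh

It should contain id_rsa (the private key) and id_rsa.pub (the public key).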

Step 5: Copy the content of the public key to authorized_keys:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
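
If ssh still prompts for a password in the next step, the usual cause is file permissions; you can tighten them with:

chmod 600 ~/.ssh/authorized_keys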

Step 6: Now verify the SSH setup by connecting to localhost:

ssh localhost

Type ‘yes’ and then press Enter to accept the host key and continue with the connection.

Step 7: Update the package lists:

sudo apt-get update

Step 8: Now install Java 8 using the following command:

sudo apt-get install openjdk-8-jdk

Type ‘Y’ and then press Enter to continue with the installation process.

Step 9: To cross-check whether you have successfully installed Java on your machine, run the below command:

java -version
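
The output should report Java 8, looking something like this (the exact update and build numbers on your machine will differ):

openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)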

Step 10: Now locate the Hadoop tar file on your system.
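
For example, if you downloaded the archive with your browser, it will typically be in ~/Downloads (adjust the paths if you saved it elsewhere); you can check for it and move it to your home directory, which the following steps assume:

ls ~/Downloads/hadoop-3.1.2.tar.gz
mv ~/Downloads/hadoop-3.1.2.tar.gz ~/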

Step 11: Extract the hadoop-3.1.2.tar.gz file using the below command:

tar xzf hadoop-3.1.2.tar.gz
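
You can confirm the extraction with:

ls -d hadoop-3.1.2

It should list the newly created hadoop-3.1.2 directory.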

Step 12: Rename the extracted hadoop-3.1.2 directory to hadoop for ease of use:

mv hadoop-3.1.2 hadoop

Any doubts about the process to install Hadoop 3.1.2 so far? Share them in the comment section.

Step 13: Now check the Java home path, which we will set in hadoop-env.sh:

ls /usr/lib/jvm/java-8-openjdk-amd64/

Step 14: Open the hadoop-env.sh file in the nano editor. This file is located in the ~/hadoop/etc/hadoop (Hadoop configuration) directory.

nano ~/hadoop/etc/hadoop/hadoop-env.sh

Set JAVA_HOME:

export JAVA_HOME=<path-to-the-root-of-your-Java-installation> (e.g. /usr/lib/jvm/java-8-openjdk-amd64)
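
If you are unsure of the correct path on your machine, one common way to derive it is the following (this assumes the JDK’s javac is on your PATH):

dirname $(dirname $(readlink -f $(which javac)))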

To save the changes you’ve made, press Ctrl+O and then Enter. To exit the nano editor, press Ctrl+X.

Step 15: Open the core-site.xml file in the nano editor. This file is also located in the ~/hadoop/etc/hadoop (Hadoop configuration) directory.

nano ~/hadoop/etc/hadoop/core-site.xml

Add the following configuration properties:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/dataflair/hdata</value>
    </property>
</configuration>

Note: /home/dataflair/hdata is a sample location; please specify a location where you have read/write privileges.
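
You may want to create this directory up front so the NameNode can initialize its storage there (the path below is the sample one; substitute your own):

mkdir -p /home/dataflair/hdata

Also note that if you change hadoop.tmp.dir after the cluster has already been formatted and started, you will need to format HDFS again (see Step 20).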

Step 16: Open the hdfs-site.xml file in the nano editor. This file is also located in the ~/hadoop/etc/hadoop (Hadoop configuration) directory:

nano ~/hadoop/etc/hadoop/hdfs-site.xml

Add the following entries in hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Step 17: Open the mapred-site.xml file in the nano editor. This file is also located in the ~/hadoop/etc/hadoop (Hadoop configuration) directory.

nano ~/hadoop/etc/hadoop/mapred-site.xml

Add the following entries in mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/home/dataflair/hadoop</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/home/dataflair/hadoop</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/home/dataflair/hadoop</value>
    </property>
</configuration>

Step 18: Open the yarn-site.xml file in the nano editor. This file is also located in the ~/hadoop/etc/hadoop (Hadoop configuration) directory.

nano ~/hadoop/etc/hadoop/yarn-site.xml

Add the following entries in yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property> 
</configuration>

Step 19: Open the .bashrc file in the nano editor using the following command:

nano .bashrc

The .bashrc file is located in the user’s home directory. Add the following parameters at the end of it:

export HADOOP_HOME="/home/dataflair/hadoop"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin 
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}

To save the changes you’ve made, press Ctrl+O and then Enter. To exit the nano editor, press Ctrl+X.
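
Then reload .bashrc so the new variables take effect in your current shell:

source ~/.bashrc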

Step 20: Before starting Hadoop, we need to format HDFS. From the Hadoop home directory (~/hadoop), run the below command:

bin/hdfs namenode -format
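
If the format succeeds, the output should contain a line similar to the following (the path will match your hadoop.tmp.dir setting):

Storage directory /home/dataflair/hdata/dfs/name has been successfully formatted.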

Step 21: Start the HDFS services:

sbin/start-dfs.sh

Step 22: Open the HDFS web console in your browser:

http://localhost:9870

Step 23: Now start the YARN services:

sbin/start-yarn.sh

Run the jps command to check whether all the Hadoop processes are running:

jps

The output should list the following processes (along with Jps itself):

NameNode
DataNode
ResourceManager
NodeManager
SecondaryNameNode

Step 24: Open the YARN web console in your browser:

http://localhost:8088

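
As an optional smoke test, you can run one of the MapReduce examples bundled with the release (the jar name below matches Hadoop 3.1.2; adjust it for other versions):

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar pi 2 4

When you are done working with the cluster, stop the daemons with:

sbin/stop-yarn.sh
sbin/stop-dfs.sh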

That’s it! You have completed the process to install Hadoop 3 on Ubuntu. I hope you found the article useful.

Next, you should complete the Hadoop HDFS Commands Tutorial.

Feel free to share any of your queries in the comment section.

Responses

  1. Sarvottam Patel says:

    Hey, the web interface for the NameNode is not working.

  2. haranesh says:

    The NameNode is not working, but it is still a very helpful article for freshers who want to work with Hadoop 3.x.

    • Ted Cahall says:

      Take the reference to hadoop.tmp.dir as /home/dataflair/hdata out of core-site.xml. Mine was working before that addition and stopped, even though I had it in HADOOP_HOME and the hdata directory was there with the correct permissions. As soon as I removed that reference, the NameNode began working again. Overall, it was a good, well-detailed article on installing Hadoop 3.x on Ubuntu as a single-node cluster.

    • Ted says:

      One of the reasons I could not get the NameNode to start was that after I added the hadoop.tmp.dir property (/home//hdata) to my core-site.xml file, I forgot to format the HDFS file system. Once I did that, it worked.
      It is better to use an ‘hdata’ or some other data directory under your HADOOP_HOME, since by default the HDFS directory will be built under /tmp and be removed during a system reboot.

  3. Sachin says:

    Error while formatting the NameNode:
    Cannot create directory /home/dataflair/hdata/dfs/name/current
    Please help with the detailed settings in any of the configuration files.
    Thanks,
    Sachin
