How to Install Hadoop 3 on Ubuntu – A Step-by-Step Installation Process

In this tutorial, we will learn the complete process of installing Hadoop 3 on Ubuntu. The process involves a series of easy-to-follow steps, each with the commands and instructions you need along the way.

Let’s begin the process.

Steps to Install Hadoop 3 on Ubuntu

Prerequisites

First, download Hadoop 3.1.2 from the link below:

Hadoop 3.1.2

Here are the steps for installing Hadoop 3 on Ubuntu on your system:

Step 1: Install ssh on your system using the below command:

sudo apt-get install ssh

Type the password for the sudo user and then press Enter.

Type ‘Y’ and then press Enter to continue with the installation process.

Step 2: Install pdsh on your system using the below command:

sudo apt-get install pdsh

Type ‘Y’ and then press Enter to continue with the installation process.

Step 3: Open the .bashrc file in the nano editor using the following command:

nano .bashrc

Now set the PDSH_RCMD_TYPE environment variable to ssh by adding the following line at the end of the file:

export PDSH_RCMD_TYPE=ssh

To save the changes you’ve made, press Ctrl+O. To exit the nano editor, press Ctrl+X and then press ‘Y’ to exit the editor.
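
The export only takes effect in new shells, so reload the file to apply it to the current session:

source ~/.bashrc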

Step 4: Now configure ssh. To do so, create a new key with the help of the following command:

ssh-keygen -t rsa -P ""

Press Enter when asked for the file name to accept the default location.

Step 5: Copy the content of the public key to authorized_keys.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
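
If the connection test in the next step still prompts for a password, a common fix is to tighten the permissions on the key file:

chmod 600 ~/.ssh/authorized_keys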

Step 6: Now verify the SSH setup by connecting to localhost.

ssh localhost

Type ‘Y’ and then press Enter to continue with the connection.

Step 7: Update the package source lists.

sudo apt-get update

Step 8: Now install Java 8 using the following command:

sudo apt-get install openjdk-8-jdk

Type ‘Y’ and then press Enter to finish with the installation process.

Step 9: To cross-check whether Java was installed successfully on your machine, run the below command:

java -version
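
The exact version string varies with your Ubuntu release, but the output should look something like this:

openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build ...)
OpenJDK 64-Bit Server VM (build ..., mixed mode)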

Step 10: Now change to the directory where you downloaded the Hadoop tar file (for example, ~/Downloads).

Step 11: Extract the hadoop-3.1.2.tar.gz file using the below command:

tar xzf hadoop-3.1.2.tar.gz
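
If the extraction succeeds, listing the current directory should now show a hadoop-3.1.2 folder next to the archive:

ls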

Step 12: Rename the extracted hadoop-3.1.2 directory to hadoop and move it to your home directory for ease of use:

mv hadoop-3.1.2 ~/hadoop

Any doubts about the process to install Hadoop 3.1.2 so far? Share them in the comment section.

Step 14: Now check the Java home path, which we will set in hadoop-env.sh:

ls /usr/lib/jvm/java-8-openjdk-amd64/

Step 15: Open the hadoop-env.sh file in the nano editor. This file is located in the ~/hadoop/etc/hadoop configuration directory, so change into that directory first:

cd ~/hadoop/etc/hadoop
nano hadoop-env.sh

Set JAVA_HOME:

export JAVA_HOME=<path-to-the-root-of-your-Java-installation> (e.g., /usr/lib/jvm/java-8-openjdk-amd64)

To save the changes you’ve made, press Ctrl+O. To exit the nano editor, press Ctrl+X and then press ‘Y’ to exit the editor.

Step 16: Open the core-site.xml file in the nano editor. This file is also located in the ~/hadoop/etc/hadoop configuration directory.

nano core-site.xml

Add the following configuration properties:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/dataflair/hdata</value>
    </property>
</configuration>

Note: /home/dataflair/hdata is a sample location; please specify a location where you have read/write privileges.
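
It is also a good idea to create this directory now; as readers note in the comments below, formatting HDFS later can fail if the directory is missing or lacks the right permissions:

mkdir -p /home/dataflair/hdata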

Step 17: Open the hdfs-site.xml file in the nano editor. This file is also located in the ~/hadoop/etc/hadoop configuration directory:

nano hdfs-site.xml

Add the following entries in hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
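
Optionally, as a reader suggests in the comments below, you can also pin the NameNode and DataNode storage directories explicitly by adding these properties inside the same <configuration> block (the paths here reuse the sample hdata location and are only an illustration):

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/dataflair/hdata/dfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/dataflair/hdata/dfs/datanode</value>
    </property>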

Step 18: Open the mapred-site.xml file in the nano editor. This file is also located in the ~/hadoop/etc/hadoop configuration directory.

nano mapred-site.xml

Add the following entries in mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/home/dataflair/hadoop</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/home/dataflair/hadoop</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/home/dataflair/hadoop</value>
    </property>
</configuration>

Step 19: Open the yarn-site.xml file in the nano editor. This file is also located in the ~/hadoop/etc/hadoop configuration directory.

nano yarn-site.xml

Add the following entries in yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property> 
</configuration>

Step 20: Open the .bashrc file in the nano editor again using the following command:

nano ~/.bashrc

The file is located in your home directory; add the following parameters at the end:

export HADOOP_HOME="/home/dataflair/hadoop"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin 
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}

To save the changes you’ve made, press Ctrl+O. To exit the nano editor, press Ctrl+X and then press ‘Y’ to exit the editor.
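
As one reader points out in the comments, reload the file after editing so the new variables are available in the current shell:

source ~/.bashrc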

Step 21: Before starting Hadoop for the first time, we need to format HDFS. Run the below command from the hadoop directory:

cd ~/hadoop
bin/hdfs namenode -format
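
If the format succeeds, the log output should include a line similar to “Storage directory /home/dataflair/hdata/dfs/name has been successfully formatted.”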

Step 22: Start the HDFS services:

sbin/start-dfs.sh

Step 23: Open the HDFS NameNode web UI in your browser:

http://localhost:9870

Step 24: Now start the YARN services:

sbin/start-yarn.sh

Run the jps command to check whether all the Hadoop daemons are running:

jps

The output should include the following processes:

NameNode
DataNode
ResourceManager
NodeManager
SecondaryNameNode
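
When you are done working with the cluster, the daemons can be stopped with the matching stop scripts:

sbin/stop-yarn.sh
sbin/stop-dfs.sh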

Step 25: Open the YARN ResourceManager web UI in your browser:

http://localhost:8088

This is it. You have completed the process to install Hadoop 3 on Ubuntu. I hope you found the article useful.

Next, check out the Hadoop HDFS Commands Tutorial.

Feel free to share any of your queries in the comment section.

8 Responses

  1. Sarvottam Patel says:

    Hey, the web interface for the NameNode is not working.

  2. haranesh says:

    The NameNode is not working, but it is still a very helpful article for freshers who want to work with Hadoop 3.x.

    • Ted Cahall says:

      Take the reference to hadoop.tmp.dir as /home/dataflair/hdata out of core-site.xml. Mine was working before that addition and stopped, even though I had it in HADOOP_HOME and the hdata directory was there with the correct permissions. As soon as I removed that reference, the NameNode began working again. Overall it was a good, well-detailed article on installing Hadoop 3.x on Ubuntu as a single-node cluster.

    • Ted says:

      One of the reasons I could not get the NameNode to start was that after I added the hadoop.tmp.dir property (/home//hdata) to my core-site.xml file, I forgot to format the HDFS file system. Once I did that, it worked.
      It is better to use an ‘hdata’ or similar data directory under your HADOOP_HOME, since by default the HDFS directory will be built under /tmp and be removed during a system reboot.

  3. Sachin says:

    I get an error while formatting the NameNode:
    Cannot create directory /home/dataflair/hdata/dfs/name/current
    Please help with the detailed settings in any of the configuration files.
    Thanks
    -Sachin

  4. sunil kumar says:

    I am getting an error when running the command hdfs namenode -format:
    Error: Invalid HADOOP_YARN_HOME

    I am running Hadoop 3.2.1 on Ubuntu 20.1

    I read somewhere that the env variable needn’t be declared in version 3.0.
    I tried that as well, but still the same error.

  5. TobyBC says:

    Pretty good, but this tutorial needs some work to be user-friendly for people like me who knew nothing about Hadoop and are rusty on Unix commands.

    Here’s what I needed to change to the steps:

    Step 17: “Add the following entries in core-site.xml”

    should read: “Add the following entries in hdfs-site.xml”

    Also need to create the hdata directory with:
    mkdir hdata

    These properties need to be added to the xml:

    dfs.namenode.name.dir = file:/home/dataflair/hdata/dfs/namenode
    dfs.datanode.data.dir = file:/home/dataflair/hdata/dfs/datanode

    Step 18: “Add the following entries in core-site.html:”

    should read: “Add the following entries in mapred-site.xml:”

    Step 19: “Add the following entries in core-site.html:”

    should read: “Add the following entries in yarn-site.xml:”

    Step 20: The list isn’t complete, this is what I used:

    #version 3.2.1 requires pdsh package and then this allows hadoop to run start-dfs.sh properly
    export PDSH_RCMD_TYPE=ssh
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

    export HADOOP_HOME=/home/jeff/hadoop
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME

    #used by log4j; without it you will see a warning at the top of the output when running hdfs namenode -format
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

    After changing .bashrc run the line below to update the settings:
    source ~/.bashrc

    Step 21: This will run after making the change in step 20:
    hdfs namenode -format

  6. Cliff Chen says:

    The DataNodes didn’t work when we used the master to connect to multiple slaves.
    Do you have any idea how to solve this? Thank you.
    P.S. Hadoop 3.1.4 and Ubuntu 18.04 are used.
