Installation of Hadoop 3.x on Ubuntu on Single Node Cluster


1. Objective

In this tutorial on the installation of Hadoop 3.x on Ubuntu, we will set up a pseudo-distributed, single-node Hadoop 3.x cluster. We will cover how to install Java, how to install SSH and configure passwordless SSH, how to download Hadoop, how to set up the Hadoop configuration files (.bashrc, hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml), and how to start and stop the Hadoop services.

Learn installation of Hadoop 2.7.x from this installation guide.


2. Installation of Hadoop 3.x on Ubuntu

Before we start with the Hadoop 3.x installation on Ubuntu, you may want to review the key features added in Hadoop 3 and how Hadoop 3 compares with Hadoop 2.

2.1. Java 8 installation

Hadoop requires a working Java installation. Let us start with the steps for installing Java 8:

a. Install Python Software Properties

sudo apt-get install python-software-properties
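Note: on newer Ubuntu releases, python-software-properties has been replaced by software-properties-common. If the command above fails, install that package instead:

sudo apt-get install software-properties-common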

b. Add Repository

sudo add-apt-repository ppa:webupd8team/java

c. Update the source list

sudo apt-get update

d. Install Java 8

sudo apt-get install oracle-java8-installer

e. Check if Java is correctly installed

java -version
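If Java is installed correctly, this prints the version details. The output will look roughly like the following (the exact version and build numbers on your machine will differ):

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)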

2.2. Configure SSH

SSH is used for remote login. Hadoop requires SSH to manage its nodes, i.e. remote machines, plus the local machine if you want to run Hadoop on it. Let us now install and configure SSH:

a. Install SSH and pdsh

sudo apt-get install ssh
sudo apt-get install pdsh

b. Generate Key Pairs

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

c. Configure passwordless ssh

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

d. Change the permission of the file that contains the key

chmod 0600 ~/.ssh/authorized_keys

e. Check SSH to localhost

ssh localhost
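If passwordless SSH is configured correctly, this logs you in without prompting for a password (you may be asked once to confirm the host key). Type exit to return to your original shell:

exit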

2.3. Install Hadoop

a. Download Hadoop

http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-3.0.0-alpha2/hadoop-3.0.0-alpha2.tar.gz

(This downloads hadoop-3.0.0-alpha2.tar.gz, the latest Hadoop 3.x release at the time of writing.)

b. Untar Tarball

tar -xzf hadoop-3.0.0-alpha2.tar.gz
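The rest of this guide assumes the tarball was extracted into the user's home directory (here /home/dataflair, giving a Hadoop home of /home/dataflair/hadoop-3.0.0-alpha2); adjust the paths below if your location differs. You can confirm the extraction with:

ls ~/hadoop-3.0.0-alpha2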

2.4. Hadoop Setup Configuration

a. Edit .bashrc

Open .bashrc

nano ~/.bashrc

The .bashrc file is located in the user's home directory. Add the following parameters to it:

export HADOOP_PREFIX="/home/dataflair/hadoop-3.0.0-alpha2"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}

Then reload the file so the changes take effect in the current shell:

source ~/.bashrc
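To verify that the environment is set up, check that the hadoop binary is now on your PATH; the reported version should match the release you extracted:

hadoop version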

b. Edit hadoop-env.sh

Edit configuration file hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
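If you are unsure of the Java installation path on your machine, the following prints the full path of the java binary; JAVA_HOME is the directory above its bin (and jre, if present) components:

readlink -f $(which java)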

c. Edit core-site.xml
Edit configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/dataflair/hdata</value>
  </property>
</configuration>
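It is a good idea to create the hadoop.tmp.dir directory up front and make sure your user can write to it (adjust the path if you changed the value above):

mkdir -p /home/dataflair/hdata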

d. Edit hdfs-site.xml

Edit configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

e. Edit mapred-site.xml

If the mapred-site.xml file is not available, create it from the template:

cp mapred-site.xml.template mapred-site.xml

Edit configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

f. Edit yarn-site.xml

Edit configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

Test your Hadoop knowledge with this Big data Hadoop quiz.

2.5. How to Start the Hadoop services

Let us now see how to start the Hadoop cluster:

The first step in starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystem of your “cluster”. This is done as follows:

a. Format the namenode

bin/hdfs namenode -format

NOTE: Format the namenode only once, when you first install Hadoop. Formatting a running Hadoop filesystem will delete all your data in HDFS.

b. Start HDFS Services

sbin/start-dfs.sh

If you get a pdsh-related error while starting the HDFS services, set the default pdsh remote command to ssh:

echo "ssh" | sudo tee /etc/pdsh/rcmd_default

c. Start YARN Services

sbin/start-yarn.sh

d. Check how many daemons are running

Let us now check whether the expected Hadoop daemons are running (the process IDs on your machine will differ):

jps
2961 ResourceManager
2482 DataNode
3077 NodeManager
2366 NameNode
2686 SecondaryNameNode
3199 Jps
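If any of these daemons is missing, look at its log file under the logs directory of the Hadoop installation (using the HADOOP_PREFIX variable set earlier in .bashrc):

ls $HADOOP_PREFIX/logs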

Learn how to install Cloudera Hadoop CDH5 on Ubuntu from this installation guide.

2.6. How to Stop the Hadoop services

Let us learn how to stop Hadoop services now:

a. Stop YARN services

sbin/stop-yarn.sh

b. Stop HDFS services

sbin/stop-dfs.sh

Note:

Browse the web interface for the NameNode; by default, it is available at:

NameNode – http://localhost:9870/

Browse the web interface for the ResourceManager; by default, it is available at:

ResourceManager – http://localhost:8088/

Run a MapReduce job

We are now ready to run our first Hadoop MapReduce job using the classic Hadoop word count example; a minimal run is sketched below.
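Here is a minimal word-count run as a smoke test, assuming the commands are issued from the Hadoop home directory and that the examples jar name matches the release downloaded above (the input and output paths are illustrative):

bin/hdfs dfs -mkdir -p /user/dataflair/input
bin/hdfs dfs -put etc/hadoop/*.xml /user/dataflair/input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha2.jar wordcount /user/dataflair/input /user/dataflair/output
bin/hdfs dfs -cat /user/dataflair/output/part-r-00000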

Learn MapReduce job optimization and performance tuning techniques.

