How to Install Hadoop 1.x on multi-node cluster?

1. Objective

This tutorial describes how to set up and configure a multi-node cluster of Hadoop version 1.x, guiding you step by step through the installation. Once the installation is done, you can perform Hadoop Distributed File System (HDFS) and Hadoop MapReduce operations.


2. Hadoop Installation Video Tutorial


3. Install Hadoop 1.x on Multi-node Cluster

Follow the steps given below to install Hadoop 1.x on a multi-node cluster:

3.1. Recommended Platform

  • OS: Ubuntu 14.04 or later (other operating systems such as CentOS or Red Hat can also be used)
  • Hadoop: Cloudera distribution for Apache Hadoop CDH3U6 (you can also use Apache Hadoop 1.x)

I. Setup Platform
If you are using Windows or Mac OS, create a virtual machine and install Ubuntu in it using VMware Player or Oracle VirtualBox.
New to Linux? Follow this Linux Commands Guide to get hands-on knowledge of Linux commands.

3.2. Prerequisites

  • Java (Oracle Java is recommended for production)
  • Password-less SSH setup (Hadoop needs password-less SSH from the master to all the slaves; this is required for remote script invocation)

Run the following commands on the master node of the Hadoop cluster.

3.3. Install Java 8 (Oracle Java recommended)

I. Update the source list

sudo apt-get update
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update

II. Install Java
sudo apt-get install oracle-java8-installer
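
To confirm that Java installed correctly, you can check the version (the exact output depends on the JDK build on your machine):

java -version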

3.4. Add entry of master and slaves in hosts file

Edit the hosts file and add the following entries

sudo nano /etc/hosts
MASTER-IP master
SLAVE01-IP slave-01
SLAVE02-IP slave-02

(In place of MASTER-IP, SLAVE01-IP, and SLAVE02-IP, put the corresponding IP addresses.)
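
To verify that the hostnames resolve correctly, you can ping each alias from the master (slave-01 and slave-02 are the aliases defined above):

ping -c 1 slave-01
ping -c 1 slave-02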

3.5. Configure SSH

I. Install OpenSSH server and client
sudo apt-get install openssh-server openssh-client

II. Generate key pairs
ssh-keygen -t rsa -P ""

III. Configure passwordless SSH
Copy the contents of “$HOME/.ssh/id_rsa.pub” on the master to “$HOME/.ssh/authorized_keys” on all the slaves.
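
One way to do this from the master, assuming the same user account exists on every node, is the ssh-copy-id utility (you can also append the key to authorized_keys manually):

ssh-copy-id slave-01
ssh-copy-id slave-02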

IV. Check by SSH to slaves

ssh slave-01
ssh slave-02

3.6. Download Hadoop

http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz
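
You can fetch the tarball on the master with wget, for example:

wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz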

3.7. Install Hadoop

I. Untar Tarball
tar xzf hadoop-0.20.2-cdh3u6.tar.gz

II. Go to HADOOP_HOME_DIR
cd hadoop-0.20.2-cdh3u6/

3.8. Setup Configuration

I. Edit configuration file conf/hadoop-env.sh and set JAVA_HOME
export JAVA_HOME=<path to the root of your Java installation, e.g. /usr/lib/jvm/jdk1.8.0_65>
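
If you are not sure where Java is installed, one way to find the path is to resolve the java binary; the jdk1.8.0_65 path above is only an example, so use whatever this prints on your machine (minus the trailing /bin/java or /jre/bin/java):

readlink -f $(which java)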

II. Edit configuration file conf/core-site.xml and add the following entries

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop_admin/hdata/hadoop-${user.name}</value>
  </property>
</configuration>
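
Hadoop stores its working data under hadoop.tmp.dir, so it is a good idea to make sure the base directory exists and is writable by the Hadoop user on every node. Assuming the path shown above (adjust it to your own setup):

mkdir -p /home/hadoop_admin/hdata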

III. Edit configuration file conf/hdfs-site.xml and add the following entries

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

IV. Edit configuration file conf/mapred-site.xml and add the following entries

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>

V. Edit configuration file conf/masters and add the entry of the secondary master, i.e. the IP/alias of the node where the SecondaryNameNode will run

slave-01

VI. Edit configuration file conf/slaves and add the entries of the slaves

slave-01
slave-02

VII. Set environment variables. Update ~/.bashrc and set or update the HADOOP_HOME and PATH shell variables as follows

nano ~/.bashrc
export HADOOP_HOME=/home/hadoop/hadoop-0.20.2-cdh3u6
export PATH=$PATH:$HADOOP_HOME/bin
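
Reload the shell configuration and confirm that the hadoop command is on your PATH (the version reported should correspond to the CDH3U6 tarball):

source ~/.bashrc
hadoop version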

Hadoop is now set up on the master.

3.9. Setup Hadoop on slaves

I. Repeat steps 3.3 and 3.4 on all the slaves

Step 3.3: “Install Java 8”
Step 3.4: “Add entry of master and slaves in hosts file”

II. Create a tarball of the configured Hadoop setup and copy it to all the slaves

tar czf hadoop.tar.gz hadoop-0.20.2-cdh3u6
scp hadoop.tar.gz slave-01:~
scp hadoop.tar.gz slave-02:~

III. Untar the configured Hadoop setup on all the slaves
tar xzf hadoop.tar.gz
Run this command on all the slaves.
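
If you prefer to do this from the master, the password-less SSH configured earlier lets you untar remotely, for example:

ssh slave-01 'tar xzf hadoop.tar.gz'
ssh slave-02 'tar xzf hadoop.tar.gz'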

3.10. Start The Cluster

I. Format the NameNode
bin/hadoop namenode -format

Format the NameNode only once, when you first install Hadoop; running this command again will delete all your data from HDFS.

II. Now start Hadoop services

  • Start HDFS services

bin/start-dfs.sh
Run this command on the master.

  • Start Map-Reduce services

bin/start-mapred.sh
Run this command on the master.

III. Check the daemon status by running the jps command on each node

  • On master

jps
NameNode
JobTracker

  • On slave-01

jps
TaskTracker
DataNode
SecondaryNameNode

  • On slave-02

jps
TaskTracker
DataNode
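
If any daemon is missing, you can also check from the master that both DataNodes have registered with the NameNode (run from the Hadoop directory; the report lists each live DataNode and its capacity):

bin/hadoop dfsadmin -report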

To play with HDFS, follow this tutorial to run HDFS commands and perform important operations.
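
As a quick sanity check before moving on, here are a few basic HDFS operations you can run from the Hadoop directory on the master (the paths used here are just examples):

bin/hadoop fs -mkdir /user/test
bin/hadoop fs -put /etc/hosts /user/test/
bin/hadoop fs -ls /user/test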

3.11. Stop the cluster

I. Stop MapReduce services

bin/stop-mapred.sh

Run this command on the master.

II. Stop HDFS services
bin/stop-dfs.sh

Run this command on the master.
