
How to Install Hadoop 1.x on multi-node cluster?

1. Objective

This tutorial describes how to set up and configure a multi-node Hadoop 1.x cluster, guiding you through the installation step by step. Once the installation is done, you can perform Hadoop Distributed File System (HDFS) and Hadoop MapReduce operations.

2. Hadoop Installation Video Tutorial

3. Install Hadoop 1.x on Multi-node Cluster

Follow the steps given below to install Hadoop 1.x on a multi-node cluster:

3.1. Recommended Platform

I. Setup Platform
If you are using Windows or macOS, create a virtual machine and install Ubuntu on it, using either VMware Player or Oracle VirtualBox.
New to Linux? Follow this Linux Commands Guide to get hands-on knowledge of Linux commands.

3.2. Prerequisites

Run the following commands on the master node of the Hadoop cluster.

3.3. Install Java 8 (Oracle Java recommended)

I. Update the source list
[php]sudo apt-get update
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
[/php]
II. Install Java
[php]sudo apt-get install oracle-java8-installer[/php]
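To verify that Java was installed correctly, check the version (the exact output depends on the build you installed):
[php]java -version[/php]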

3.4. Add entry of master and slaves in hosts file

Edit the hosts file and add the following entries:

[php]sudo nano /etc/hosts
MASTER-IP master
SLAVE01-IP slave-01
SLAVE02-IP slave-02[/php]

(Replace MASTER-IP, SLAVE01-IP, and SLAVE02-IP with the actual IP addresses of the corresponding machines.)
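To confirm that the new hostnames resolve correctly, you can ping each slave from the master:
[php]ping -c 2 slave-01
ping -c 2 slave-02[/php]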

3.5. Configure SSH

I. Install Open SSH Server-Client
[php]sudo apt-get install openssh-server openssh-client[/php]

II. Generate key pairs
[php]ssh-keygen -t rsa -P ""[/php]

III. Configure passwordless SSH
Copy the contents of "$HOME/.ssh/id_rsa.pub" on the master to "$HOME/.ssh/authorized_keys" on all the slaves.
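A convenient way to do this is with ssh-copy-id; the user name below is an assumption, so replace hadoop with the account you actually use on the slaves:
[php]ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave-01
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave-02[/php]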

IV. Check by SSH to the slaves
[php]ssh slave-01
ssh slave-02[/php]

3.6. Download Hadoop

http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz
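You can fetch the tarball directly on the master, for example with wget:
[php]wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz[/php]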

3.7. Install Hadoop

I. Untar Tarball
[php]tar xzf hadoop-0.20.2-cdh3u6.tar.gz[/php]

II. Go to HADOOP_HOME_DIR
[php]cd hadoop-0.20.2-cdh3u6/[/php]

3.8. Setup Configuration

I. Edit configuration file conf/hadoop-env.sh and set JAVA_HOME
[php]export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_65[/php]
(Set JAVA_HOME to the root of your Java installation; the path above is only an example.)
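If you are unsure of the exact path, you can resolve the location of the installed Java binary (on Ubuntu, the Oracle installer typically installs under /usr/lib/jvm); strip the trailing /jre/bin/java or /bin/java from the output to get JAVA_HOME:
[php]readlink -f $(which java)[/php]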

II. Edit configuration file conf/core-site.xml and add following entries
[php]<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop_admin/hdata/hadoop-${user.name}</value>
</property>
</configuration>[/php]
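The directory used for hadoop.tmp.dir must exist and be writable by the user running Hadoop. Create it to match the value configured above, adjusting the path if you chose a different location:
[php]mkdir -p /home/hadoop_admin/hdata[/php]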

III. Edit configuration file conf/hdfs-site.xml and add following entries
[php]<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>[/php]
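The replication factor is set to 2 here because this cluster has two datanodes; HDFS cannot keep more replicas of a block than there are datanodes. Once the cluster is up, you can inspect block replication with the fsck tool:
[php]bin/hadoop fsck / -files -blocks[/php]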

IV. Edit configuration file conf/mapred-site.xml and add following entries
[php]<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
</configuration>[/php]
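mapred.job.tracker tells the TaskTrackers and job clients where to reach the JobTracker's RPC endpoint. After the cluster is started, one way to confirm the JobTracker is listening on port 9001 is with netstat (from the net-tools package, assuming it is installed):
[php]netstat -tlnp | grep 9001[/php]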

V. Edit configuration file conf/masters and add the entry for the secondary namenode
[php]slave-01[/php]
(IP/alias of the node where the SecondaryNameNode will run)

VI. Edit configuration file conf/slaves and add the entries of all the slaves
[php]slave-01
slave-02[/php]

VII. Set environment variables: update ~/.bashrc and set or update the HADOOP_HOME and PATH shell variables as follows
[php]nano ~/.bashrc
export HADOOP_HOME=/home/hadoop/hadoop-0.20.2-cdh3u6
export PATH=$PATH:$HADOOP_HOME/bin[/php]
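Reload the shell configuration and confirm that the hadoop command is on the PATH:
[php]source ~/.bashrc
hadoop version[/php]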

Hadoop is now set up on the master!

3.9. Setup Hadoop on slaves

I. Repeat steps 3.3 ("Install Java 8") and 3.4 ("Add entry of master and slaves in hosts file") on all the slaves.

II. Create tarball of configured Hadoop-setup and copy to all the slaves
[php]tar czf hadoop.tar.gz hadoop-0.20.2-cdh3u6
scp hadoop.tar.gz slave-01:~
scp hadoop.tar.gz slave-02:~[/php]

III. Untar configured Hadoop-setup on all the slaves
[php]tar xzf hadoop.tar.gz[/php]
Run this command on all the slaves
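With passwordless SSH configured, you can also run the untar step on both slaves from the master in one go (a minimal sketch, assuming the tarball sits in each slave's home directory as copied above):
[php]for host in slave-01 slave-02; do
  ssh $host "tar xzf hadoop.tar.gz"
done[/php]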

3.10. Start The Cluster

I. Format the name node
[php]bin/hadoop namenode -format[/php]

Format the namenode only once, when you first install Hadoop; formatting it again will delete all your data from HDFS.

II. Now start Hadoop services

[php]bin/start-dfs.sh[/php]
Run this command on the master; it starts the NameNode on the master, DataNodes on the slaves, and the SecondaryNameNode on the node listed in conf/masters.

[php]bin/start-mapred.sh[/php]
Run this command on the master; it starts the JobTracker on the master and TaskTrackers on the slaves.
III. Check the status of the daemons by running the jps command

On the master:
[php]jps
NameNode
JobTracker[/php]

On slave-01 (which also runs the SecondaryNameNode):
[php]jps
TaskTracker
DataNode
SecondaryNameNode[/php]

On slave-02:
[php]jps
TaskTracker
DataNode[/php]
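You can also verify the cluster through the web interfaces; by default, Hadoop 1.x serves the NameNode UI on port 50070 and the JobTracker UI on port 50030:
[php]http://master:50070/ (NameNode web UI)
http://master:50030/ (JobTracker web UI)[/php]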
To play with HDFS, follow this tutorial to run HDFS commands and perform important operations; a few commands to get started are shown below.
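For a quick smoke test, run a few basic HDFS commands from the master (the paths used here are arbitrary examples):
[php]bin/hadoop fs -mkdir /user/test
bin/hadoop fs -put /etc/hosts /user/test/hosts
bin/hadoop fs -ls /user/test
bin/hadoop fs -cat /user/test/hosts[/php]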

3.11. Stop the cluster

I. Stop MapReduce services

[php]bin/stop-mapred.sh[/php]

Run this command on the master.

II. Stop HDFS services
[php]bin/stop-dfs.sh[/php]

Run this command on the master.
