
How to Install Hadoop 1.x on multi-node cluster?

1. Objective

This tutorial describes how to set up and configure a multi-node Hadoop 1.x cluster, guiding you through the installation step by step. Once the installation is done, you can perform Hadoop Distributed File System (HDFS) and Hadoop MapReduce operations.

2. Hadoop Installation Video Tutorial

3. Install Hadoop 1.x on Multi-node Cluster

Follow the steps given below to install Hadoop 1.x on a multi-node cluster:

3.1. Recommended Platform

I. Setup Platform
If you are using Windows or macOS, create a virtual machine and install Ubuntu on it, using either VMware Player or Oracle VirtualBox.
New to Linux? Follow this Linux Commands Guide to get hands-on knowledge of Linux commands.

3.2. Prerequisites

Run the following commands on the master node of the Hadoop cluster.

3.3. Install Java 8 (Oracle Java recommended)

I. Update the source list
[php]sudo apt-get update
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
[/php]
II. Install Java
[php]sudo apt-get install oracle-java8-installer[/php]
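To verify that Java was installed correctly, check the version (the exact output depends on the build you installed):
[php]java -version[/php]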

3.4. Add entry of master and slaves in hosts file

Edit the hosts file and add the following entries:

[php]sudo nano /etc/hosts
MASTER-IP master
SLAVE01-IP slave-01
SLAVE02-IP slave-02[/php]

(Replace MASTER-IP, SLAVE01-IP, and SLAVE02-IP with the actual IP addresses of the corresponding machines.)
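To confirm that the new hostnames resolve correctly, you can ping each slave from the master:
[php]ping -c 2 slave-01
ping -c 2 slave-02[/php]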

3.5. Configure SSH

I. Install Open SSH Server-Client
[php]sudo apt-get install openssh-server openssh-client[/php]

II. Generate key pairs
[php]ssh-keygen -t rsa -P ""[/php]

III. Configure passwordless SSH
Copy the contents of "$HOME/.ssh/id_rsa.pub" on the master to "$HOME/.ssh/authorized_keys" on all the slaves.
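A convenient way to do this is with ssh-copy-id; the user name below is an assumption, so replace hadoop with the account you actually use on the slaves:
[php]ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave-01
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave-02[/php]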

IV. Check by SSH to the slaves
[php]ssh slave-01
ssh slave-02[/php]

3.6. Download Hadoop

http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz
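You can fetch the tarball directly on the master, for example with wget:
[php]wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz[/php]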

3.7. Install Hadoop

I. Untar Tarball
[php]tar xzf hadoop-0.20.2-cdh3u6.tar.gz[/php]

II. Go to HADOOP_HOME_DIR
[php]cd hadoop-0.20.2-cdh3u6/[/php]

3.8. Setup Configuration

I. Edit configuration file conf/hadoop-env.sh and set JAVA_HOME
[php]export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_65[/php]
(Set JAVA_HOME to the root of your Java installation; the path above is only an example.)
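If you are unsure of the exact path, you can resolve the location of the installed Java binary (on Ubuntu, the Oracle installer typically installs under /usr/lib/jvm); strip the trailing /jre/bin/java or /bin/java from the output to get JAVA_HOME:
[php]readlink -f $(which java)[/php]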

II. Edit configuration file conf/core-site.xml and add following entries
[php]<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop_admin/hdata/hadoop-${user.name}</value>
</property>
</configuration>[/php]
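The directory used for hadoop.tmp.dir must exist and be writable by the user running Hadoop. Create it to match the value configured above, adjusting the path if you chose a different location:
[php]mkdir -p /home/hadoop_admin/hdata[/php]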

III. Edit configuration file conf/hdfs-site.xml and add following entries
[php]<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>[/php]
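The replication factor is set to 2 here because this cluster has two datanodes; HDFS cannot keep more replicas of a block than there are datanodes. Once the cluster is up, you can inspect block replication with the fsck tool:
[php]bin/hadoop fsck / -files -blocks[/php]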

IV. Edit configuration file conf/mapred-site.xml and add following entries
[php]<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
</configuration>[/php]
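mapred.job.tracker tells the TaskTrackers and job clients where to reach the JobTracker's RPC endpoint. After the cluster is started, one way to confirm the JobTracker is listening on port 9001 is with netstat (from the net-tools package, assuming it is installed):
[php]netstat -tlnp | grep 9001[/php]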

V. Edit configuration file conf/masters and add the entry for the secondary namenode
[php]slave-01[/php]
(IP/alias of the node where the SecondaryNameNode will run)

VI. Edit configuration file conf/slaves and add the entries of all the slaves
[php]slave-01
slave-02[/php]

VII. Set environment variables: update ~/.bashrc and set or update the HADOOP_HOME and PATH shell variables as follows
[php]nano ~/.bashrc
export HADOOP_HOME=/home/hadoop/hadoop-0.20.2-cdh3u6
export PATH=$PATH:$HADOOP_HOME/bin[/php]
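Reload the shell configuration and confirm that the hadoop command is on the PATH:
[php]source ~/.bashrc
hadoop version[/php]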

Hadoop is now set up on the master!

3.9. Setup Hadoop on slaves

I. Repeat steps 3.3 ("Install Java 8") and 3.4 ("Add entry of master and slaves in hosts file") on all the slaves.

II. Create tarball of configured Hadoop-setup and copy to all the slaves
[php]tar czf hadoop.tar.gz hadoop-0.20.2-cdh3u6
scp hadoop.tar.gz slave-01:~
scp hadoop.tar.gz slave-02:~[/php]

III. Untar configured Hadoop-setup on all the slaves
[php]tar xzf hadoop.tar.gz[/php]
Run this command on all the slaves
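With passwordless SSH configured, you can also run the untar step on both slaves from the master in one go (a minimal sketch, assuming the tarball sits in each slave's home directory as copied above):
[php]for host in slave-01 slave-02; do
  ssh $host "tar xzf hadoop.tar.gz"
done[/php]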

3.10. Start The Cluster

I. Format the name node
[php]bin/hadoop namenode -format[/php]

Format the namenode only once, when you first install Hadoop; formatting it again will delete all your data from HDFS.

II. Now start Hadoop services

[php]bin/start-dfs.sh[/php]
Run this command on the master; it starts the NameNode on the master, DataNodes on the slaves, and the SecondaryNameNode on the node listed in conf/masters.

[php]bin/start-mapred.sh[/php]
Run this command on the master; it starts the JobTracker on the master and TaskTrackers on the slaves.
III. Check the status of the daemons by running the jps command

On the master:
[php]jps
NameNode
JobTracker[/php]

On slave-01 (which also runs the SecondaryNameNode):
[php]jps
TaskTracker
DataNode
SecondaryNameNode[/php]

On slave-02:
[php]jps
TaskTracker
DataNode[/php]
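You can also verify the cluster through the web interfaces; by default, Hadoop 1.x serves the NameNode UI on port 50070 and the JobTracker UI on port 50030:
[php]http://master:50070/ (NameNode web UI)
http://master:50030/ (JobTracker web UI)[/php]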
To play with HDFS, follow this tutorial to run HDFS commands and perform important operations; a few commands to get started are shown below.
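For a quick smoke test, run a few basic HDFS commands from the master (the paths used here are arbitrary examples):
[php]bin/hadoop fs -mkdir /user/test
bin/hadoop fs -put /etc/hosts /user/test/hosts
bin/hadoop fs -ls /user/test
bin/hadoop fs -cat /user/test/hosts[/php]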

3.11. Stop the cluster

I. Stop MapReduce services

[php]bin/stop-mapred.sh[/php]

Run this command on the master.

II. Stop HDFS services
[php]bin/stop-dfs.sh[/php]

Run this command on the master.
