How to Install Hadoop 1.x on a Multi-node Cluster
1. Objective
This tutorial describes, step by step, how to set up and configure a multi-node Hadoop 1.x cluster. Once the installation is done, you can perform Hadoop Distributed File System (HDFS) and Hadoop MapReduce operations.
3. Install Hadoop 1.x on Multi-node Cluster
Follow the steps given below to install Hadoop 1.x on a multi-node cluster.
3.1. Recommended Platform
- OS: Ubuntu 14.04 or later (you can also use another OS such as CentOS or Red Hat)
- Hadoop: Cloudera Distribution for Apache Hadoop CDH3U6 (you can also use Apache Hadoop 1.x)
I. Setup Platform
If you are using Windows or macOS, you can create a virtual machine with VMware Player or Oracle VirtualBox and install Ubuntu in it.
New to Linux? Follow this Linux Commands Guide to get hands-on knowledge of Linux commands.
3.2. Prerequisites
- Java (Oracle Java is recommended for production)
- Passwordless SSH setup (Hadoop needs passwordless SSH from the master to all the slaves; this is required for remote script invocations)
Run the following commands on the master of the Hadoop cluster.
3.3. Install Java 8 (Oracle Java recommended)
I. Update the source list
[php]sudo apt-get update
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
[/php]
II. Install Java
[php]sudo apt-get install oracle-java8-installer[/php]
3.4. Add entry of master and slaves in hosts file
Edit the hosts file and add the following entries
[php]sudo nano /etc/hosts
MASTER-IP master
SLAVE01-IP slave-01
SLAVE02-IP slave-02[/php]
(Replace MASTER-IP, SLAVE01-IP and SLAVE02-IP with the corresponding IP addresses)
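Every later step relies on these aliases, so it can help to confirm that none were missed before continuing. The following is a minimal sketch, not part of the original procedure: the function name missing_aliases is hypothetical, and it assumes the aliases master, slave-01 and slave-02 used in this tutorial.

```shell
# Print every alias that has no entry in the given hosts file.
# missing_aliases is a hypothetical helper name, not a standard tool.
missing_aliases() {
  local hosts_file=$1
  shift
  local alias_name
  for alias_name in "$@"; do
    # Skip comment lines, then look for the alias among the name fields
    # (field 1 is the IP address, so names start at field 2).
    if ! awk -v a="$alias_name" '
        $1 ~ /^#/ { next }
        { for (i = 2; i <= NF; i++) { if ($i ~ /^#/) break; if ($i == a) found = 1 } }
        END { exit (found ? 0 : 1) }' "$hosts_file"; then
      echo "$alias_name"
    fi
  done
}

# Example: prints nothing when all three aliases are present.
# missing_aliases /etc/hosts master slave-01 slave-02
```

Any alias printed by the function still needs a line in /etc/hosts on that machine.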
3.5. Configure SSH
I. Install the OpenSSH server and client
[php]sudo apt-get install openssh-server openssh-client[/php]
II. Generate key pairs
[php]ssh-keygen -t rsa -P ""[/php]
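III. Configure passwordless SSH
Append the contents of $HOME/.ssh/id_rsa.pub on the master to $HOME/.ssh/authorized_keys on all the slaves. One way to do this is with ssh-copy-id:
[php]ssh-copy-id -i $HOME/.ssh/id_rsa.pub slave-01
ssh-copy-id -i $HOME/.ssh/id_rsa.pub slave-02[/php]
IV. Check by connecting to the slaves over SSH
[php]ssh slave-01
ssh slave-02[/php]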
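The manual check above can also be scripted so that it fails instead of prompting for a password. This is a rough sketch under assumptions: check_ssh is a hypothetical helper name, and SSH_CMD is an illustrative override (defaulting to the real ssh client) that is not part of OpenSSH or Hadoop.

```shell
# Report whether each host accepts key-based SSH logins without prompting.
# check_ssh is a hypothetical helper; SSH_CMD is an illustrative override
# that defaults to the real ssh client.
check_ssh() {
  local host
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of asking for a password.
    if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
    fi
  done
}

# Example, run on the master:
# check_ssh slave-01 slave-02
```

A FAILED line usually means the public key has not yet been added to authorized_keys on that slave, or that the alias does not resolve.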
3.6. Download Hadoop
http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz
3.7. Install Hadoop
I. Untar Tarball
[php]tar xzf hadoop-0.20.2-cdh3u6.tar.gz[/php]
II. Go to HADOOP_HOME_DIR
[php]cd hadoop-0.20.2-cdh3u6/[/php]
3.8. Setup Configuration
I. Edit the configuration file conf/hadoop-env.sh and set JAVA_HOME to the root of your Java installation (e.g. /usr/lib/jvm/jdk1.8.0_65)
[php]export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_65[/php]
II. Edit the configuration file conf/core-site.xml and add the following entries
[php]<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop_admin/hdata/hadoop-${user.name}</value>
  </property>
</configuration>[/php]
III. Edit the configuration file conf/hdfs-site.xml and add the following entries
[php]<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>[/php]
IV. Edit the configuration file conf/mapred-site.xml and add the following entries
[php]<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>[/php]
V. Edit the configuration file conf/masters and add the host that will run the SecondaryNameNode
[php]slave-01[/php]
This is the IP address or alias of the node where the SecondaryNameNode will run
VI. Edit the configuration file conf/slaves and add the slave entries
[php]slave-01
slave-02[/php]
VII. Set environment variables
Edit ~/.bashrc and set or update the HADOOP_HOME and PATH shell variables as follows
[php]nano ~/.bashrc
export HADOOP_HOME=/home/hadoop/hadoop-0.20.2-cdh3u6
export PATH=$PATH:$HADOOP_HOME/bin[/php]
Hadoop is now set up on the master.
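Before moving on to the slaves, it may be worth confirming that the two paths configured in this section point at real installations. The sketch below is only an illustration: check_hadoop_env is a hypothetical helper name, and it assumes that a usable Java root contains bin/java and that the unpacked Hadoop directory contains bin/hadoop.

```shell
# Check that a Java root and a Hadoop root contain the expected launchers.
# check_hadoop_env is a hypothetical helper name used for illustration.
# Usage: check_hadoop_env JAVA_DIR HADOOP_DIR
check_hadoop_env() {
  local java_dir=$1
  local hadoop_dir=$2
  local status=0
  if [ ! -x "$java_dir/bin/java" ]; then
    echo "No executable bin/java under $java_dir"
    status=1
  fi
  if [ ! -x "$hadoop_dir/bin/hadoop" ]; then
    echo "No executable bin/hadoop under $hadoop_dir"
    status=1
  fi
  return $status
}

# Example, using the paths from this section:
# check_hadoop_env /usr/lib/jvm/jdk1.8.0_65 "$HADOOP_HOME"
```

The function prints nothing and returns 0 when both launchers are found.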
3.9. Setup Hadoop on slaves
I. Repeat sections 3.3 and 3.4 on all the slaves
[php]Section 3.3: "Install Java 8"
Section 3.4: "Add entry of master and slaves in hosts file"[/php]
II. Create tarball of configured Hadoop-setup and copy to all the slaves
[php]tar czf hadoop.tar.gz hadoop-0.20.2-cdh3u6
scp hadoop.tar.gz slave-01:~
scp hadoop.tar.gz slave-02:~[/php]
III. Untar configured Hadoop-setup on all the slaves
[php]tar xzf hadoop.tar.gz[/php]
Run this command on all the slaves
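With more than a couple of slaves, the copy and untar steps can be driven from the master in one loop, since passwordless SSH is already in place. This is a sketch under assumptions rather than part of the original procedure: deploy_to_slaves and the DRY_RUN switch are hypothetical names, and the tarball is unpacked in each slave's home directory as in the steps above.

```shell
# Copy the configured tarball to each slave and unpack it in the home directory.
# deploy_to_slaves and DRY_RUN are hypothetical names used for illustration.
deploy_to_slaves() {
  local tarball=$1
  shift
  local run=""
  # With DRY_RUN=1 the commands are printed instead of executed.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  local host
  for host in "$@"; do
    $run scp "$tarball" "$host:~" || return 1
    $run ssh "$host" "tar xzf $(basename "$tarball")" || return 1
  done
}

# Example, run on the master from the directory holding hadoop.tar.gz:
# DRY_RUN=1 deploy_to_slaves hadoop.tar.gz slave-01 slave-02
```

Running it first with DRY_RUN=1 shows the exact scp and ssh commands before anything is copied.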
3.10. Start The Cluster
I. Format the name node
[php]bin/hadoop namenode -format[/php]
Format the NameNode only once, when you first install Hadoop. Formatting it again will delete all your data from HDFS.
II. Now start Hadoop services
- Start HDFS services
[php]bin/start-dfs.sh[/php]
Run this command on master
- Start Map-Reduce services
[php]bin/start-mapred.sh[/php]
Run this command on master
III. Check the status of the daemons by running the jps command
- On master
[php]jps
NameNode
JobTracker[/php]
- On slave-01
[php]jps
TaskTracker
DataNode
SecondaryNameNode[/php]
- On slave-02
[php]jps
TaskTracker
DataNode[/php]
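Comparing the jps listing against the expected daemons by eye is easy to get wrong, especially since NameNode and SecondaryNameNode look alike. Here is a small sketch of an automated comparison. It assumes the usual jps output of one "<pid> <ClassName>" pair per line, and missing_daemons is a hypothetical helper name.

```shell
# Print every expected daemon that does not appear in the given jps output.
# missing_daemons is a hypothetical helper name used for illustration.
# Usage: missing_daemons "$(jps)" NameNode JobTracker
missing_daemons() {
  local jps_output=$1
  shift
  local daemon
  for daemon in "$@"; do
    # Compare the whole second field so that NameNode does not match
    # SecondaryNameNode.
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit (found ? 0 : 1) }'; then
      echo "$daemon"
    fi
  done
}

# Examples:
# On master:   missing_daemons "$(jps)" NameNode JobTracker
# On slave-01: missing_daemons "$(jps)" TaskTracker DataNode SecondaryNameNode
# On slave-02: missing_daemons "$(jps)" TaskTracker DataNode
```

An empty result means every expected daemon is running on that node.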
To try out HDFS, follow this tutorial to run HDFS commands and perform important operations.
3.11. Stop the cluster
I. Stop MapReduce services
[php]bin/stop-mapred.sh[/php]
Run this command on master
II. Stop HDFS services
[php]bin/stop-dfs.sh[/php]
Run this command on master