How to Install Hadoop on Centos | Cloudera Hadoop Installation

1. Install Hadoop on CentOS: Objective

This is a step-by-step tutorial to install, configure, and run a Hadoop cluster on CentOS. It explains how to deploy a single-node cluster on CentOS; a single-node Hadoop setup is also called a pseudo-distributed mode installation. Once the Cloudera installation is done, you can play with Hadoop and its components, such as MapReduce for data processing and HDFS for data storage.

Learn how to install Cloudera Hadoop CDH5 on CentOS.

2. Steps to Install Cloudera Hadoop on CentOS

Follow the steps given below to install Hadoop on CentOS:

2.1. Install Java

Go to your home directory, then extract the Java archive you downloaded and move it into place using the commands below:

cd ~
cd /home/saurabhpandey/Downloads
tar xzf jdk-8u101-linux-x64.tar.gz
mv /home/saurabhpandey/Downloads/jdk1.8.0_101 /home/saurabhpandey

I. Install Java with Alternatives

To install Java we will use the alternatives command, which is available in the chkconfig package.

cd ~
cd jdk1.8.0_101/
sudo alternatives --install /usr/bin/java java /home/saurabhpandey/jdk1.8.0_101/bin/java 2
[sudo] password for saurabhpandey:
sudo alternatives --config java
[sudo] password for saurabhpandey:

There are 3 programs which provide ‘java’.

Selection    Command
-----------------------------------------------
*+ 1       /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.71-2.b15.el7_2.x86_64/jre/bin/java
2          /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.95-2.6.4.0.el7_2.x86_64/jre/bin/java
3          /home/saurabhpandey/jdk1.8.0_101/bin/java

Type selection number 3. Java 8 is now successfully installed on your system.

(Screenshot: Install Java alternatives to install Hadoop on CentOS)

II. Configuring Environment Variables

To configure the Java environment variables, use the following commands. Note that the JDK was moved to the home directory in the previous step, so the path is /home/saurabhpandey/jdk1.8.0_101, not the Downloads folder:
a. Set the JAVA_HOME Variable
export JAVA_HOME=/home/saurabhpandey/jdk1.8.0_101
b. Set the JRE_HOME Variable
export JRE_HOME=/home/saurabhpandey/jdk1.8.0_101/jre
c. Set the PATH Variable
export PATH=$PATH:/home/saurabhpandey/jdk1.8.0_101/bin:/home/saurabhpandey/jdk1.8.0_101/jre/bin
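These export commands only last for the current shell session. As a sketch (assuming this tutorial's JDK path; substitute your own), the variables can be appended to ~/.bash_profile so they persist across logins:

```shell
# Persist the Java environment variables across shell sessions.
# JDK_DIR is this tutorial's example path -- substitute your own.
JDK_DIR=/home/saurabhpandey/jdk1.8.0_101
PROFILE="$HOME/.bash_profile"   # or ~/.bashrc, depending on preference

# Append the exports only if they are not already present,
# so re-running this does not create duplicate entries.
if ! grep -q 'JAVA_HOME=' "$PROFILE" 2>/dev/null; then
  cat >> "$PROFILE" <<EOF
export JAVA_HOME=$JDK_DIR
export JRE_HOME=$JDK_DIR/jre
export PATH=\$PATH:$JDK_DIR/bin:$JDK_DIR/jre/bin
EOF
fi
```

After this, new login shells pick up the variables automatically.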

2.2. Install Hadoop on CentOS

I. Configure Passwordless SSH

To set up key-based SSH, execute the following commands:
ssh-keygen -t rsa

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): “Press Enter”
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):  “Press Enter”
Enter same passphrase again: “Press Enter”

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

(Screenshot: Install Hadoop on CentOS – Configure passwordless SSH)

To verify key-based login, execute the command below; it should not ask for a password.
ssh localhost
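The interactive prompts above can also be skipped entirely. A minimal sketch of the same key setup done non-interactively (a temporary directory stands in for ~/.ssh here, for illustration):

```shell
# Non-interactive version of the key setup above.
# A temporary directory stands in for ~/.ssh in this sketch.
SSH_DIR=$(mktemp -d)

# -N "" gives an empty passphrase, -q suppresses the banner output
ssh-keygen -t rsa -N "" -f "$SSH_DIR/id_rsa" -q

# Authorize the new public key and restrict its permissions,
# exactly as in the interactive steps
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 0600 "$SSH_DIR/authorized_keys"

ls -l "$SSH_DIR"
```

For a real setup, point -f at ~/.ssh/id_rsa instead of the temporary directory.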

II. Download Hadoop

Download the Hadoop CDH5 tarball from:
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.5.0.tar.gz
Once it is downloaded, copy the archive to the home directory of your system.

III. Untar Hadoop File

Extract the contents of the compressed Hadoop package using the command below:
tar xzf hadoop-2.6.0-cdh5.5.0.tar.gz

IV. Setup Configuration Files

a. Edit .bashrc
Now make changes in the environment file “.bashrc” present in your home directory. Edit it by executing the “$ nano ~/.bashrc” command and insert the following parameters at the end:

export HADOOP_HOME=/home/saurabhpandey/hadoop-2.6.0-cdh5.5.0
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

(Screenshot: Setup configuration files to install Hadoop on CentOS)

Note: Make sure to enter the correct path. “/home/saurabhpandey/hadoop-2.6.0-cdh5.5.0” is the path on my system; you can find yours with the “pwd” command.
After adding the above parameters, save the file by pressing “Ctrl+X” and confirming with “Y”.
To make the environment variables take effect, execute the command below:
source .bashrc
b. Edit hadoop-env.sh
Now make changes in the Hadoop configuration file “hadoop-env.sh”, which is present in the configuration directory (HADOOP_HOME/etc/hadoop); in that file we need to set JAVA_HOME.

cd hadoop-2.6.0-cdh5.5.0/
cd etc
cd hadoop
nano hadoop-env.sh

We need to change the path of JAVA_HOME in this file as:
export JAVA_HOME=<path-to-the-root-of-your-Java-installation> (e.g.: /home/saurabhpandey/jdk1.8.0_101/)

(Screenshot: Install Hadoop on CentOS – Edit hadoop-env.sh)

After changing the path of JAVA_HOME, save the file by pressing “Ctrl+X”.
Note: “/home/saurabhpandey/jdk1.8.0_101/” is the path of Java on my system; enter the path where your Java installation is present.
c. Edit core-site.xml
Now make changes in the Hadoop configuration file “core-site.xml”, which is present in the configuration directory (HADOOP_HOME/etc/hadoop), by executing the command below:
nano core-site.xml
And inside it add the following entries between <configuration> </configuration> present at the end of this file:

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
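For reference, fs.default.name is the older, deprecated name of this property; it still works on Hadoop 2.x, but the modern equivalent entry is:

```xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
```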

(Screenshot: Install Hadoop on CentOS – Edit core-site.xml)

After adding the above parameters, save the file by pressing “Ctrl+X”.
d. Edit hdfs-site.xml
Now we need to make changes in Hadoop configuration file hdfs-site.xml (which is located in HADOOP_HOME/etc/hadoop) by executing the below command:
nano hdfs-site.xml
And inside it add the following entries between <configuration> </configuration> present at the end of this file:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
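Optionally (this is not part of the original steps), the NameNode and DataNode storage directories can be set explicitly in the same file; otherwise HDFS keeps its data under /tmp by default, which may be cleared on reboot. The paths below are illustrative:

```xml
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/saurabhpandey/hadoopdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/saurabhpandey/hadoopdata/datanode</value>
</property>
```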

(Screenshot: Install Hadoop on CentOS – Edit hdfs-site.xml)

Here we set the replication factor to 1, as this is a single-node cluster.
After adding the above parameters, save the file by pressing “Ctrl+X”.
e. Edit mapred-site.xml
To edit the mapred-site.xml file, we first need to create a copy of the template file mapred-site.xml.template.
A copy of this file can be created using the following command:
cp mapred-site.xml.template mapred-site.xml
We will edit the mapred-site.xml file by using the following command:
nano mapred-site.xml
And inside it add the following entries between <configuration> </configuration> present at the end of this file:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

(Screenshot: Install Hadoop on CentOS – Edit mapred-site.xml)

After adding the above parameters, save the file by pressing “Ctrl+X”.
f. Edit yarn-site.xml
Now we need to make changes in Hadoop configuration file yarn-site.xml (which is located in HADOOP_HOME/etc/hadoop) by executing the below command:
nano yarn-site.xml
And inside it add the following entries between <configuration> </configuration> present at the end of this file:

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

(Screenshot: Hadoop installation steps – Edit yarn-site.xml)

After adding the above parameters, save the file by pressing “Ctrl+X”.

V. Starting the Hadoop Cluster

a. NameNode Formatting
hdfs namenode -format

(Screenshot: Cloudera Hadoop installation – NameNode formatting)

NOTE: After we install Hadoop on CentOS, we perform this formatting only once; otherwise all the HDFS data will be deleted and the NameNode will be assigned a different namespaceID. If the NameNode and DataNode IDs do not match, the DataNode daemons will not start.
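To see why re-formatting breaks the cluster: both daemons record a cluster/namespace ID in a VERSION file under their storage directories, and a DataNode refuses to start when its ID no longer matches the NameNode's. A sketch of the comparison, using simulated VERSION files in a temporary directory (the real ones are written by “hdfs namenode -format” under the configured dfs storage directories):

```shell
# Sketch: compare the clusterID recorded by the NameNode and DataNode.
# Simulated VERSION files stand in for the real ones here.
tmp=$(mktemp -d)
mkdir -p "$tmp/name/current" "$tmp/data/current"
echo "clusterID=CID-example-1111" > "$tmp/name/current/VERSION"
echo "clusterID=CID-example-1111" > "$tmp/data/current/VERSION"

nn_id=$(grep '^clusterID=' "$tmp/name/current/VERSION" | cut -d= -f2)
dn_id=$(grep '^clusterID=' "$tmp/data/current/VERSION" | cut -d= -f2)

if [ "$nn_id" = "$dn_id" ]; then
  echo "clusterIDs match: the DataNode will start"
else
  echo "clusterID mismatch: the DataNode will refuse to start"
fi
rm -rf "$tmp"
```

Re-formatting rewrites only the NameNode's VERSION file, which is exactly how the mismatch arises.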
b. Start HDFS Daemons
start-dfs.sh
c. Start YARN Daemons
start-yarn.sh

2.3. Checking Installation

To check whether Hadoop is running successfully, open the address below in a browser. You can also run the “jps” command to confirm that the NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager daemons are up:
http://localhost:50070

(Screenshot: Checking the Hadoop installation on CentOS)

If this page appears, you have successfully installed Hadoop on CentOS. As a next step, you can learn how Hadoop works.

We hope this tutorial on how to install Hadoop on CentOS was helpful to you. If you have any queries about installing Hadoop on CentOS, drop a comment below and we will get back to you.
