Hadoop 2 Installation on Ubuntu – Setup of Hadoop CDH5
1. Hadoop 2 Installation Tutorial: Objective
This Hadoop 2 installation tutorial describes how to install and configure a single-node Hadoop cluster on Ubuntu. A single-node Hadoop cluster is also called "Hadoop pseudo-distributed mode". The Hadoop 2 installation is explained here simply and to the point, so that you can complete the Hadoop CDH5 installation in 10 minutes. Once Hadoop 2 is installed, you can perform Hadoop Distributed File System (HDFS) and Hadoop MapReduce operations.
2. Hadoop 2 Installation: Video Tutorial
This video tutorial covers Apache Hadoop 2 installation or Cloudera CDH5 installation on Ubuntu. This will help you to learn Hadoop CDH5 installation in an easy manner.
3. Install Hadoop 2 on Ubuntu
Follow the steps given below to install and configure a Hadoop 2 cluster on Ubuntu:
3.1. Recommended Platform
- OS – Linux is supported as a development and production platform. You can use Ubuntu 14.04 or later (you can also use other Linux flavors like CentOS, Redhat, etc.)
- Hadoop – Cloudera Distribution for Apache Hadoop CDH5.x (you can use Apache Hadoop 2.x)
I. Setup Platform
If you are using Windows or macOS, create a virtual machine and install Ubuntu in it, using either VMware Player or Oracle VirtualBox.
3.2. Prerequisites
I. Install Java 8 (Recommended Oracle Java)
a. Install Python Software Properties
[php]sudo apt-get install python-software-properties[/php]
b. Add Repository
[php]sudo add-apt-repository ppa:webupd8team/java[/php]
c. Update the source list
[php]sudo apt-get update[/php]
d. Install Java
[php]sudo apt-get install oracle-java8-installer[/php]
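After the installer finishes, a quick sanity check (a sketch, not part of the install itself) confirms the JDK is on the PATH. Note that the version-string format differs between Java 8 ("1.8.0_x") and newer JDKs:

```shell
# Check whether a JDK is visible on the PATH and print its version line
if command -v java >/dev/null 2>&1; then
  JAVA_VER=$(java -version 2>&1 | head -n 1)
else
  JAVA_VER="not installed"
fi
echo "java: $JAVA_VER"
```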
II. Configure SSH
a. Install Open SSH Server-Client
[php]sudo apt-get install openssh-server openssh-client[/php]
b. Generate Key Pairs
[php]ssh-keygen -t rsa -P ""[/php]
c. Configure password-less SSH
[php]cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys[/php]
d. Check by SSH to localhost
[php]ssh localhost[/php]
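If you want to verify this from a script rather than interactively, one sketch is to use BatchMode, which makes ssh fail immediately instead of prompting for a password:

```shell
# Non-interactive check of passwordless SSH:
# BatchMode=yes disables password prompts, so a prompt becomes a failure
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
  SSH_OK=yes
else
  SSH_OK=no
fi
echo "passwordless ssh to localhost: $SSH_OK"
```

If this reports `no`, re-check the key-pair and authorized_keys steps above.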
3.3. Install Hadoop
I. Download Hadoop 2
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz
II. Untar the Tarball
[php]tar xzf hadoop-2.5.0-cdh5.3.2.tar.gz[/php]
Note: All the required jars, scripts, configuration files, etc. are available in HADOOP_HOME directory (hadoop-2.5.0-cdh5.3.2).
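An optional convenience (a convention, not something Hadoop requires) is to symlink the versioned directory to a stable path, so your environment variables and configuration survive a version upgrade:

```shell
# Point a stable path (~/hadoop) at the versioned install directory;
# -n replaces an existing link, -f overwrites without prompting
cd "$HOME"
ln -sfn "$HOME/hadoop-2.5.0-cdh5.3.2" "$HOME/hadoop"
ls -ld "$HOME/hadoop"
```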
III. Hadoop 2 Setup Configuration
a. Edit .bashrc
Now, edit the .bashrc file located in the user's home directory and add the following parameters:
[php]export HADOOP_PREFIX="/home/hdadmin/hadoop-2.5.0-cdh5.3.2"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}[/php]
Note: After the above step, restart the terminal so that all the environment variables come into effect.
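To confirm the new entries took effect, you can sketch a quick PATH check (the install path below assumes the hdadmin layout from this tutorial; adjust to your own):

```shell
# Reproduce the .bashrc entries, then verify the bin directory is on PATH
HADOOP_PREFIX="$HOME/hadoop-2.5.0-cdh5.3.2"
export PATH="$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin"
case ":$PATH:" in
  *":$HADOOP_PREFIX/bin:"*) PATH_OK=yes ;;
  *) PATH_OK=no ;;
esac
echo "hadoop bin on PATH: $PATH_OK"
```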
b. Edit hadoop-env.sh
Now, edit the configuration file hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:
[php]export JAVA_HOME=<path-to-the-root-of-your-Java-installation> (eg: /usr/lib/jvm/java-8-oracle/)[/php]
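If you are unsure of the exact path, one sketch is to derive it from whichever java binary is on the PATH instead of hard-coding it (readlink -f follows the apt-managed symlink chain; GNU readlink is assumed):

```shell
# Derive a JAVA_HOME candidate from the java binary on the PATH:
# java lives in $JAVA_HOME/bin/java, so strip two path components
if command -v java >/dev/null 2>&1; then
  JAVA_HOME_GUESS=$(dirname "$(dirname "$(readlink -f "$(command -v java)")")")
else
  JAVA_HOME_GUESS=""
fi
echo "export JAVA_HOME=${JAVA_HOME_GUESS:-<set manually>}"
```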
c. Edit core-site.xml
Now, edit the configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:
[php]<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdadmin/hdata</value>
</property>
</configuration>[/php]
Note: /home/hdadmin/hdata is a sample location; specify a location where you have read and write permissions.
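If you script your setup, the same file can be generated from a heredoc so the NameNode URI and data directory live in shell variables (both values below are the examples used in this tutorial; the temp directory stands in for HADOOP_HOME/etc/hadoop):

```shell
# Generate core-site.xml from shell variables instead of hand-editing
CONF_DIR=$(mktemp -d)            # stand-in for HADOOP_HOME/etc/hadoop
NN_URI="hdfs://localhost:9000"   # fs.defaultFS: the NameNode address
DATA_DIR="$HOME/hdata"           # hadoop.tmp.dir: base for HDFS data
cat > "$CONF_DIR/core-site.xml" <<EOF
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>$NN_URI</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$DATA_DIR</value>
  </property>
</configuration>
EOF
echo "wrote $CONF_DIR/core-site.xml"
```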
d. Edit hdfs-site.xml
Now, edit the configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:
[php]<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>[/php]
e. Edit mapred-site.xml
Now, edit the configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:
[php]<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>[/php]
f. Edit yarn-site.xml
Now, edit the configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:
[php]<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>[/php]
3.4. Start the Cluster
I. Format the name node
[php]bin/hdfs namenode -format[/php]
NOTE: Format the NameNode only once, when you first install Hadoop; formatting it again will delete all your data from HDFS.
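If you put your setup in a script, a guard like the following sketch keeps a re-run from ever wiping HDFS. It assumes hadoop.tmp.dir is $HOME/hdata as configured above; on first format, the NameNode writes a VERSION file under that directory:

```shell
# Only suggest formatting when no formatted NameNode metadata exists yet
NAME_DIR="$HOME/hdata/dfs/name/current"
if [ -f "$NAME_DIR/VERSION" ]; then
  FORMAT_NEEDED=no
  echo "NameNode already formatted; skipping"
else
  FORMAT_NEEDED=yes
  echo "fresh install; run: bin/hdfs namenode -format"
fi
```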
II. Start HDFS Services
[php]sbin/start-dfs.sh[/php]
III. Start YARN Services
[php]sbin/start-yarn.sh[/php]
Follow this link to learn What is YARN?
IV. Check whether services have been started
[php]jps
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager[/php]
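Rather than eyeballing the jps output, the check can be scripted so that any missing daemon is reported by name (jps ships with the JDK; the daemon names below are as jps prints them):

```shell
# Compare running JVMs against the daemons a pseudo-distributed
# cluster should have after start-dfs.sh and start-yarn.sh
EXPECTED="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"
MISSING=""
for d in $EXPECTED; do
  jps 2>/dev/null | grep -qw "$d" || MISSING="$MISSING $d"
done
if [ -z "$MISSING" ]; then
  echo "all services up"
else
  echo "missing:$MISSING"
fi
```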
3.5. Run MapReduce Jobs
I. Run word count example
[php] bin/hdfs dfs -mkdir /inputwords
bin/hdfs dfs -put <data-file> /inputwords
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.2.jar wordcount /inputwords /outputwords
bin/hdfs dfs -cat /outputwords/*[/php]
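On a small sample, the same word count can be sketched in plain shell with no cluster at all, which is handy for cross-checking the MapReduce output: tr plays the role of the map phase, sort the shuffle, and uniq -c the reduce.

```shell
# Local word count mirroring the MapReduce phases on a tiny sample file
INPUT=$(mktemp)
printf 'hadoop yarn hadoop hdfs\n' > "$INPUT"
COUNTS=$(tr ' ' '\n' < "$INPUT" | sort | uniq -c | sort -rn)
echo "$COUNTS"
# Pull out the count for one word to compare with the job's output
HADOOP_COUNT=$(echo "$COUNTS" | grep -w hadoop | awk '{print $1}')
echo "hadoop appears $HADOOP_COUNT times"
```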
Follow the HDFS Command Guide to play with HDFS commands and perform various operations.
3.6. Stop The Cluster
I. Stop HDFS Services
[php]sbin/stop-dfs.sh[/php]
II. Stop YARN Services
[php]sbin/stop-yarn.sh[/php]
I hope you are now clear on how to install Hadoop 2.
See Also-
- Carve Your Career with Big Data, Become Hadoop Administrator
- Install & Configure Apache Hadoop 2.7.x on Ubuntu
- Comparison between Hadoop vs Apache Spark vs Apache Flink
Once you install Hadoop 2, you can play with HDFS. For any query about Hadoop 2 installation, feel free to share it with us; we will be happy to help.