Apache Pig Installation on Ubuntu – A Pig Tutorial
1. Objective – Install Pig
This Pig tutorial explains how to install and configure Apache Pig on Ubuntu. Apache Pig is an abstraction over MapReduce: a tool that makes it easier to analyze large data sets by representing the analysis as data flows. To learn more about Pig, follow this introductory guide.
So, let’s start Pig Installation on Ubuntu.
2. Apache Pig Installation on Ubuntu
i. Pre-Requisite to Install Pig
You must have Hadoop and the Java JDK installed on your system before installing Pig. If they are not installed yet, install them first by following the steps given in this installation guide.
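Before moving on, it can help to confirm that both prerequisites are reachable from the terminal. The following is a minimal sketch, assuming that `java` and `hadoop` are the command names your installation puts on the PATH; `have_command` is a helper name made up for this example.

```shell
# Return 0 if the named command is on PATH, non-zero otherwise.
# (have_command is a hypothetical helper, not part of Pig or Hadoop.)
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether each prerequisite named in this tutorial can be found.
for tool in java hadoop; do
    if have_command "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```

If either tool is reported as not found, finish the Hadoop and Java setup before continuing.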
ii. Downloading Pig
You can download the Pig file from the link below:
In this tutorial hadoop-2.5.0-cdh5.3.2 is already installed on the system, so we download the matching supported Pig version, which is pig-0.12.0-cdh5.3.2.
iii. Installing Pig
The steps for Apache Pig installation are given below:
Move the downloaded pig-0.12.0-cdh5.3.2.tar file from the Downloads folder to the directory where you installed Hadoop.
Untar the pig-0.12.0-cdh5.3.2.tar file by executing the command below in your terminal:
dataflair@ubuntu:~$ tar xvf pig-0.12.0-cdh5.3.2.tar
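The move and untar steps can also be combined by extracting the archive directly into the target directory. This is only a sketch: `extract_archive` is a hypothetical helper, and the paths in the usage comment assume the archive is still in Downloads and that Hadoop lives in your home directory.

```shell
# Extract an archive into a destination directory.
# (extract_archive is a hypothetical helper written for this example.)
extract_archive() {
    ea_archive=$1
    ea_dest=$2
    if [ ! -f "$ea_archive" ]; then
        echo "archive not found: $ea_archive" >&2
        return 1
    fi
    # -C makes tar unpack inside the destination directory.
    mkdir -p "$ea_dest" && tar -xf "$ea_archive" -C "$ea_dest"
}

# Example call (paths are assumptions, adjust them to your system):
# extract_archive "$HOME/Downloads/pig-0.12.0-cdh5.3.2.tar" "$HOME"
```

The function refuses to run when the archive is missing, which avoids creating an empty destination directory by mistake.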
Now we need to configure Pig. To do so, we need to edit the “.bashrc” file. To open this file, execute the command below:
dataflair@ubuntu:~$ nano .bashrc
Add the following lines to this file:
export PATH=$PATH:/home/dataflair/pig-0.12.0-cdh5.3.2/bin
export PIG_HOME=/home/dataflair/pig-0.12.0-cdh5.3.2
export PIG_CLASSPATH=$HADOOP_HOME/conf
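If you prefer not to edit the file by hand, the same lines can be appended from the terminal. The sketch below assumes you want each line added only once, so that running it again does not create duplicates; `append_once` is a hypothetical helper, not a standard command.

```shell
# Append a line to a file only if that exact line is not already present.
# (append_once is a hypothetical helper written for this example.)
append_once() {
    ao_file=$1
    ao_line=$2
    touch "$ao_file"
    # -x matches whole lines, -F treats the text literally (no regex).
    grep -qxF -e "$ao_line" "$ao_file" || printf '%s\n' "$ao_line" >> "$ao_file"
}

# Example calls; single quotes keep $PATH and $HADOOP_HOME unexpanded
# so they are written to the file literally:
# append_once "$HOME/.bashrc" 'export PATH=$PATH:/home/dataflair/pig-0.12.0-cdh5.3.2/bin'
# append_once "$HOME/.bashrc" 'export PIG_HOME=/home/dataflair/pig-0.12.0-cdh5.3.2'
# append_once "$HOME/.bashrc" 'export PIG_CLASSPATH=$HADOOP_HOME/conf'
```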
After adding the above lines, save the file by pressing “CTRL+X”, then “Y”, and then “Enter” on your keyboard.
Reload the .bashrc file by executing the command below:
dataflair@ubuntu:~$ source .bashrc
After reloading the .bashrc file, the Pig installation is complete. To check the installed Pig version, execute the command below:
dataflair@ubuntu:~$ pig -version
If the command prints the Pig version, you have successfully configured Pig.
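If the version command is not found instead, the usual cause is that the new PATH entry was not picked up. A small sketch for checking this is shown below; `path_has_dir` is a hypothetical helper, and the check assumes PIG_HOME was exported as described above.

```shell
# Return 0 if the given directory is one of the entries in PATH.
# (path_has_dir is a hypothetical helper written for this example.)
path_has_dir() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check that PIG_HOME is set and that its bin directory is on PATH.
if [ -n "${PIG_HOME:-}" ] && path_has_dir "$PIG_HOME/bin"; then
    echo "PIG_HOME/bin is on PATH"
else
    echo "PIG_HOME is unset or its bin directory is not on PATH"
fi
```

When the second message appears, re-check the lines added to .bashrc and reload the file again.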
iv. Starting Pig
We can start Pig in one of the following two modes:
- Local Mode
- Cluster Mode
To start Pig in local mode, use the ‘-x local’ option. When you run the “pig” command without any options, Pig starts in cluster mode. In local mode, Pig can only access files on the local file system, whereas in cluster mode Pig can access files stored in HDFS.
To start Pig in Local Mode, execute the command below:
dataflair@ubuntu:~$ pig -x local
If Pig starts without errors and shows the grunt> prompt, Pig has started successfully in Local mode.
To start Pig in Cluster Mode, execute the command below:
dataflair@ubuntu:~$ pig
If Pig starts without errors and shows the grunt> prompt, Pig has started successfully in Cluster mode.
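The choice between the two modes can be summed up in a small helper that maps a mode name to the command described above. This is only an illustrative sketch; `pig_command` and the mode names `local` and `cluster` are made up for this example and are not part of Pig.

```shell
# Print the command line that starts Pig in the requested mode.
# (pig_command is a hypothetical helper written for this example.)
pig_command() {
    case "$1" in
        local)   echo "pig -x local" ;;   # local file system only
        cluster) echo "pig" ;;            # default mode, can read from HDFS
        *)       echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Show the command for each mode without starting Pig.
pig_command local
pig_command cluster
```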
So, this was all about Apache Pig installation. We hope you liked our explanation.
So, finally, we have seen how to install Apache Pig on Ubuntu. We will cover Pig programming in later Pig tutorials. Feel free to ask your questions in the comment section.