Apache Pig Installation on Ubuntu – A Pig Tutorial


This Pig tutorial explains how to install and configure Apache Pig. Apache Pig is an abstraction over MapReduce.

Pig is a tool for analyzing large data sets by expressing the analysis as data flows. To learn more about Pig, follow this introductory guide. This tutorial walks through the steps for installing Apache Pig on Ubuntu.

So, let’s start Pig Installation on Ubuntu.

Apache Pig Installation on Ubuntu

i. Prerequisites for Installing Pig

You must have Hadoop and the Java JDK installed on your system. Before installing Pig, install Hadoop and Java by following the steps given in this installation guide.
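
Before going further, it can help to confirm that both prerequisites are reachable from your shell. The sketch below is not part of the original setup: `have_tool` is a hypothetical helper name, and it assumes that Java and Hadoop expose launchers called `java` and `hadoop` on your PATH.

```shell
#!/bin/sh
# have_tool (hypothetical helper): succeeds if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: the prerequisites provide launchers named "java" and "hadoop".
for tool in java hadoop; do
    if have_tool "$tool"; then
        echo "$tool found at $(command -v "$tool")"
    else
        echo "$tool not found on PATH; install it before installing Pig"
    fi
done
```

If either tool is reported as missing, fix that first, because Pig relies on both.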

ii. Downloading Pig

Download the Pig release from the link below:
https://archive.cloudera.com/cdh5/cdh/5/
This tutorial assumes that hadoop-2.5.0-cdh5.3.2 is already installed on the system, so download the matching Pig version from that page, which is pig-0.12.0-cdh5.3.2.

iii. Installing Pig

The steps for Apache Pig installation are given below:
Step 1:
Move the downloaded pig-0.12.0-cdh5.3.2.tar file from the Downloads folder to the directory where you installed Hadoop.
Step 2:
Extract the pig-0.12.0-cdh5.3.2.tar file by running the command below in your terminal. GNU tar detects any compression automatically when extracting:
[php]dataflair@ubuntu:~$ tar xvf pig-0.12.0-cdh5.3.2.tar[/php]
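
Steps 1 and 2 can also be combined into a single terminal operation that extracts the archive straight into the target directory. This is a minimal sketch under stated assumptions: `unpack_into` is a hypothetical helper, and the archive location under `~/Downloads` and the home directory as the target are guesses about your setup.

```shell
#!/bin/sh
# unpack_into (hypothetical helper): extract ARCHIVE ($1) into directory DEST ($2).
unpack_into() {
    mkdir -p "$2" && tar xf "$1" -C "$2"
}

# Assumed locations; adjust them to match your system.
archive="$HOME/Downloads/pig-0.12.0-cdh5.3.2.tar"
dest="$HOME"

if [ -f "$archive" ]; then
    unpack_into "$archive" "$dest" && echo "extracted into $dest"
else
    echo "archive not found at $archive; nothing extracted"
fi
```

Extracting with `-C` leaves the downloaded archive where it is, so you can repeat the step if something goes wrong.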
Step 3:
Next, configure Pig by editing the “.bashrc” file. To open the file, run the command below:
[php]dataflair@ubuntu:~$ nano .bashrc[/php]
Add the following lines to the file:
[php]export PATH=$PATH:/home/dataflair/pig-0.12.0-cdh5.3.2/bin
export PIG_HOME=/home/dataflair/pig-0.12.0-cdh5.3.2
export PIG_CLASSPATH=$HADOOP_HOME/conf[/php]

Figure: Apache Pig Installation, .bashrc File

After adding these lines, save the file by pressing “CTRL+X”, then “Y”, then Enter to confirm the file name.
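
If you prefer not to edit the file by hand, the same three settings can be appended from a script. The sketch below is an illustration, not the tutorial's method: `append_once` is a hypothetical helper that skips lines already present, the PATH line is written in terms of `$PIG_HOME` (equivalent to the full path used above), and the demonstration runs on a scratch file rather than your real `.bashrc`.

```shell
#!/bin/sh
# append_once (hypothetical helper): add LINE ($2) to FILE ($1) unless that exact line exists.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Demonstrated on a scratch file; set rc_file="$HOME/.bashrc" to apply it for real.
rc_file="$(mktemp)"

append_once "$rc_file" 'export PIG_HOME=/home/dataflair/pig-0.12.0-cdh5.3.2'
append_once "$rc_file" 'export PATH=$PATH:$PIG_HOME/bin'
append_once "$rc_file" 'export PIG_CLASSPATH=$HADOOP_HOME/conf'
# Repeating a line is a no-op, so the script is safe to run more than once.
append_once "$rc_file" 'export PIG_CLASSPATH=$HADOOP_HOME/conf'

cat "$rc_file"
```

The single quotes matter: they keep `$PATH`, `$PIG_HOME`, and `$HADOOP_HOME` unexpanded so that they are evaluated each time the file is sourced.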
Step 4:
Reload the .bashrc file in the current shell by running the command below:
[php]dataflair@ubuntu:~$ source .bashrc[/php]
Once the .bashrc file is reloaded, the pig command is available on your PATH. To check the installed Pig version, run the command below:
[php]dataflair@ubuntu:~$ pig -version[/php]
If output like the following appears, you have configured Pig successfully.

Figure: Apache Pig Version
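
If the version command is not found, a quick look at the install directory often shows why. The following sketch assumes the layout used in this tutorial, with the launcher at `bin/pig` under the Pig directory; `check_pig_home` is a hypothetical helper name.

```shell
#!/bin/sh
# check_pig_home (hypothetical helper): verify that DIR/bin/pig exists and is executable.
check_pig_home() {
    if [ -x "$1/bin/pig" ]; then
        echo "ok: $1/bin/pig is executable"
    else
        echo "problem: $1/bin/pig is missing or not executable"
        return 1
    fi
}

# Falls back to the tutorial's install path when PIG_HOME is not set.
if ! check_pig_home "${PIG_HOME:-/home/dataflair/pig-0.12.0-cdh5.3.2}"; then
    echo "re-check the export lines in .bashrc and the extraction step"
fi
```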

iv. Starting Pig

Pig can be started in one of the following two modes:

  1. Local Mode
  2. Cluster Mode

To start Pig in local mode, pass the ‘-x local’ option. Running the “pig” command without any options starts Pig in cluster mode, which the Pig documentation calls MapReduce mode.

In local mode, Pig can access only files on the local file system. In cluster mode, Pig can access files stored in HDFS.
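
The mapping from mode to command line can be summarized in a few lines of shell. This is only an illustrative sketch: `pig_args` is a hypothetical helper, and it spells out cluster mode with the explicit `-x mapreduce` option, which selects the same mode Pig uses when no option is given.

```shell
#!/bin/sh
# pig_args (hypothetical helper): print the pig options for a named mode.
pig_args() {
    case "$1" in
        local)   echo "-x local" ;;       # files come from the local file system
        cluster) echo "-x mapreduce" ;;   # same mode as plain "pig"; files come from HDFS
        *)       echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

echo "local mode:   pig $(pig_args local)"
echo "cluster mode: pig $(pig_args cluster)"
```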

To start Pig in local mode, run the command below:
[php]dataflair@ubuntu:~$ pig -x local[/php]
If you see output like the following, Pig has started successfully in local mode.

Figure: Running Pig in Local Mode
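
To see local mode read from the local file system, you can run a tiny script instead of opening the interactive shell. The sketch below is an assumption-laden example rather than part of the original tutorial: the file names, the `fruits` relation, and the sample data are all made up for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in your home directory is touched.
work_dir="$(mktemp -d)"

# A tiny input file on the local file system (made-up sample data).
printf 'apple\nbanana\ncherry\n' > "$work_dir/fruits.txt"

# A two-line Pig Latin script: load each line as one chararray field, then print it.
cat > "$work_dir/show_fruits.pig" <<EOF
fruits = LOAD '$work_dir/fruits.txt' AS (name:chararray);
DUMP fruits;
EOF

# Run the script in local mode only if pig is actually installed.
if command -v pig >/dev/null 2>&1; then
    pig -x local "$work_dir/show_fruits.pig"
else
    echo "pig not on PATH; script written to $work_dir/show_fruits.pig"
fi
```

Passing a script file makes Pig run it and exit, which is handy for a quick smoke test of the installation.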

To start Pig in cluster mode, run the command below:
[php]dataflair@ubuntu:~$ pig[/php]
If you see output like the following, Pig has started successfully in cluster mode.

Figure: Running Pig in Cluster Mode
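
In cluster mode the input has to live in HDFS, so a script run looks slightly different. The following is a rough sketch that assumes your Hadoop daemons are running and that you can write to the HDFS directory named in `hdfs_dir`; that directory, the file names, and the sample data are hypothetical.

```shell
#!/bin/sh
# Assumed HDFS location for the demo; any HDFS directory you can write to will do.
hdfs_dir="/user/dataflair/pig_demo"

work_dir="$(mktemp -d)"
printf 'apple\nbanana\n' > "$work_dir/fruits.txt"

# In cluster mode the LOAD path is resolved against HDFS, not the local disk.
cat > "$work_dir/show_fruits.pig" <<EOF
fruits = LOAD '$hdfs_dir/fruits.txt' AS (name:chararray);
DUMP fruits;
EOF

if command -v hadoop >/dev/null 2>&1 && command -v pig >/dev/null 2>&1; then
    hadoop fs -mkdir -p "$hdfs_dir"                      # create the HDFS directory
    hadoop fs -put "$work_dir/fruits.txt" "$hdfs_dir/"   # copy the input into HDFS
    pig "$work_dir/show_fruits.pig"                      # no -x option: cluster mode
else
    echo "hadoop or pig not on PATH; script written to $work_dir/show_fruits.pig"
fi
```

Note that the copy step fails if the file already exists in HDFS, so remove it or choose a fresh directory before running the sketch a second time.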

That covers the Apache Pig installation steps. We hope you found the explanation useful.

Conclusion

We have now seen how to install Apache Pig on Ubuntu. Later Pig tutorials will cover Pig programming. Feel free to ask questions in the comment section.


