Learn Apache Pig Execution: Modes and Mechanism
1. Apache Pig Execution – Objective
Once Apache Pig is installed, the next step is to understand how it executes scripts. In this article, we introduce Apache Pig execution and then cover Pig execution modes, execution mechanisms, and the way to execute Apache Pig in batch mode.
So, let’s discuss Apache Pig Execution in detail.
2. Introduction to Apache Pig Execution
First, the developer writes a Pig script, which is stored on the local file system. When the developer submits the script, it is passed to the Pig Latin compiler. The compiler then splits the work into tasks and runs them as a series of MapReduce (MR) jobs. While these jobs run, the input data is read from HDFS. After the MR jobs finish, the output is written back to HDFS.
3. Apache Pig Execution Modes
We can run Apache Pig in two execution modes: local mode and MapReduce mode. Let’s discuss both in detail:
a. Local Mode
In this mode, all the files are installed and run on your local host and local file system. That means we do not need Hadoop or HDFS. We generally use this mode for testing purposes.
In other words, Pig runs in a single JVM and accesses the local file system in this mode. Local mode is therefore best suited to small data sets. Parallel mapper execution is not possible in this mode, because earlier versions of Hadoop are not thread-safe.
To start Pig in local mode, the user passes the -x local option. In this mode, Pig always resolves paths against the local file system when loading data.
b. MapReduce Mode
In MapReduce mode, we use Apache Pig to load and process data that exists in the Hadoop Distributed File System (HDFS). Whenever we execute Pig Latin statements to process the data in this mode, a MapReduce job is invoked in the back end to perform the operation on the data in HDFS.
To be more specific, this mode requires a properly installed and configured Hadoop cluster. MapReduce mode is Pig's default mode. Pig translates the queries into MapReduce jobs and runs them on top of the Hadoop cluster, so in this mode the work runs on a distributed cluster.
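The choice between the two modes comes down to the value passed to the -x option. As a minimal sketch, assuming a `pig` launcher is available on the PATH, the hypothetical Python helper below (it is not part of Pig) builds the command line for either mode and falls back to MapReduce mode when no mode is given, mirroring Pig's default. It only constructs the argument list; nothing is executed.

```python
# Hypothetical helper (not part of Pig): build the argument list that
# starts Pig in one of the two execution modes covered in this article.
VALID_MODES = ("local", "mapreduce")


def pig_command(mode="mapreduce", launcher="pig"):
    """Return the argv list for starting Pig in the requested mode."""
    mode = mode.lower()
    if mode not in VALID_MODES:
        raise ValueError("unknown Pig execution mode: %r" % mode)
    return [launcher, "-x", mode]


if __name__ == "__main__":
    # Show the command line for each mode without running anything.
    print(" ".join(pig_command("local")))
    print(" ".join(pig_command()))
```

Rejecting unknown values early keeps a typo such as "hdfs" from being passed through to the launcher.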
4. Apache Pig Execution Mechanisms
There are three ways to execute Apache Pig scripts: interactive mode, batch mode, and embedded mode.
a. Interactive Mode (Grunt shell)
We can run Apache Pig in interactive mode by using the Grunt shell. In this shell, we enter Pig Latin statements one at a time and use the Dump operator to get the output.
b. Batch Mode (Script)
We can run Apache Pig in batch mode by writing the Pig Latin script in a single file with the .pig extension and executing that file.
c. Embedded Mode (UDF)
Pig also lets us define our own functions (User Defined Functions, or UDFs) in programming languages such as Java and use them in our scripts. Running Pig this way is referred to as embedded mode.
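To make the idea of a UDF concrete, here is a minimal sketch. The article names Java as one UDF language; this sketch instead assumes Pig's support for Python UDFs executed through Jython, chosen so the function can also be tried with plain Python. The function name `normalize_city`, the schema string, and the file name `city_udfs.py` used further down are all hypothetical.

```python
# Sketch of a Pig UDF written in Python. Pig is expected to supply the
# outputSchema decorator when it loads this file through Jython; the
# fallback below only exists so plain Python can import and test the file.
if "outputSchema" not in globals():
    def outputSchema(schema):
        def decorator(func):
            # Record the declared schema on the function object.
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("city:chararray")
def normalize_city(city):
    """Trim whitespace and title-case a city name; pass nulls through."""
    if city is None:
        return None
    return city.strip().title()
```

Assuming the file is saved as city_udfs.py, a script could register and call it roughly like this:

REGISTER 'city_udfs.py' USING jython AS city_udfs;
cleaned = FOREACH student GENERATE id, name, city_udfs.normalize_city(city);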
5. Invoking the Grunt Shell
By using the -x option, we can invoke the Grunt shell in the desired mode (local/MapReduce).
1. Local mode
$ ./pig -x local
2. MapReduce mode
$ ./pig -x mapreduce
These commands give us the Grunt shell prompt.
Moreover, we can exit the Grunt shell by pressing Ctrl + D.
Also, after invoking the Grunt shell, we can enter Pig Latin statements directly at the prompt and have them executed, for example:
grunt> customers = LOAD 'customers.txt' USING PigStorage(',');
6. Executing Apache Pig in Batch Mode
Further, we can write an entire Pig Latin script in a file and execute it by passing the file to the pig command along with the -x option. Let’s assume we have a Pig script in a file named Sample_script.pig.
Sample_script.pig
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',') as (id:int,name:chararray,city:chararray);
Dump student;
Then, we can execute the script in the above file.
1. Local mode
$ pig -x local Sample_script.pig
2. MapReduce mode
$ pig -x mapreduce Sample_script.pig
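The batch workflow above (write a script file, then hand it to pig with -x) can also be driven from a small wrapper. The following is only a sketch under stated assumptions: the helper names `write_script` and `batch_command` are hypothetical, the relative input path student.txt is an assumption made so the script suits local mode, and Pig is launched only if a `pig` executable is found on the PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Pig Latin for the batch script. The relative input path is an
# assumption made for this sketch so the script suits local mode.
PIG_SCRIPT = """\
student = LOAD 'student.txt' USING PigStorage(',')
          AS (id:int, name:chararray, city:chararray);
DUMP student;
"""


def write_script(directory, name="Sample_script.pig"):
    """Write the Pig Latin script into `directory` and return its path."""
    path = Path(directory) / name
    path.write_text(PIG_SCRIPT)
    return path


def batch_command(script_path, mode="local"):
    """Build the command that runs a script file in batch mode."""
    return ["pig", "-x", mode, str(script_path)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        script = write_script(workdir)
        command = batch_command(script)
        print("Would run:", " ".join(command))
        # Launch Pig only when the launcher is actually installed.
        if shutil.which("pig"):
            subprocess.run(command, check=False)
```

Keeping the command construction separate from the launch makes it possible to inspect the exact command line before any job is submitted.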
So, this was all about Apache Pig Execution. Hope you like our explanation.
7. Conclusion – Pig Execution
As a result, we have seen the whole process of execution in Apache Pig. Moreover, we discussed Apache Pig execution modes and execution mechanisms in detail. Still, if you have any doubts, feel free to ask in the comment section.